Basic Usage of the urllib Library

Date: 2019-06-19 23:09:18

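I. urllib in Detail

1. What is urllib

urllib is Python's built-in HTTP request library. It is made up of four modules:

urllib.request: the request module (opens a given URL, much like visiting it in a browser)

urllib.error: the exception module (lets you catch request errors, then retry or take other action so the program does not stop unexpectedly)

urllib.parse: the URL parsing module (a utility module with many URL-handling functions, for example splitting and joining)

urllib.robotparser: the robots.txt parsing module (reads a site's robots.txt file to decide which URLs may be crawled and which may not)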
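As a rough illustration of the two helper modules, the sketch below splits and joins a URL with urllib.parse and checks two paths against robots.txt rules with urllib.robotparser. The rules are a made-up inline list and example.com is a placeholder host (both are assumptions for this example), so nothing is downloaded.

```python
from urllib import parse, robotparser

# Split a URL into its components.
parts = parse.urlparse("http://httpbin.org/get?word=hello")
print(parts.scheme, parts.netloc, parts.path, parts.query)

# Decode the query string into a dict that maps each key to a list of values.
query = parse.parse_qs(parts.query)

# Resolve a relative reference against a base URL.
joined = parse.urljoin("http://httpbin.org/get", "/post")

# Hypothetical robots.txt rules, supplied inline instead of being fetched.
rules = ["User-agent: *", "Disallow: /private/"]
robots = robotparser.RobotFileParser()
robots.parse(rules)
allowed = robots.can_fetch("*", "http://example.com/index.html")
blocked = robots.can_fetch("*", "http://example.com/private/data")
print(query, joined, allowed, blocked)
```

In a real crawler you would normally call set_url() and read() on the parser to load the site's actual robots.txt rather than passing rules by hand.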

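2. Changes from Python 2

In Python 3 the old urllib2 module was merged into the urllib package, so urlopen now lives in urllib.request.

Python 2

import urllib2

response = urllib2.urlopen('http://www.baidu.com')

Python 3

import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')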

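3. Basic usage

urllib

urlopen

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Note: cafile, capath and cadefault have been deprecated since Python 3.6 and were removed in Python 3.13; on current versions, pass an ssl.SSLContext through context instead.

Method 1: a plain GET request

import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
# read() returns the response body as bytes; decode('utf-8') turns it into text
print(response.read().decode('utf-8'))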
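Hard-coding 'utf-8' works only when the server really sends UTF-8. A minimal sketch of a more defensive approach is to read the charset from the Content-Type header and fall back to UTF-8 when none is given. The helper name decode_body is hypothetical, and the demo uses a data: URL, which urllib handles locally, so it does not touch the network.

```python
import urllib.request


def decode_body(response, fallback="utf-8"):
    """Decode the body using the charset declared in Content-Type, if any."""
    charset = response.headers.get_content_charset() or fallback
    return response.read().decode(charset)


# A data: URL is resolved by urllib itself, so no network access is needed.
with urllib.request.urlopen("data:text/plain;charset=utf-8,hello%20urllib") as response:
    text = decode_body(response)
print(text)
```

The same helper can be passed the object returned by urlopen for an http:// URL, since that response also exposes a headers attribute.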

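Method 2: a POST request with form data

import urllib.request
import urllib.parse

# urlencode() builds the form string 'word=hello'; urlopen needs it as bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
# With data the request is sent as POST; without it the request is sent as GET
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read())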
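The GET versus POST rule above can be checked without sending anything, because a Request object reports the method it would use. The sketch below is only an illustration; build_form_request is a hypothetical helper name.

```python
from urllib import parse, request


def build_form_request(url, fields):
    """Return a Request whose body is the URL-encoded form of `fields`."""
    body = parse.urlencode(fields).encode("utf-8")
    return request.Request(url, data=body)


# No data: the request would be sent as GET.
get_req = request.Request("http://httpbin.org/get")
# With data: the request would be sent as POST.
post_req = build_form_request("http://httpbin.org/post", {"word": "hello"})
print(get_req.get_method(), post_req.get_method(), post_req.data)
```

Passing post_req to urllib.request.urlopen would then send the same request as Method 2.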

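Method 3: setting a timeout

import urllib.request

# timeout is in seconds; an exception is raised if the server does not respond in time
response = urllib.request.urlopen('http://httpbin.org/get', timeout=1)
print(response.read())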

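Method 4: catching the timeout

import socket
import urllib.request
import urllib.error


try:
    # 0.1 seconds is deliberately too short, so the request is expected to time out
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
except urllib.error.URLError as e:
    # URLError wraps the underlying cause in e.reason
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')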
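One gap in the pattern above is that a timeout while the body is being read is raised as socket.timeout directly rather than wrapped in URLError. A minimal sketch that covers both cases follows; is_timeout and fetch_or_none are hypothetical names, and returning None on timeout is just one possible policy. (Since Python 3.10, socket.timeout is an alias of the built-in TimeoutError.)

```python
import socket
import urllib.error
import urllib.request


def is_timeout(exc):
    """Return True if `exc` represents a timeout, wrapped in URLError or not."""
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, socket.timeout)
    return isinstance(exc, socket.timeout)


def fetch_or_none(url, timeout=1.0):
    """Return the body as bytes, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        if is_timeout(exc):
            return None
        raise  # any other failure is passed on to the caller
```

For example, fetch_or_none('http://httpbin.org/get', timeout=0.1) would be expected to return None when the server is slower than 0.1 seconds.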

Response

Response type

import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
# For HTTP and HTTPS URLs this prints <class 'http.client.HTTPResponse'>
print(type(response))

Status code and response headers

import urllib.request

response = urllib.request.urlopen('http://www.python.org')
print(response.status)               # the status code
print(response.getheaders())         # all response headers, as a list of (name, value) tuples
print(response.getheader('Server'))  # one specific header, here Server
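To see what these attributes return without depending on a live site, the sketch below feeds a canned HTTP reply into http.client.HTTPResponse, the same class that urlopen returns for HTTP URLs. FakeSocket and the raw reply are made up for this illustration; constructing a response by hand like this is a testing trick, not something urllib requires.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket that replays canned bytes."""

    def __init__(self, payload):
        self._payload = payload

    def makefile(self, *args, **kwargs):
        return io.BytesIO(self._payload)


RAW_REPLY = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: demo-server\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

response = http.client.HTTPResponse(FakeSocket(RAW_REPLY))
response.begin()                         # parse the status line and headers
status = response.status                 # integer status code
server = response.getheader("Server")    # single header, lookup is case-insensitive
headers = dict(response.getheaders())    # list of (name, value) tuples as a dict
body = response.read()
print(status, server, headers, body)
```

getheader returns None (or the default you pass) when the header is absent, which is worth checking before using the value.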

Request

Passing a Request object to urlopen

import urllib.request

request = urllib.request.Request('https://python.org')  # wrap the URL in a Request object
response = urllib.request.urlopen(request)  # urlopen accepts a Request object as well as a URL string
print(response.read().decode('utf-8'))

Setting headers, data and method through the Request constructor

from urllib import request, parse

url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/4.0(compatible;MSIE 5.5;Windows NT)',
    'Host': 'httpbin.org'
}
# named form rather than dict, to avoid shadowing the built-in dict type
form = {
    'name': 'Germey'
}
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))

The Request.add_header() method

Headers can also be added one at a time after the Request object has been created:

from urllib import request, parse

url = 'http://httpbin.org/post'
# named form rather than dict, to avoid shadowing the built-in dict type
form = {
    'name': 'Germey'
}
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, method='POST')
req.add_header('User-Agent', 'Mozilla/4.0(compatible;MSIE 5.5;Windows NT)')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
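One detail that is easy to trip over when inspecting a Request: header names are stored with only the first letter capitalized, so 'User-Agent' is kept as 'User-agent'. The sketch below shows this without sending the request; the Accept header is added purely for illustration.

```python
from urllib import request

req = request.Request(
    "http://httpbin.org/post",
    data=b"name=Germey",
    headers={"User-Agent": "Mozilla/4.0(compatible;MSIE 5.5;Windows NT)"},
)
req.add_header("Accept", "application/json")  # illustrative extra header

# Names are normalized with str.capitalize() when they are stored.
print(req.header_items())
stored = req.get_header("User-agent")  # matches the stored spelling
missed = req.get_header("User-Agent")  # does not match, so this is None
print(stored, missed)
```

This affects only lookups on the Request object. HTTP header names are case-insensitive, so the server treats both spellings the same.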

Handler

Proxy
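The original gives only the heading here. As a minimal sketch of how proxies are commonly configured in urllib, a ProxyHandler can be passed to build_opener, and the resulting opener is then used in place of urlopen. The proxy address 127.0.0.1:8080 and the helper fetch_via_proxy are assumptions for this example, not values from the original.

```python
from urllib import request

# Hypothetical local proxy address; replace it with a proxy you actually run.
proxy_handler = request.ProxyHandler({
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
})
opener = request.build_opener(proxy_handler)


def fetch_via_proxy(url, timeout=5):
    """Fetch `url` through the opener configured above and return the body."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Confirm the opener carries the proxy configuration (no request is sent here).
has_proxy = any(isinstance(h, request.ProxyHandler) for h in opener.handlers)
print(has_proxy, proxy_handler.proxies)
```

Calling fetch_via_proxy('http://httpbin.org/get') would work only if a proxy is really listening at the configured address.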


Source: https://www.cnblogs.com/wyh-study/p/11055140.html
