
Crawler - Day 02 - Fetching and Parsing

Date: 2018-05-09 14:02:45
###Page Fetching###
1. urllib3
    A powerful and convenient HTTP client that fills gaps in the Python standard library
    Install: pip install urllib3
    Usage:
import urllib3
http = urllib3.PoolManager()
response = http.request('GET', 'http://news.qq.com')
print(response.headers)
result = response.data.decode('gbk')  # news.qq.com historically served GBK
print(result)
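The `decode('gbk')` call above hard-codes the page's charset. A small helper (the name `decode_body` is my own) can try UTF-8 first and fall back to GBK instead of raising `UnicodeDecodeError`:

```python
def decode_body(data: bytes) -> str:
    """Try UTF-8 first, then fall back to GBK (common on older
    Chinese sites). Note: some GBK byte sequences happen to be
    valid UTF-8, so charset sniffing like this is a heuristic."""
    for encoding in ('utf-8', 'gbk'):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: substitute replacement characters rather than crash
    return data.decode('utf-8', errors='replace')

print(decode_body('人民币'.encode('gbk')))  # 人民币
```

In real code, prefer the charset declared in the `Content-Type` response header when one is present.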
 
Sending a request over HTTPS
Install the dependency: pip install certifi
import certifi
import urllib3
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())  # verify certificates against the CA bundle
resp = http.request('GET', 'https://news.baidu.com/')
print(resp.data.decode('utf-8'))
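certifi only supplies a CA bundle; the verification behavior itself can be illustrated with the standard library's ssl module (a sketch, no urllib3 or network needed):

```python
import ssl

# A default context already requires and verifies certificates and
# checks hostnames, mirroring cert_reqs='CERT_REQUIRED' above.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True
```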
 
####With query parameters
import urllib3
from urllib.parse import urlencode
http = urllib3.PoolManager()
args = {'wd': '人民币'}
# url = 'http://www.baidu.com/s?%s' % (args)   # wrong: interpolates the raw dict
url = 'http://www.baidu.com/s?%s' % (urlencode(args))
print(url)
# resp = http.request('GET', url)
# print(resp.data.decode('utf-8'))
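urlencode percent-encodes non-ASCII values and joins multiple parameters with `&`, which is why the commented-out raw-dict interpolation cannot work. A quick stdlib check (the extra `pn` parameter is only for illustration):

```python
from urllib.parse import urlencode

# Dicts preserve insertion order in Python 3.7+, so the query
# string comes out in the order the keys were written.
args = {'wd': '人民币', 'pn': 10}
query = urlencode(args)
print(query)  # wd=%E4%BA%BA%E6%B0%91%E5%B8%81&pn=10
```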
 
headers = {
    'Accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Host': 'www.baidu.com',
    'Referer': 'https://www.baidu.com/s?wd=人民币',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'
}
resp = http.request('GET', 'http://www.baidu.com/s', fields=args, headers=headers)
print(resp.data.decode('utf-8'))
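The same custom headers can also be attached with the standard library's urllib.request, as a sketch of what urllib3 does for you (no request is actually sent here; the URL and header values are placeholders):

```python
from urllib.request import Request

req = Request('http://www.baidu.com/s?wd=test',
              headers={'User-Agent': 'Mozilla/5.0',
                       'Referer': 'https://www.baidu.com/'})
# Request normalizes header names to Capitalized-form internally,
# and get_header() applies the same normalization on lookup.
print(req.get_header('User-agent'))  # Mozilla/5.0
print(req.host)                      # www.baidu.com
```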

Original: https://www.cnblogs.com/Albert-w/p/9013194.html
