Getting Started with Web Scraping: Response and XPath

Posted: 2020-03-12 22:13:44

Response

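r.status_code  # HTTP status code of the response; 200 means the request succeeded
r.text  # response body as a str, decoded using r.encoding
r.content  # response body as raw bytes
r.encoding  # encoding taken from the HTTP headers (the charset in Content-Type); may be None
r.apparent_encoding  # encoding guessed from the bytes of the response body itself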
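The relationship between content, encoding and text can be tried out without any network access. This sketch only mimics what requests does internally (it decodes the bytes in r.content with r.encoding to build r.text); the sample string and the helper as_text are made up for illustration and are not part of requests.

```python
# Stand-in for r.content: the raw response body as bytes (hypothetical sample).
raw_body = "豆瓣短评".encode("utf-8")


def as_text(content: bytes, encoding: str) -> str:
    """Hypothetical helper: decode bytes the way requests builds r.text."""
    return str(content, encoding, errors="replace")


# When a server sends "Content-Type: text/html" with no charset, requests
# falls back to ISO-8859-1 for r.encoding, which garbles UTF-8 Chinese text.
garbled = as_text(raw_body, "ISO-8859-1")

# r.apparent_encoding guesses the charset from the bytes themselves;
# assume here that it detected UTF-8, which decodes the body correctly.
fixed = as_text(raw_body, "utf-8")

print(repr(garbled))
print(fixed)
```

With requests, a common fix for garbled Chinese pages is to assign r.encoding = r.apparent_encoding before reading r.text.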

XPath

Tutorial (in Chinese): https://zhuanlan.zhihu.com/p/25572729
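A few core pieces of XPath syntax, tried with lxml (the library the template below uses) against a small HTML snippet. The markup, ids, class names and values are all hypothetical.

```python
from lxml import etree

# Hypothetical HTML used only to illustrate XPath syntax.
html = """
<html><body>
  <div id="books">
    <ul>
      <li><a href="/b/1" class="title">Book One</a><span>9.1</span></li>
      <li><a href="/b/2" class="title">Book Two</a><span>8.4</span></li>
    </ul>
  </div>
</body></html>
"""

tree = etree.HTML(html)

# //tag        : match elements anywhere in the document
# [@attr="v"]  : keep only elements whose attribute has that value
# tag[n]       : the n-th matching child (XPath counts from 1, not 0)
# text()       : the element's own text nodes
# @attr        : an attribute's value
titles = tree.xpath('//div[@id="books"]//a[@class="title"]/text()')
links = tree.xpath('//div[@id="books"]//a/@href')
second_score = tree.xpath('//div[@id="books"]/ul/li[2]/span/text()')

print(titles)        # ['Book One', 'Book Two']
print(links)         # ['/b/1', '/b/2']
print(second_score)  # ['8.4']
```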

Generating the path automatically

Press F12 to open the browser DevTools, select the element you want to scrape, then right-click it and choose Copy > Copy XPath.
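A copied XPath is tied to one element by position. This sketch runs on hypothetical markup shaped like the comment list in the template (it is not Douban's actual HTML) and shows the copied form plus two ways to generalize it: drop the li index to match every item, or match on a class attribute instead of a position such as div[2].

```python
from lxml import etree

# Hypothetical markup loosely shaped like a comment list; not Douban's real HTML.
html = """
<div id="comments">
  <ul>
    <li><div class="avatar"></div><div class="info"><p><span class="short">First comment</span></p></div></li>
    <li><div class="avatar"></div><div class="info"><p><span class="short">Second comment</span></p></div></li>
  </ul>
</div>
"""
tree = etree.HTML(html)

# A "Copy XPath" result points at exactly one element by position.
copied = '//*[@id="comments"]/ul[1]/li[1]/div[2]/p/span/text()'
print(tree.xpath(copied))     # ['First comment']

# Dropping the li index selects the same span in every list item.
all_items = '//*[@id="comments"]/ul[1]/li/div[2]/p/span/text()'
print(tree.xpath(all_items))  # ['First comment', 'Second comment']

# Matching on a class instead of div[2] keeps working if wrapper divs change.
by_class = '//*[@id="comments"]//span[@class="short"]/text()'
print(tree.xpath(by_class))   # ['First comment', 'Second comment']
```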

A simple scraper template

import requests
from lxml import etree


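def getHtmlText(url,header):
    files=[]  # collected comment strings
    r=requests.get(url=url,headers=header,timeout=10)
    r.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    s=etree.HTML(r.text)
    for i in range(10):
        # XPath copied from DevTools (depends on the page layout at the time of writing).
        # li[...] is 1-based, so i+1 selects the (i+1)-th comment.
        # extend() keeps every comment; plain assignment would keep only the last one.
        files.extend(s.xpath('//*[@id="comments"]/ul[1]/li['+str(i+1)+']/div[2]/p/span/text()'))
    return files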

def saveText(files):
    with open("discuss.text","w",encoding="utf-8") as f:
        for i in files:
            f.write(i.strip()+"\n")  # one comment per line so they don't run together

if __name__ == '__main__':
    url="https://book.douban.com/subject/34876107/comments/"
    header={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"}
    files=getHtmlText(url,header)  # request the page once and reuse the result
    print(files)
    saveText(files)
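Keeping the download separate from the parsing makes the XPath easy to test against a saved page. In this sketch the names parse_comments, save_lines and COMMENT_XPATH are hypothetical and the sample HTML is invented; the XPath is the template's one with the li index removed, so a single query returns every comment instead of looping over indices.

```python
import os
import tempfile
from typing import List

from lxml import etree

# Hypothetical constant: the template's XPath without the li index,
# so it matches every comment. The page layout it assumes may change.
COMMENT_XPATH = '//*[@id="comments"]/ul[1]/li/div[2]/p/span/text()'


def parse_comments(html_text: str) -> List[str]:
    """Return the stripped, non-empty comment strings found in html_text."""
    tree = etree.HTML(html_text)
    return [t.strip() for t in tree.xpath(COMMENT_XPATH) if t.strip()]


def save_lines(lines: List[str], path: str) -> None:
    """Write one comment per line so the comments don't run together."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Invented sample page; in the real script this would be r.text.
sample = """
<div id="comments"><ul>
  <li><div></div><div><p><span>  好书  </span></p></div></li>
  <li><div></div><div><p><span>值得一读</span></p></div></li>
</ul></div>
"""

comments = parse_comments(sample)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "discuss.txt")
    save_lines(comments, path)
    with open(path, encoding="utf-8") as f:
        saved = f.read().splitlines()

print(saved)  # ['好书', '值得一读']
```

In the real script, calling parse_comments(r.text) would replace the index loop inside getHtmlText.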

Original post: https://www.cnblogs.com/marier/p/12483046.html
