Simple web page text scraping with Python requests

Posted: 2018-06-20 20:08:59

Page to scrape:

http://www.cnblogs.com/xrq730/archive/2018/06/11/9159586.html

The goal is to extract the text content of a single blog post.

Use requests to fetch the full HTML of the page;
use Beautiful Soup to parse that HTML.
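
The parsing step can be tried on its own, without any network access. The sketch below is a minimal illustration under stated assumptions: the `sample_html` string is a made-up stand-in for a real page, and only the `blogpost-body` class name is taken from the script in this post.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests.get(...).text would return
sample_html = """
<html><body>
  <div class="sidebar">navigation</div>
  <div class="blogpost-body"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns a list of every <div> whose class attribute includes "blogpost-body"
matches = soup.find_all("div", class_="blogpost-body")

# get_text() drops the tags; the separator keeps the paragraphs on separate lines
body_text = matches[0].get_text(separator="\n", strip=True)
print(body_text)
```

Because `find_all` returns a list, the first match is taken with `[0]`; the sidebar div is ignored since its class does not match.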


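import requests
from bs4 import BeautifulSoup


if __name__ == '__main__':
    target = 'http://www.cnblogs.com/xrq730/archive/2018/06/11/9159586.html'
    # Fetch the full HTML of the page
    req = requests.get(url=target)
    html = req.text
    # Name the parser explicitly; bs4 warns when it has to guess one
    bf = BeautifulSoup(html, 'html.parser')
    # The post body sits in <div class="blogpost-body">
    texts = bf.find_all('div', class_='blogpost-body')
    # print(html)
    # .text returns the content with tags already stripped, so no tag replacement is needed
    print(texts[0].text)
    # Optionally turn runs of eight non-breaking spaces into paragraph breaks:
    # print(texts[0].text.replace('\xa0' * 8, '\n\n'))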
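
The script above assumes that the request succeeds and that the page contains the expected div. A slightly more defensive variant might look like the sketch below. The function names `fetch_html` and `extract_post_text`, the 10-second timeout, and the choice to raise `ValueError` when the div is missing are all assumptions made for illustration, not part of the original script.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    resp = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    resp.raise_for_status()
    return resp.text


def extract_post_text(html, css_class="blogpost-body"):
    """Return the text of the first <div> with the given class (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_=css_class)
    if body is None:
        raise ValueError(f"no <div class={css_class!r}> found")
    # Non-breaking spaces (\xa0) often stand in for indentation; make them plain spaces
    return body.get_text(separator="\n", strip=True).replace("\xa0", " ")
```

With these helpers, `print(extract_post_text(fetch_html(target)))` would play the role of the script body. Keeping the download and the parsing in separate functions also means the parsing can be checked against a saved or hand-written HTML string without touching the network.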

Original post: https://www.cnblogs.com/xy-ju24/p/9204416.html
