Scraping Baidu's Today's Hot Topics Ranking

Posted: 2020-03-18 17:36:08

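import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'http://top.baidu.com/buzz?b=11&c=513&fr=topbuzz_b342_c513'
# Send a browser-like User-Agent so the request looks like a normal browser visit
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362'}
r = requests.get(url, headers=headers)  # send the GET request with the headers above
r.encoding = r.apparent_encoding  # decode the body with the detected encoding
t = r.text
soup = BeautifulSoup(t, 'lxml')  # parse the HTML
title = []  # topic titles
index = []  # search index values
for y in soup.find_all(class_="keyword"):  # every element with class "keyword"
    title.append(y.get_text().strip())
for x in soup.find_all('td', class_="last"):  # every <td> with class "last"
    index.append(x.get_text().strip())
data = [title, index]
print(data)
# Build a DataFrame with one row per list, then transpose for a two-column table
s = pd.DataFrame(data, index=["Title", "Search index"])
print(s.T)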

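Open the Baidu Today's Hot Topics ranking page that we want to scrape and view its source to analyze the page structure. The data we need sits in elements with the class attributes "keyword" (the topic title) and "last" (the search index).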
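
As a rough illustration of what this inspection is looking for, the sketch below runs the same two find_all lookups against a small HTML fragment. The fragment is hypothetical and only assumes that titles and index values carry the classes "keyword" and "last"; the real page markup may differ. It uses the standard-library "html.parser" backend instead of lxml so it runs without extra parser dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only; not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="keyword"><a href="#">Example topic A</a></td>
    <td class="last"><span>12345</span></td>
  </tr>
  <tr>
    <td class="keyword"><a href="#">Example topic B</a></td>
    <td class="last"><span>6789</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Any tag whose class list contains "keyword"
titles = [tag.get_text().strip() for tag in soup.find_all(class_="keyword")]

# Only <td> tags whose class list contains "last"
indices = [tag.get_text().strip() for tag in soup.find_all("td", class_="last")]

print(titles)
print(indices)
```

If either list comes back empty on the live page, the class names have probably changed and the page source needs to be inspected again.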


Use requests to fetch the page and BeautifulSoup to parse it, then print the results.
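
The same fetch-and-parse flow could be split into small functions, as in the sketch below. The function names (fetch_html, parse_rows, to_frame), the column names, the 10-second timeout, and the shortened User-Agent string are all assumptions made for illustration and do not come from the original post. Pairing titles with index values through zip is also a design choice here: it drops unmatched trailing items if the two lists differ in length.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Shortened, illustrative User-Agent; any browser-like value could be used.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url, timeout=10):
    """Download a page and return its text, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    response.encoding = response.apparent_encoding
    return response.text


def parse_rows(html):
    """Return (title, search_index) pairs found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    titles = [t.get_text().strip() for t in soup.find_all(class_="keyword")]
    indices = [i.get_text().strip() for i in soup.find_all("td", class_="last")]
    # zip stops at the shorter list, so unmatched items are dropped
    return list(zip(titles, indices))


def to_frame(rows):
    """Build a two-column DataFrame from the parsed pairs."""
    return pd.DataFrame(rows, columns=["title", "search_index"])
```

With these helpers, to_frame(parse_rows(fetch_html(url))) would produce the same kind of table as the script above, with one row per topic.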



Original post: https://www.cnblogs.com/xmg6/p/12518179.html
