Fetching web page content with requests and parsing page elements with BeautifulSoup

Date: 2017-08-21 15:27:16

import requests

# URL of the page to download
newsUrl = 'http://news.sina.com.cn/china/'

res = requests.get(newsUrl)

# Set the encoding explicitly so that res.text is decoded as UTF-8
res.encoding = 'utf-8'

print(res.text)
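
The fetch above assumes the request always succeeds. A slightly more defensive version might look like the sketch below. Note that fetch_html is a hypothetical helper name and the 10-second timeout is an arbitrary choice; neither comes from the original post.

```python
import requests


def fetch_html(url, timeout=10, encoding='utf-8'):
    """Download a page and return its decoded text.

    fetch_html is a hypothetical helper for illustration,
    not a function provided by requests.
    """
    # Without a timeout, requests.get can wait indefinitely
    res = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    res.raise_for_status()
    # Decode the body with the given encoding, as the code above does
    res.encoding = encoding
    return res.text
```

Calling fetch_html('http://news.sina.com.cn/china/') would return the same text as res.text above, and wrapping the call in try/except requests.RequestException is one way to handle network failures.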

# Next, parse the page elements through the DOM tree

from bs4 import BeautifulSoup

html_sample = '''
<html>
<body>
<h1 id="title">this is h1</h1>
<a class="link" href="fdfdfdfd">this is a link</a>
<a class="link" href="fdfdfdfd">this is another link</a>
</body>
</html>
'''

# 'html.parser' names the parser to use; BeautifulSoup emits a warning if it is omitted
soup = BeautifulSoup(html_sample, 'html.parser')

print(soup.text)

# Find all HTML elements with a specific tag

# 1: Use select to find the elements with the h1 tag
header = soup.select('h1')

print(header)
print(header[0].text)  # the text inside the first matched tag (index 0)
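
Since select always returns a list, indexing with [0] fails when nothing matches. As a small sketch (the h2 lookup is an added illustration, not part of the original sample), select_one returns the first match directly, or None when there is no match:

```python
from bs4 import BeautifulSoup

html = '<html><body><h1 id="title">this is h1</h1></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# select() returns a list of matches, even when there is only one
headers = soup.select('h1')
first_text = headers[0].text

# select_one() returns the first matching tag, or None if nothing matches
h1 = soup.select_one('h1')
missing = soup.select_one('h2')  # the sample contains no <h2>

print(first_text)  # this is h1
print(h1.text)     # this is h1
print(missing)     # None
```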

# 2: Use select to find the elements with the a tag
alink = soup.select('a')

print(alink)

for link in alink:
    # print(link)
    print(link.text)

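# Get elements with specific CSS attributes

# 1: Use select to find all elements whose id is title (prefix the id with #)
aTitle = soup.select('#title')

print(aTitle)

# 2: Use select to find all elements whose class is link (prefix the class with .)
for mylink in soup.select('.link'):
    print(mylink)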
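
Tag names, ids, and classes can also be combined in a single selector. The following is a minimal sketch; the extra p element with class link is an assumption added only to show the difference between '.link' and 'a.link'.

```python
from bs4 import BeautifulSoup

# The <p class="link"> element is added here for illustration only
html = '''
<html><body>
<h1 id="title">this is h1</h1>
<a class="link" href="fdfdfdfd">this is a link</a>
<p class="link">not an anchor</p>
</body></html>
'''
soup = BeautifulSoup(html, 'html.parser')

# '.link' matches any tag with class "link", including the <p>
all_links = soup.select('.link')
# 'a.link' matches only <a> tags with class "link"
anchor_links = soup.select('a.link')
# 'h1#title' combines a tag name with an id
titles = soup.select('h1#title')

print(len(all_links), len(anchor_links), len(titles))  # 2 1 1
```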

# Get the links inside all a tags

# Use select to find the href of every a tag
ahref = soup.select('a')

for ah in ahref:
    print(ah['href'])
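
On a real page, some a tags have no href and many links are relative. One possible way to handle both is sketched below. Here base_url and the two extra anchors are assumptions made for the example; get returns None for a missing attribute where ah['href'] would raise KeyError, and urljoin from the standard library resolves relative links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# base_url is an assumed page address, used only to resolve relative links
base_url = 'http://news.sina.com.cn/china/'
html = '''
<a class="link" href="fdfdfdfd">this is a link</a>
<a class="link">anchor without href</a>
<a class="link" href="/world/">link with an absolute path</a>
'''
soup = BeautifulSoup(html, 'html.parser')

urls = []
for a in soup.select('a'):
    href = a.get('href')  # None when the attribute is missing
    if href is None:
        continue  # skip anchors that carry no link
    urls.append(urljoin(base_url, href))

print(urls)
# ['http://news.sina.com.cn/china/fdfdfdfd', 'http://news.sina.com.cn/world/']
```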

Original: http://www.cnblogs.com/tian-sun/p/7404394.html
