Using the BeautifulSoup Library for Web Scraping

Posted: 2019-04-27 12:57:26

Extracting information from an HTML page with BeautifulSoup


#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

# Fetch the demo page
url = 'http://python123.io/ws/demo.html'
r = requests.get(url)
if r.status_code == 200:
    print('Request succeeded')

# Parse the HTML and print it with indentation
demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
print(soup.prettify())

Basic attributes of the BeautifulSoup class


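#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

url = 'http://python123.io/ws/demo.html'
r = requests.get(url)
if r.status_code == 200:
    print('Request succeeded')

demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
# soup.<tag> returns the first tag with that name
tag_title = soup.title
print(tag_title)
# .attrs is a dict of the tag's attributes
tag_a_attrs = soup.a.attrs
print(tag_a_attrs)
# .string is the text inside the tag
print(soup.p.string)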
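The attributes used above can be tried without a network request. The following is a minimal sketch that parses a small inline page; the `HTML` string and its `example.com` links are made up for illustration and are not the contents of the demo URL.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded demo page.
HTML = (
    '<html><head><title>Demo page</title></head><body>'
    '<p class="title"><b>Bold heading</b></p>'
    '<p class="course">See <a href="http://example.com/one" id="link1">One</a>'
    ' and <a href="http://example.com/two" id="link2">Two</a>.</p>'
    '</body></html>'
)

soup = BeautifulSoup(HTML, 'html.parser')

first_a = soup.a                      # attribute access returns the first matching tag
tag_name = first_a.name               # tag name as a string
parent_name = first_a.parent.name     # name of the enclosing tag
attrs = first_a.attrs                 # attributes as a dict
link_text = first_a.string            # the tag's single text child

first_p, second_p = soup.find_all('p')
heading_text = first_p.string         # one nested child, so .string reaches the text
mixed_text = second_p.string          # several children, so .string is None

print(tag_name, parent_name, attrs, link_text, heading_text, mixed_text)
```

Note that `.string` only returns text when a tag has exactly one child (possibly nested); for a tag with mixed content it returns `None`, which is worth checking before using the result.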

Downward traversal of the tag tree

#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

url = 'http://python123.io/ws/demo.html'
r = requests.get(url)
if r.status_code == 200:
    print('Request succeeded')

demo = r.text
soup = BeautifulSoup(demo, 'html.parser')

print(soup.prettify())
print('divider'.center(80, '-'))

# Iterate over direct children
for child in soup.body.children:
    print(child)

# Iterate over all descendants
for descendant in soup.body.descendants:
    print(descendant)
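The difference between `.children` and `.descendants` is easier to see on a tiny document. The sketch below uses a made-up `HTML` string (an assumption, not the demo page) and also shows that both iterators yield text nodes, including whitespace between tags, alongside `Tag` objects.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical inline HTML; the newlines between tags become text nodes.
HTML = '<body>\n<p><b>bold</b></p>\n<p>plain <a href="#">link</a></p>\n</body>'
soup = BeautifulSoup(HTML, 'html.parser')

# .children yields only the direct children of <body>
children = list(soup.body.children)
child_tag_names = [c.name for c in children if isinstance(c, Tag)]

# .descendants walks the whole subtree in document order
descendants = list(soup.body.descendants)
descendant_tag_names = [d.name for d in descendants if isinstance(d, Tag)]

print(len(children), child_tag_names)
print(len(descendants), descendant_tag_names)
```

Filtering with `isinstance(node, Tag)` is one way to skip the whitespace strings when only tags are of interest.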

Upward traversal of the tag tree


Get the parent of the title tag:

print(soup.title.parent)

# Next sibling node of the first a tag
print(soup.a.next_sibling)

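Iterate over all following and preceding sibling nodes of the a tag:

# Iterate over the siblings that follow the a tag
for sibling in soup.a.next_siblings:
    print(sibling)
# Iterate over the siblings that precede the a tag
for sibling in soup.a.previous_siblings:
    print(sibling)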
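Sibling traversal returns nodes, not only tags, so the immediate neighbor of a tag is often a piece of text. The following sketch illustrates this on a made-up paragraph (the `HTML` string is an assumption for illustration) and uses `find_next_sibling` to jump straight to the next tag.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical inline HTML with text between the two links.
HTML = ('<p>Courses: <a id="link1" href="#1">Basic</a> and '
        '<a id="link2" href="#2">Advanced</a>.</p>')
soup = BeautifulSoup(HTML, 'html.parser')

first = soup.a
neighbor = first.next_sibling                 # the text node right after the first link
next_nodes = list(first.next_siblings)        # every node after it under the same parent
prev_nodes = list(first.previous_siblings)    # every node before it, nearest first
next_tag = first.find_next_sibling('a')       # skips text and returns the next <a> tag

print(repr(neighbor), len(next_nodes), prev_nodes, next_tag['id'])
```

If only tags matter, `find_next_sibling` and `find_previous_sibling` avoid having to filter out the text nodes by hand.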

The soup object itself has no parent (its .parent is None), so check for None when walking upward.
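One way to apply that check is to follow `.parent` step by step and stop once it becomes `None`. This is a minimal sketch; the helper name `ancestor_names` and the inline HTML are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the demo page.
HTML = ('<html><head><title>Demo</title></head>'
        '<body><p><a href="#">x</a></p></body></html>')
soup = BeautifulSoup(HTML, 'html.parser')


def ancestor_names(tag):
    """Collect the names of all ancestors of tag, innermost first."""
    names = []
    node = tag.parent
    while node is not None:       # the BeautifulSoup object's parent is None
        names.append(node.name)
        node = node.parent
    return names


names = ancestor_names(soup.a)
print(names)
print(soup.parent)
```

The last ancestor collected is the BeautifulSoup object itself, whose name is `[document]`. Iterating over `soup.a.parents` visits the same ancestors without the manual loop.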


Source: https://www.cnblogs.com/liberate20/p/10778322.html
