BeautifulSoup

Date: 2020-03-15 19:43:34
import requests
from bs4 import BeautifulSoup

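def getHTMLText(url):
    try:
        kv = {'user-agent': 'Mozilla/5.0'}
        r = requests.get(url, timeout=30, headers=kv)
        r.raise_for_status()    # raises requests.HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding    # use the encoding guessed from the body
        print(r.request.headers)    # headers that were actually sent
        print('---------------')
        return r.text[:1000]    # first 1000 characters only, so the markup may be cut off
    except requests.RequestException:
        # Covers connection errors, timeouts, and the HTTPError raised above
        return 'An exception occurred'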


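if __name__ == '__main__':
    url = 'http://www.baidu.com'
    demo = getHTMLText(url)

    soup = BeautifulSoup(demo, 'html.parser')
    print(soup.prettify())

    print(soup.title)           # first <title> tag, or None if there is none
    a = soup.a                  # first <a> tag, or None if the fetched text has none
    if a is not None:
        print(a.name)           # tag name, i.e. 'a'
        print(a.parent.name)    # name of the enclosing tag
        print(a.attrs)          # attributes as a dict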
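
The output of the script above depends on whatever the live page returns. As a rough offline sketch, the same accessors (`soup.title`, `soup.a.name`, `soup.a.parent.name`, `soup.a.attrs`) can be tried on a fixed string. The markup and the example.com URLs below are invented for illustration and are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for this illustration.
html = """
<html><head><title>Demo page</title></head>
<body>
<p class="intro">See <a href="http://example.com/one" id="link1">the first link</a>
and <a href="http://example.com/two" id="link2">the second link</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

title_text = soup.title.string       # text inside the <title> tag
first_a = soup.a                     # first <a> tag in document order
first_a_name = first_a.name          # tag name
parent_name = first_a.parent.name    # name of the enclosing tag
first_a_attrs = first_a.attrs        # attributes as a dict
all_hrefs = [a.get('href') for a in soup.find_all('a')]  # every link target

print(title_text)
print(first_a_name, parent_name)
print(first_a_attrs)
print(all_hrefs)
```

`soup.a` only ever returns the first matching tag. `find_all('a')` is the call to reach for when every link on the page is needed.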

Original: https://www.cnblogs.com/holaworld/p/12499232.html
