Python Beginner's Study Notes: A Multi-Page Scraping Example for 电影天堂 (Movie Heaven)

Date: 2020-02-10 20:33:09

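from lxml import etree
import requests

# Site root, prepended to the relative links found on the list pages.
baseurl0 = "https://www.ygdy8.net"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36"
}


def get_page():
    # Walk list pages 1 through 3 and process each one.
    for x in range(1, 4):
        pageurl = "https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html"
        pageurl = pageurl.format(x)
        get_urls(pageurl)


def get_urls(baseurl):
    # Fetch one list page and collect the links inside its "tbspan" tables.
    resp = requests.get(baseurl, headers=headers)
    result = resp.text
    html = etree.HTML(result)
    # Plain ASCII quotes are required inside the XPath expression.
    uls = html.xpath("//table[@class='tbspan']//a[@href]/@href")
    # The hrefs are site-relative, so prefix them with the site root.
    uls = map(lambda url: baseurl0 + url, uls)
    for ul in uls:
        print(ul)
        get_detalis_urls(ul)


def get_detalis_urls(url):
    # Fetch one detail page and print its title and first image URL.
    resp = requests.get(url, headers=headers)
    # The codec name must be a string; the detail pages are decoded as GBK.
    result = resp.content.decode("gbk")
    html = etree.HTML(result)
    uls = html.xpath("//div/h1/font[@color]/text()")
    print(uls)
    # First image on the page; raises IndexError if the page has no <img>.
    uls2 = html.xpath("//img[@src]/@src")[0]
    print(uls2)
    print("---------------------------------------")


get_page()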
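
The script above builds its URLs in two places: get_page() fills a page number into a template, and get_urls() prepends the site root to each relative href by string concatenation. A minimal sketch of the same two steps using only the standard library follows. The helper names list_page_urls and absolutize are hypothetical (they are not in the original script), and the sample href in the demo is made up for illustration. urljoin is swapped in for plain concatenation here because it also leaves already-absolute links unchanged.

```python
from urllib.parse import urljoin

# Same site root and list-page pattern as the script above.
BASE_URL = "https://www.ygdy8.net"
LIST_TEMPLATE = BASE_URL + "/html/gndy/dyzz/list_23_{}.html"


def list_page_urls(first=1, last=3):
    """Return the list-page URLs for pages first..last (inclusive)."""
    return [LIST_TEMPLATE.format(n) for n in range(first, last + 1)]


def absolutize(hrefs, base=BASE_URL):
    """Turn relative hrefs into absolute URLs; absolute ones pass through."""
    return [urljoin(base, href) for href in hrefs]


if __name__ == "__main__":
    for url in list_page_urls():
        print(url)
    # The href below is a made-up sample, not a real page on the site.
    print(absolutize(["/html/gndy/dyzz/sample/0.html"]))
```

Keeping URL construction in small pure functions like these makes it possible to check the link handling without sending any requests to the site.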

 


Original: https://www.cnblogs.com/jswf/p/12292232.html
