Python Beginner's Study Notes: A Multi-Page Scraping Example for 电影天堂 (Movie Heaven)

Posted: 2020-02-10 20:33:09

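from lxml import etree
import requests

# Site root, prepended to the relative links found on the list pages.
baseurl0 = "https://www.ygdy8.net"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36"
}


def get_page():
    # Walk list pages 1 to 3 and process each one.
    for x in range(1, 4):
        pageurl = "https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html"
        pageurl = pageurl.format(x)
        get_urls(pageurl)


def get_urls(baseurl):
    # Fetch one list page and collect the detail-page links in its tables.
    resp = requests.get(baseurl, headers=headers, timeout=10)
    result = resp.text
    html = etree.HTML(result)
    # Plain ASCII quotes are required here; typographic quotes break the XPath.
    uls = html.xpath("//table[@class='tbspan']//a[@href]/@href")
    uls = [baseurl0 + url for url in uls]
    for ul in uls:
        print(ul)
        get_detalis_urls(ul)


def get_detalis_urls(url):
    # Fetch one detail page and print its title and first image URL.
    resp = requests.get(url, headers=headers, timeout=10)
    # The detail pages are GBK-encoded; skip bytes that cannot be decoded.
    result = resp.content.decode('gbk', errors='ignore')
    html = etree.HTML(result)
    uls = html.xpath("//div/h1/font[@color]/text()")
    print(uls)
    # Guard against pages without any <img>, which would raise IndexError.
    imgs = html.xpath("//img[@src]/@src")
    uls2 = imgs[0] if imgs else None
    print(uls2)
    print("---------------------------------------")


if __name__ == "__main__":
    get_page()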
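
The script above builds each list-page URL with str.format and makes links absolute by prepending the site root. As a minimal sketch of those same two steps using only the standard library, the snippet below relies on urllib.parse.urljoin, which also copes with hrefs that are already absolute or that are relative to the current page. The helper names build_page_urls and absolutize and the sample hrefs are hypothetical and do not come from the original post; nothing here contacts the site.

```python
from urllib.parse import urljoin

# URL pattern taken from the script above; {} is the page number.
LIST_TEMPLATE = "https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html"


def build_page_urls(first, last):
    """Return the list-page URLs for pages first..last (inclusive)."""
    return [LIST_TEMPLATE.format(n) for n in range(first, last + 1)]


def absolutize(page_url, hrefs):
    """Resolve each href against the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


if __name__ == "__main__":
    pages = build_page_urls(1, 3)
    # Made-up sample hrefs for illustration, not real links from the site.
    sample_hrefs = [
        "/html/gndy/dyzz/sample/1.html",     # root-relative
        "2.html",                            # relative to the list page
        "https://www.ygdy8.net/index.html",  # already absolute
    ]
    for link in absolutize(pages[0], sample_hrefs):
        print(link)
```

Compared with plain string concatenation, urljoin avoids producing a doubled host name when a page happens to contain an absolute link.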

Original post: https://www.cnblogs.com/jswf/p/12292232.html
