Web scraper: downloading images

Date: 2020-03-29 16:53:41

Create an image directory in the project root.
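The directory can also be created from the script itself, so that the save step does not fail when the folder is missing. A minimal sketch, assuming the directory name image that the download code writes to; the helper name ensure_image_dir is hypothetical and not part of the original script:

```python
from pathlib import Path


def ensure_image_dir(name="image"):
    """Create the output directory if it does not exist and return its path."""
    path = Path(name)
    # exist_ok=True makes the call safe to repeat on later runs
    path.mkdir(parents=True, exist_ok=True)
    return path


image_dir = ensure_image_dir()
```

Calling it once at the top of the script is enough; repeated calls leave an existing directory untouched.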

import requests
from pyquery import PyQuery as pq
# Generates a browser User-Agent header automatically
from fake_useragent import UserAgent

# Request headers that imitate a browser
headers = {
    # Content types the client accepts
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    # Browser identity. Some servers check it as an anti-scraping measure, so a random one is generated
    'User-Agent': UserAgent().random
}

# Collect the link of every image entry on a list page

def index_data(page):
    url = 'https://www.169tp.com/gaogensiwa/list_3_{}.html'.format(page)
    # Fetch the list page and decode it as GBK
    response = requests.get(url, headers=headers).content.decode('gbk')
    # Parse the HTML
    doc = pq(response)
    # Select the <a> elements inside the list items
    data = doc('.product01 li a').items()
    # Iterate over the <a> elements and follow each href
    for i in data:
        detail_url = i.attr('href')
        detail_data(detail_url)
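The loop above passes each href straight to detail_data, which works only when the page emits absolute links. If the site returns relative links (an assumption, the original does not say either way), they can be resolved against the list page URL with the standard library. The helper name absolute_url and the example.html paths are hypothetical placeholders:

```python
from urllib.parse import urljoin


def absolute_url(page_url, href):
    """Resolve href against the page it was found on; absolute hrefs pass through."""
    return urljoin(page_url, href)


list_url = 'https://www.169tp.com/gaogensiwa/list_3_1.html'
# Placeholder paths, used only to show the three link styles
print(absolute_url(list_url, '/gaogensiwa/example.html'))
print(absolute_url(list_url, 'example.html'))
print(absolute_url(list_url, 'https://www.169tp.com/gaogensiwa/example.html'))
```

All three calls print the same absolute URL, so the result can be passed to detail_data whichever form the page uses.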

# Fetch a detail page and extract its image URLs

def detail_data(urls):
    response = requests.get(urls, headers=headers).content.decode('gbk')
    doc = pq(response)
    img_url = doc('.big_img p img').items()
    for i in img_url:
        image_url = i.attr('src')
        # Download inside the loop so that every image on the page is saved
        download_img(image_url)
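Both fetch functions call .decode('gbk') on the raw bytes, which raises UnicodeDecodeError if a page contains a byte sequence that is not valid GBK. A tolerant variant, sketched here with a hypothetical helper decode_page, substitutes a replacement character instead of stopping the crawl:

```python
def decode_page(raw, encoding='gbk'):
    """Decode page bytes, substituting U+FFFD for bytes that cannot be decoded."""
    return raw.decode(encoding, errors='replace')


sample = '图片'.encode('gbk')
print(decode_page(sample))            # valid GBK decodes unchanged
print(decode_page(sample + b'\xff'))  # the stray byte becomes a replacement character
```

Whether silently replacing bytes is acceptable depends on the page; for extracting href and src attributes it is usually harmless.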

# Running counter used to name the saved files
count = 0

# Save one image
def download_img(image_url):
    global count
    response = requests.get(image_url, headers=headers).content
    # Write the file: 'w' overwrites an existing file, 'b' is binary mode
    with open('image/{}.jpg'.format(count), 'wb') as f:
        f.write(response)
    count += 1
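download_img names every file with a .jpg extension regardless of the real image type. A small sketch of an alternative, assuming the image URL ends in a file name with an extension; build_filename is a hypothetical helper and the example.com URLs are placeholders:

```python
import os
from urllib.parse import urlparse


def build_filename(image_url, index, default_ext='.jpg'):
    """Return 'image/<index><ext>', taking the extension from the URL path."""
    path = urlparse(image_url).path
    ext = os.path.splitext(path)[1].lower()
    # Fall back to the default when the URL carries no extension
    return 'image/{}{}'.format(index, ext or default_ext)


print(build_filename('https://example.com/pics/a.PNG?size=big', 3))  # image/3.png
print(build_filename('https://example.com/pics/raw', 4))             # image/4.jpg
```

The query string is ignored because only the path component of the URL is inspected.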

# Crawl the first 20 pages; the page number is the only part of the list URL that changes between pages

for i in range(1, 21):
    index_data(i)
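Because range excludes its end value, covering pages 1 through 20 needs range(1, 21). The sketch below only builds the list-page URLs from the same template and prints the first and last, without sending any request; page_urls is a hypothetical helper:

```python
LIST_URL = 'https://www.169tp.com/gaogensiwa/list_3_{}.html'


def page_urls(first, last):
    """Build list-page URLs for pages first..last, inclusive."""
    return [LIST_URL.format(page) for page in range(first, last + 1)]


urls = page_urls(1, 20)
print(len(urls))
print(urls[0])
print(urls[-1])
```

Printing the generated URLs before crawling is a cheap way to confirm that the page range matches what is intended.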


Original: https://www.cnblogs.com/webster1/p/12592765.html
