
Python Crawler Notes (7): Web Crawler Frameworks (3) - The Scrapy Framework (Example 2: A Targeted Stock Crawler)

Posted: 2020-02-01 19:37:00

1. Targeted Stock Crawler

Goal (screenshot not preserved; summarized from the code below): collect stock codes from the East Money stock list page (quote.eastmoney.com/stock_list.html), fetch each stock's detail page from Baidu Gupiao (gupiao.baidu.com), and save the extracted fields to a local file.

2. Writing the Example

The example proceeds in four steps (screenshot not preserved): create the project and spider template, write the spider, write the item pipeline, and enable the pipeline in the settings.

(1) Create the project and spider template

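The screenshots for this step are not preserved. As a sketch, the project and spider template can be created with the standard Scrapy CLI; the project name BaiduStocks is inferred from the BaidustocksPipeline class names later in the post, not shown in the surviving text:

```shell
# Create a new Scrapy project (name inferred from the pipeline classes below).
scrapy startproject BaiduStocks
cd BaiduStocks
# Generate a spider template named "stocks". The start URL is overwritten
# in stocks.py below, so the domain given here is only a placeholder.
scrapy genspider stocks baidu.com
```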

(2) Configure stocks.py

# -*- coding: utf-8 -*-
import re

import scrapy


class StocksSpider(scrapy.Spider):
    name = 'stocks'
    start_urls = ['http://quote.eastmoney.com/stock_list.html']

    def parse(self, response):
        # Collect every link on the stock list page and keep only hrefs
        # that contain a code such as sh600000 or sz002415.
        for href in response.css('a::attr(href)').extract():
            try:
                stock = re.findall(r'[s][hz]\d{6}', href)[0]
                url = 'https://gupiao.baidu.com/stock/' + stock + '.html'
                print('debug:', url)
                yield scrapy.Request(url, callback=self.parse_stock)
            except IndexError:
                continue

    def parse_stock(self, response):
        print('parsing stock page......................................')
        infoDict = {}

        # Extract the block that holds the stock name and the dt/dd fields.
        stockInfo = response.css('.stock-bets')
        name = stockInfo.css('.bets-name').extract()[0]

        keyList = stockInfo.css('dt').extract()
        valueList = stockInfo.css('dd').extract()

        for i in range(len(keyList)):
            # Strip the leading '>' and the trailing '</dt>' (5 characters).
            key = re.findall(r'>.*</dt>', keyList[i])[0][1:-5]
            try:
                # The match starts at the first digit, so only the trailing
                # '</dd>' needs to be stripped.
                value = re.findall(r'\d+\.?.*</dd>', valueList[i])[0][0:-5]
            except IndexError:
                value = '--'
            # Store the pair inside the loop; otherwise only the last
            # dt/dd pair would be kept.
            infoDict[key] = value

        # The key '股票名称' means "stock name".
        infoDict.update(
            {'股票名称': re.findall(r'\s.*\(', name)[0].split()[0]
                        + re.findall(r'\>.*\<', name)[0][1:-1]})
        yield infoDict
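As a quick sanity check of the stock-code regex used in parse(), it can be run on a few sample hrefs; the hrefs below are made up for illustration, in the format of the East Money stock list page:

```python
import re

# Made-up sample hrefs in the stock-list format: sh/sz prefix + 6-digit code.
hrefs = [
    "http://quote.eastmoney.com/sh600000.html",
    "http://quote.eastmoney.com/sz002415.html",
    "http://quote.eastmoney.com/about.html",   # no stock code: skipped
]

codes = []
for href in hrefs:
    try:
        # Same pattern as in StocksSpider.parse(): 's', then 'h' or 'z',
        # then exactly six digits.
        codes.append(re.findall(r"[s][hz]\d{6}", href)[0])
    except IndexError:
        continue

print(codes)  # ['sh600000', 'sz002415']
```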
                


(3) Post-process the scraped items (configure pipelines.py)

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html


class BaidustocksPipeline(object):
    def process_item(self, item, spider):
        return item


class BaidustocksInfoPipeline(object):
    def open_spider(self, spider):
        # encoding='utf-8' so Chinese keys such as '股票名称' are written safely
        self.f = open('BaiduStockInfo.txt', 'w', encoding='utf-8')

    def close_spider(self, spider):
        self.f.close()

    def process_item(self, item, spider):
        try:
            line = str(dict(item)) + '\n'
            self.f.write(line)
        except Exception:
            pass
        # Return the item so later pipeline components can also process it.
        return item
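The open/write/close sequence of BaidustocksInfoPipeline can be exercised outside Scrapy with a plain dict standing in for an item; the file is placed in the temp directory and the field values are made up for illustration:

```python
import os
import tempfile

# A plain dict stands in for the Scrapy item handed to process_item().
item = {"股票名称": "示例股份", "最高": "10.50"}

path = os.path.join(tempfile.gettempdir(), "BaiduStockInfo.txt")

f = open(path, "w", encoding="utf-8")   # what open_spider() does
f.write(str(dict(item)) + "\n")         # what process_item() does
f.close()                               # what close_spider() does

with open(path, encoding="utf-8") as check:
    content = check.read()
print(content.strip())
```

Each item becomes one line in the output file, serialized as a Python dict literal.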
    

(4) Enable the pipeline via ITEM_PIPELINES (configure settings.py)

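The settings.py screenshot is not preserved. Enabling the info pipeline typically looks like the following sketch; the module path assumes the BaiduStocks project layout inferred above, and 300 is an arbitrary priority (0-1000, lower runs earlier):

```python
# In BaiduStocks/settings.py: register the custom pipeline.
# The integer is the execution order (0-1000, lower runs earlier).
ITEM_PIPELINES = {
    'BaiduStocks.pipelines.BaidustocksInfoPipeline': 300,
}
```

The crawl is then started from the project directory with `scrapy crawl stocks`.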

 

3. Optimizing the Example

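The optimization screenshot is not preserved either. Scrapy's performance tuning is done through settings.py; the options below are real Scrapy settings from its settings reference, with illustrative values chosen for this sketch:

```python
# Illustrative tuning values for settings.py; adjust for the target site
# and stay polite to the servers being crawled.
CONCURRENT_REQUESTS = 32             # total concurrent requests (default 16)
CONCURRENT_REQUESTS_PER_DOMAIN = 16  # per-domain cap (default 8)
DOWNLOAD_DELAY = 0.25                # seconds between requests to one site
DEPTH_LIMIT = 1                      # do not follow links past detail pages
```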


Original post: https://www.cnblogs.com/douzujun/p/12249226.html
