
Python Crawler Notes (7): Web Crawler Frameworks (3) - The Scrapy Framework (Example 2: A Targeted Stock Crawler)

Posted: 2020-02-01 19:37:00

1. Targeted Stock Crawler

Goal (screenshot not preserved; summarized from the code below): collect stock codes from the East Money stock list page (quote.eastmoney.com/stock_list.html), fetch each stock's detail page from Baidu Gupiao (gupiao.baidu.com), and save the extracted fields to a local file.

2. Writing the Example

The example proceeds in four steps (screenshot not preserved): create the project and spider template, write the spider, write the item pipeline, and enable the pipeline in the settings.

(1) Create the project and spider template

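The screenshots for this step are not preserved. As a sketch, the project and spider template can be created with the standard Scrapy CLI; the project name BaiduStocks is inferred from the BaidustocksPipeline class names later in the post, not shown in the surviving text:

```shell
# Create a new Scrapy project (name inferred from the pipeline classes below).
scrapy startproject BaiduStocks
cd BaiduStocks
# Generate a spider template named "stocks". The start URL is overwritten
# in stocks.py below, so the domain given here is only a placeholder.
scrapy genspider stocks baidu.com
```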

(2) Configure stocks.py

# -*- coding: utf-8 -*-
import re

import scrapy


class StocksSpider(scrapy.Spider):
    name = 'stocks'
    start_urls = ['http://quote.eastmoney.com/stock_list.html']

    def parse(self, response):
        # Collect every link on the stock list page and keep only hrefs
        # that contain a code such as sh600000 or sz002415.
        for href in response.css('a::attr(href)').extract():
            try:
                stock = re.findall(r'[s][hz]\d{6}', href)[0]
                url = 'https://gupiao.baidu.com/stock/' + stock + '.html'
                print('debug:', url)
                yield scrapy.Request(url, callback=self.parse_stock)
            except IndexError:
                continue

    def parse_stock(self, response):
        print('parsing stock page......................................')
        infoDict = {}

        # Extract the block that holds the stock name and the dt/dd fields.
        stockInfo = response.css('.stock-bets')
        name = stockInfo.css('.bets-name').extract()[0]

        keyList = stockInfo.css('dt').extract()
        valueList = stockInfo.css('dd').extract()

        for i in range(len(keyList)):
            # Strip the leading '>' and the trailing '</dt>' (5 characters).
            key = re.findall(r'>.*</dt>', keyList[i])[0][1:-5]
            try:
                # The match starts at the first digit, so only the trailing
                # '</dd>' needs to be stripped.
                value = re.findall(r'\d+\.?.*</dd>', valueList[i])[0][0:-5]
            except IndexError:
                value = '--'
            # Store the pair inside the loop; otherwise only the last
            # dt/dd pair would be kept.
            infoDict[key] = value

        # The key '股票名称' means "stock name".
        infoDict.update(
            {'股票名称': re.findall(r'\s.*\(', name)[0].split()[0]
                        + re.findall(r'\>.*\<', name)[0][1:-1]})
        yield infoDict
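As a quick sanity check of the stock-code regex used in parse(), it can be run on a few sample hrefs; the hrefs below are made up for illustration, in the format of the East Money stock list page:

```python
import re

# Made-up sample hrefs in the stock-list format: sh/sz prefix + 6-digit code.
hrefs = [
    "http://quote.eastmoney.com/sh600000.html",
    "http://quote.eastmoney.com/sz002415.html",
    "http://quote.eastmoney.com/about.html",   # no stock code: skipped
]

codes = []
for href in hrefs:
    try:
        # Same pattern as in StocksSpider.parse(): 's', then 'h' or 'z',
        # then exactly six digits.
        codes.append(re.findall(r"[s][hz]\d{6}", href)[0])
    except IndexError:
        continue

print(codes)  # ['sh600000', 'sz002415']
```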
                


(3) Post-process the scraped items (configure pipelines.py)

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html


class BaidustocksPipeline(object):
    def process_item(self, item, spider):
        return item


class BaidustocksInfoPipeline(object):
    def open_spider(self, spider):
        # encoding='utf-8' so Chinese keys such as '股票名称' are written safely
        self.f = open('BaiduStockInfo.txt', 'w', encoding='utf-8')

    def close_spider(self, spider):
        self.f.close()

    def process_item(self, item, spider):
        try:
            line = str(dict(item)) + '\n'
            self.f.write(line)
        except Exception:
            pass
        # Return the item so later pipeline components can also process it.
        return item
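The open/write/close sequence of BaidustocksInfoPipeline can be exercised outside Scrapy with a plain dict standing in for an item; the file is placed in the temp directory and the field values are made up for illustration:

```python
import os
import tempfile

# A plain dict stands in for the Scrapy item handed to process_item().
item = {"股票名称": "示例股份", "最高": "10.50"}

path = os.path.join(tempfile.gettempdir(), "BaiduStockInfo.txt")

f = open(path, "w", encoding="utf-8")   # what open_spider() does
f.write(str(dict(item)) + "\n")         # what process_item() does
f.close()                               # what close_spider() does

with open(path, encoding="utf-8") as check:
    content = check.read()
print(content.strip())
```

Each item becomes one line in the output file, serialized as a Python dict literal.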
    

(4) Enable the pipeline via ITEM_PIPELINES (configure settings.py)

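The settings.py screenshot is not preserved. Enabling the info pipeline typically looks like the following sketch; the module path assumes the BaiduStocks project layout inferred above, and 300 is an arbitrary priority (0-1000, lower runs earlier):

```python
# In BaiduStocks/settings.py: register the custom pipeline.
# The integer is the execution order (0-1000, lower runs earlier).
ITEM_PIPELINES = {
    'BaiduStocks.pipelines.BaidustocksInfoPipeline': 300,
}
```

The crawl is then started from the project directory with `scrapy crawl stocks`.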

 

3. Optimizing the Example

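The optimization screenshot is not preserved either. Scrapy's performance tuning is done through settings.py; the options below are real Scrapy settings from its settings reference, with illustrative values chosen for this sketch:

```python
# Illustrative tuning values for settings.py; adjust for the target site
# and stay polite to the servers being crawled.
CONCURRENT_REQUESTS = 32             # total concurrent requests (default 16)
CONCURRENT_REQUESTS_PER_DOMAIN = 16  # per-domain cap (default 8)
DOWNLOAD_DELAY = 0.25                # seconds between requests to one site
DEPTH_LIMIT = 1                      # do not follow links past detail pages
```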


Original post: https://www.cnblogs.com/douzujun/p/12249226.html
