首页 > 其他 > 详细

Scrapy框架: 通用爬虫之CSVFeedSpider

时间:2019-11-16 18:22:06      阅读:113      评论:0      收藏:0      [点我收藏+]

步骤01: 创建项目

scrapy startproject csvfeedspider

步骤02: 使用csvfeed模版

scrapy genspider -t csvfeed csvdata gzdata.gov.cn

步骤03: 编写items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class CsvspiderItem(scrapy.Item):
    # define the fields for your item here like:
    # 姓名
    name = scrapy.Field()
    # 研究领域
    SearchField = scrapy.Field()
    # 服务分类
    Service = scrapy.Field()
    # 专业特长
    Specialty = scrapy.Field()

步骤04: 编写爬虫文件csvdata.py

# -*- coding: utf-8 -*-
from scrapy.spiders import CSVFeedSpider
from csvfeedspider.items import CsvspiderItem


class CsvparseSpider(CSVFeedSpider):
    name = 'csvdata'
    allowed_domains = ['gzdata.gov.cn']
    start_urls = ['http://gzopen.oss-cn-guizhou-a.aliyuncs.com/科技特派员.csv']
    headers = ['name', 'SearchField', 'Service', 'Specialty']
    delimiter = ','
    quotechar = "\n"

    # Do any adaptations you need here
    def adapt_response(self, response):
       return response.body.decode('gb18030')

    def parse_row(self, response, row):

        i = CsvspiderItem()
        try:
            i['name'] = row['name']
            i['SearchField'] = row['SearchField']
            i['Service'] = row['Service']
            i['Specialty'] = row['Specialty']

        except:
            pass
        yield i

步骤05: 运行爬虫文件

scrapy crawl csvdata

Scrapy框架: 通用爬虫之CSVFeedSpider

原文:https://www.cnblogs.com/hankleo/p/11872613.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!