
Crawling the entire quotes site and storing the data in MongoDB

Posted: 2019-04-21 17:40:28

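With Scrapy installed, scraping a handful of fields no longer satisfies a restless mind, so let's crawl the whole of http://quotes.toscrape.com/ and store the results in MongoDB.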

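Part 1: Creating the project

1. Change into the folder where the project should live and run scrapy startproject quotetutorial to create a new project named quotetutorial.

2. cd quotetutorial

3. Run scrapy genspider quotes quotes.toscrape.com to generate the spider template file quotes.py.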

 

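Part 2: Configuring the templates

1. Configure settings.py. The entries set for this project are MONGO_URL, MONGO_DB, USER_AGENT, ROBOTSTXT_OBEY and ITEM_PIPELINES; the rest is the generated template.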

# -*- coding: utf-8 -*-

# Scrapy settings for quotetutorial project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://doc.scrapy.org/en/latest/topics/settings.html
#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'quotetutorial'

SPIDER_MODULES = ['quotetutorial.spiders']
NEWSPIDER_MODULE = 'quotetutorial.spiders'

# MongoDB connection details, read by MongoPipeline.from_crawler()
MONGO_URL = 'localhost'
MONGO_DB = 'quotestutorial'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'quotetutorial.middlewares.QuotetutorialSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'quotetutorial.middlewares.QuotetutorialDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'quotetutorial.pipelines.TextPipeline': 300,
   'quotetutorial.pipelines.MongoPipeline': 400,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
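The numbers in ITEM_PIPELINES decide the order in which items pass through the pipelines: Scrapy sends each item through the lower-valued classes first. The sketch below is not Scrapy's own code; it only illustrates that ordering with a plain dict, and the helper name pipeline_order is made up for this example.

```python
# Stand-in for the ITEM_PIPELINES dict from settings.py, deliberately
# written out of order to show that the priority value is what counts.
ITEM_PIPELINES = {
    'quotetutorial.pipelines.MongoPipeline': 400,
    'quotetutorial.pipelines.TextPipeline': 300,
}


def pipeline_order(pipelines):
    """Return the pipeline paths sorted by ascending priority value."""
    ordered = sorted(pipelines.items(), key=lambda pair: pair[1])
    return [path for path, _priority in ordered]


for path in pipeline_order(ITEM_PIPELINES):
    print(path)
```

With 300 and 400 as configured here, TextPipeline handles each item before MongoPipeline, so the text that reaches MongoDB has already been shortened.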

2. Define the item in items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class QuoteItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()

 

3. Edit quotes.py in the spiders folder

# -*- coding: utf-8 -*-
import scrapy

from quotetutorial.items import QuoteItem


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # Each quote on the page sits in an element with class "quote".
        quotes = response.css('.quote')
        for quote in quotes:
            item = QuoteItem()
            text = quote.css('.text::text').extract_first()
            author = quote.css('.author::text').extract_first()
            tags = quote.css('.tags .tag::text').extract()
            item['text'] = text
            item['author'] = author
            item['tags'] = tags
            yield item

        # Locate the "next page" href with a CSS selector, join it onto the
        # current URL, and call back into parse() to crawl page after page.
        next_page = response.css('.pager .next a::attr(href)').extract_first()
        # The last page has no "next" link, so only follow it when present.
        if next_page is not None:
            url = response.urljoin(next_page)
            yield scrapy.Request(url=url, callback=self.parse)
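The pagination step can be tried without Scrapy. The sketch below uses the standard library's urljoin to resolve a relative "next" link against the page it was found on, which is the job response.urljoin does in the spider. The helper name next_page_url and the href value '/page/2/' are assumptions made for illustration; the resulting URLs match the ones in the crawl log.

```python
from urllib.parse import urljoin


def next_page_url(current_url, href):
    """Resolve a "next" link against the current page.

    Returns None when the page has no "next" link (href is None),
    so the caller knows to stop paginating.
    """
    if href is None:
        return None
    return urljoin(current_url, href)


print(next_page_url('http://quotes.toscrape.com/', '/page/2/'))
print(next_page_url('http://quotes.toscrape.com/page/10/', None))
```

Returning None on the last page avoids building a request from a missing link.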

 

4. Edit pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import pymongo

from scrapy.exceptions import DropItem


class TextPipeline(object):
    """Shorten long quote text and drop items that have no text."""

    def __init__(self):
        self.limit = 50

    def process_item(self, item, spider):
        if item['text']:
            if len(item['text']) > self.limit:
                item['text'] = item['text'][0:self.limit].rstrip() + '...'
            return item
        else:
            # DropItem has to be raised, not returned, for the item to be discarded.
            raise DropItem('Missing text')


class MongoPipeline(object):
    """Store each item in MongoDB, in a collection named after the item class."""

    def __init__(self, mongo_url, mongo_db):
        self.mongo_url = mongo_url
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection details from settings.py.
        return cls(
            mongo_url=crawler.settings.get('MONGO_URL'),
            mongo_db=crawler.settings.get('MONGO_DB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_url)
        self.db = self.client[self.mongo_db]

    def process_item(self, item, spider):
        name = item.__class__.__name__
        # insert() is deprecated in PyMongo 3 and removed in PyMongo 4.
        self.db[name].insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
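The truncation rule in TextPipeline is easy to check in isolation. The sketch below copies only the slicing logic into a standalone function; the name truncate is chosen for this example, and the default limit of 50 mirrors self.limit in the pipeline.

```python
def truncate(text, limit=50):
    """Cut text to at most `limit` characters and append '...'.

    Text of `limit` characters or fewer is returned unchanged. Trailing
    whitespace left by the cut is removed before the dots are added.
    """
    if len(text) > limit:
        return text[0:limit].rstrip() + '...'
    return text


print(truncate('x' * 60))
print(truncate('short text'))
```

This is why some quotes in the crawl log end in "..." while short ones appear in full: only text longer than 50 characters is cut.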

 

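5. Make sure MongoDB is installed locally and running.

6. Run the spider: scrapy crawl quotes

7. Check the spider's log output, then look in MongoDB: the database named by MONGO_DB (quotestutorial in the settings above) has been created automatically. The log below comes from a run that also passed -o quotes.marshal to export the items to a file.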
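The stored data can also be checked from Python instead of a MongoDB client. This is a minimal sketch, assuming MongoDB is reachable on localhost, the database is named as in MONGO_DB, and the collection is called QuoteItem (MongoPipeline names the collection after the item class). The helper names summarize and main are made up for this example.

```python
def summarize(collection):
    """Return (document count, one sample document) for a collection."""
    return collection.count_documents({}), collection.find_one()


def main():
    # Imported here so summarize() can be used without PyMongo installed.
    import pymongo

    client = pymongo.MongoClient('localhost')  # MONGO_URL from settings.py
    try:
        # MONGO_DB from settings.py; collection named after the item class.
        collection = client['quotestutorial']['QuoteItem']
        count, sample = summarize(collection)
        print('documents stored:', count)
        print('sample document:', sample)
    finally:
        client.close()
```

Calling main() after a crawl prints how many documents were stored and one of them. Running the spider again inserts the same quotes a second time, because the pipeline performs plain inserts.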

D:\study\quotetutorial>scrapy crawl quotes -o quotes.marshal
2019-04-21 16:33:11 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: quotetutorial)
2019-04-21 16:33:11 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-04-21 16:33:11 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'quotetutorial', 'FEED_FORMAT': 'marshal', 'FEED_URI': 'quotes.marshal', 'NEWSPIDER_MODULE': 'quotetutorial.spiders', 'SPIDER_MODULES': ['quotetutorial.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
2019-04-21 16:33:11 [scrapy.extensions.telnet] INFO: Telnet Password: 65736cacf7cdc93a
2019-04-21 16:33:11 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2019-04-21 16:33:12 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-21 16:33:12 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-21 16:33:12 [scrapy.middleware] INFO: Enabled item pipelines:
['quotetutorial.pipelines.TextPipeline',
 'quotetutorial.pipelines.MongoPipeline']
2019-04-21 16:33:12 [scrapy.core.engine] INFO: Spider opened
2019-04-21 16:33:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-21 16:33:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-21 16:33:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'Albert Einstein',
 'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
 'text': '“The world as we have created it is a process of o...'}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'J.K. Rowling',
 'tags': ['abilities', 'choices'],
 'text': '“It is our choices, Harry, that show what we truly...'}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'Albert Einstein',
 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
 'text': '“There are only two ways to live your life. One is...'}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'Jane Austen',
 'tags': ['aliteracy', 'books', 'classic', 'humor'],
 'text': '“The person, be it gentleman or lady, who has not...'}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'Marilyn Monroe',
 'tags': ['be-yourself', 'inspirational'],
 'text': "“Imperfection is beauty, madness is genius and it'..."}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'Albert Einstein',
 'tags': ['adulthood', 'success', 'value'],
 'text': '“Try not to become a man of success. Rather become...'}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'André Gide',
 'tags': ['life', 'love'],
 'text': '“It is better to be hated for what you are than to...'}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'Thomas A. Edison',
 'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
 'text': "“I have not failed. I've just found 10,000 ways th..."}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'Eleanor Roosevelt',
 'tags': ['misattributed-eleanor-roosevelt'],
 'text': '“A woman is like a tea bag; you never know how str...'}
2019-04-21 16:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'author': 'Steve Martin',
 'tags': ['humor', 'obvious', 'simile'],
 'text': '“A day without sunshine is like, you know, night.”'}
2019-04-21 16:33:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: http://quotes.toscrape.com/)
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Marilyn Monroe',
 'tags': ['friends', 'heartbreak', 'inspirational', 'life', 'love', 'sisters'],
 'text': '“This life is what you make it. No matter what, yo...'}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'J.K. Rowling',
 'tags': ['courage', 'friends'],
 'text': '“It takes a great deal of bravery to stand up to o...'}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Albert Einstein',
 'tags': ['simplicity', 'understand'],
 'text': "“If you can't explain it to a six year old, you do..."}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Bob Marley',
 'tags': ['love'],
 'text': '“You may not be her first, her last, or her only....'}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Dr. Seuss',
 'tags': ['fantasy'],
 'text': '“I like nonsense, it wakes up the brain cells. Fan...'}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Douglas Adams',
 'tags': ['life', 'navigation'],
 'text': '“I may not have gone where I intended to go, but I...'}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Elie Wiesel',
 'tags': ['activism',
          'apathy',
          'hate',
          'indifference',
          'inspirational',
          'love',
          'opposite',
          'philosophy'],
 'text': "“The opposite of love is not hate, it's indifferen..."}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Friedrich Nietzsche',
 'tags': ['friendship',
          'lack-of-friendship',
          'lack-of-love',
          'love',
          'marriage',
          'unhappy-marriage'],
 'text': '“It is not a lack of love, but a lack of friendshi...'}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Mark Twain',
 'tags': ['books', 'contentment', 'friends', 'friendship', 'life'],
 'text': '“Good friends, good books, and a sleepy conscience...'}
2019-04-21 16:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Allen Saunders',
 'tags': ['fate', 'life', 'misattributed-john-lennon', 'planning', 'plans'],
 'text': '“Life is what happens to us while we are making ot...'}
2019-04-21 16:33:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/3/> (referer: http://quotes.toscrape.com/page/2/)
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Pablo Neruda',
 'tags': ['love', 'poetry'],
 'text': '“I love you without knowing how, or when, or from...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Ralph Waldo Emerson',
 'tags': ['happiness'],
 'text': '“For every minute you are angry you lose sixty sec...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Mother Teresa',
 'tags': ['attributed-no-source'],
 'text': '“If you judge people, you have no time to love the...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Garrison Keillor',
 'tags': ['humor', 'religion'],
 'text': '“Anyone who thinks sitting in church can make you...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Jim Henson',
 'tags': ['humor'],
 'text': '“Beauty is in the eye of the beholder and it may b...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Dr. Seuss',
 'tags': ['comedy', 'life', 'yourself'],
 'text': '“Today you are You, that is truer than true. There...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Albert Einstein',
 'tags': ['children', 'fairy-tales'],
 'text': '“If you want your children to be intelligent, read...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'J.K. Rowling',
 'tags': [],
 'text': '“It is impossible to live without failing at somet...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Albert Einstein',
 'tags': ['imagination'],
 'text': '“Logic will get you from A to Z; imagination will...'}
2019-04-21 16:33:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/3/>
{'author': 'Bob Marley',
 'tags': ['music'],
 'text': '“One good thing about music, when it hits you, you...'}
2019-04-21 16:33:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/4/> (referer: http://quotes.toscrape.com/page/3/)
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'Dr. Seuss',
 'tags': ['learning', 'reading', 'seuss'],
 'text': '“The more that you read, the more things you will...'}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'J.K. Rowling',
 'tags': ['dumbledore'],
 'text': '“Of course it is happening inside your head, Harry...'}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'Bob Marley',
 'tags': ['friendship'],
 'text': '“The truth is, everyone is going to hurt you. You...'}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'Mother Teresa',
 'tags': ['misattributed-to-mother-teresa', 'paraphrased'],
 'text': '“Not all of us can do great things. But we can do...'}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'J.K. Rowling',
 'tags': ['death', 'inspirational'],
 'text': '“To the well-organized mind, death is but the next...'}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'Charles M. Schulz',
 'tags': ['chocolate', 'food', 'humor'],
 'text': '“All you need is love. But a little chocolate now...'}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'William Nicholson',
 'tags': ['misattributed-to-c-s-lewis', 'reading'],
 'text': "“We read to know we're not alone.”"}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'Albert Einstein',
 'tags': ['knowledge', 'learning', 'understanding', 'wisdom'],
 'text': '“Any fool can know. The point is to understand.”'}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'Jorge Luis Borges',
 'tags': ['books', 'library'],
 'text': '“I have always imagined that Paradise will be a ki...'}
2019-04-21 16:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/4/>
{'author': 'George Eliot',
 'tags': ['inspirational'],
 'text': '“It is never too late to be what you might have be...'}
2019-04-21 16:33:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/5/> (referer: http://quotes.toscrape.com/page/4/)
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'George R.R. Martin',
 'tags': ['read', 'readers', 'reading', 'reading-books'],
 'text': '“A reader lives a thousand lives before he dies, s...'}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'C.S. Lewis',
 'tags': ['books', 'inspirational', 'reading', 'tea'],
 'text': '“You can never get a cup of tea large enough or a...'}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'Marilyn Monroe',
 'tags': [],
 'text': '“You believe lies so you eventually learn to trust...'}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'Marilyn Monroe',
 'tags': ['girls', 'love'],
 'text': '“If you can make a woman laugh, you can make her d...'}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'Albert Einstein',
 'tags': ['life', 'simile'],
 'text': '“Life is like riding a bicycle. To keep your balan...'}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'Marilyn Monroe',
 'tags': ['love'],
 'text': '“The real lover is the man who can thrill you by k...'}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'Marilyn Monroe',
 'tags': ['attributed-no-source'],
 'text': "“A wise girl kisses but doesn't love, listens but..."}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'Martin Luther King Jr.',
 'tags': ['hope', 'inspirational'],
 'text': '“Only in the darkness can you see the stars.”'}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'J.K. Rowling',
 'tags': ['dumbledore'],
 'text': '“It matters not what someone is born, but what the...'}
2019-04-21 16:33:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/5/>
{'author': 'James Baldwin',
 'tags': ['love'],
 'text': '“Love does not begin and end the way we seem to th...'}
2019-04-21 16:33:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/6/> (referer: http://quotes.toscrape.com/page/5/)
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Jane Austen',
 'tags': ['friendship', 'love'],
 'text': '“There is nothing I would not do for those who are...'}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Eleanor Roosevelt',
 'tags': ['attributed', 'fear', 'inspiration'],
 'text': '“Do one thing every day that scares you.”'}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Marilyn Monroe',
 'tags': ['attributed-no-source'],
 'text': '“I am good, but not an angel. I do sin, but I am n...'}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Albert Einstein',
 'tags': ['music'],
 'text': '“If I were not a physicist, I would probably be a...'}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Haruki Murakami',
 'tags': ['books', 'thought'],
 'text': '“If you only read the books that everyone else is...'}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Alexandre Dumas fils',
 'tags': ['misattributed-to-einstein'],
 'text': '“The difference between genius and stupidity is: g...'}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Stephenie Meyer',
 'tags': ['drug', 'romance', 'simile'],
 'text': "“He's like a drug for you, Bella.”"}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Ernest Hemingway',
 'tags': ['books', 'friends', 'novelist-quotes'],
 'text': '“There is no friend as loyal as a book.”'}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'Helen Keller',
 'tags': ['inspirational'],
 'text': '“When one door of happiness closes, another opens;...'}
2019-04-21 16:33:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/6/>
{'author': 'George Bernard Shaw',
 'tags': ['inspirational', 'life', 'yourself'],
 'text': "“Life isn't about finding yourself. Life is about..."}
2019-04-21 16:33:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/7/> (referer: http://quotes.toscrape.com/page/6/)
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'Charles Bukowski',
 'tags': ['alcohol'],
 'text': "“That's the problem with drinking, I thought, as I..."}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'Suzanne Collins',
 'tags': ['the-hunger-games'],
 'text': '“You don’t forget the face of the person who was y...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'Suzanne Collins',
 'tags': ['humor'],
 'text': "“Remember, we're madly in love, so it's all right..."}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'C.S. Lewis',
 'tags': ['love'],
 'text': '“To love at all is to be vulnerable. Love anything...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'J.R.R. Tolkien',
 'tags': ['bilbo', 'journey', 'lost', 'quest', 'travel', 'wander'],
 'text': '“Not all those who wander are lost.”'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'J.K. Rowling',
 'tags': ['live-death-love'],
 'text': '“Do not pity the dead, Harry. Pity the living, and...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'Ernest Hemingway',
 'tags': ['good', 'writing'],
 'text': '“There is nothing to writing. All you do is sit do...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'Ralph Waldo Emerson',
 'tags': ['life', 'regrets'],
 'text': '“Finish each day and be done with it. You have don...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'Mark Twain',
 'tags': ['education'],
 'text': '“I have never let my schooling interfere with my e...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/7/>
{'author': 'Dr. Seuss',
 'tags': ['troubles'],
 'text': '“I have heard there are troubles of more than one...'}
2019-04-21 16:33:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/8/> (referer: http://quotes.toscrape.com/page/7/)
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'Alfred Tennyson',
 'tags': ['friendship', 'love'],
 'text': '“If I had a flower for every time I thought of you...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'Charles Bukowski',
 'tags': ['humor'],
 'text': '“Some people never go crazy. What truly horrible l...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'Terry Pratchett',
 'tags': ['humor', 'open-mind', 'thinking'],
 'text': '“The trouble with having an open mind, of course,...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'Dr. Seuss',
 'tags': ['humor', 'philosophy'],
 'text': '“Think left and think right and think low and thin...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'J.D. Salinger',
 'tags': ['authors', 'books', 'literature', 'reading', 'writing'],
 'text': '“What really knocks me out is a book that, when yo...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'George Carlin',
 'tags': ['humor', 'insanity', 'lies', 'lying', 'self-indulgence', 'truth'],
 'text': '“The reason I talk to myself is because I’m the on...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'John Lennon',
 'tags': ['beatles',
          'connection',
          'dreamers',
          'dreaming',
          'dreams',
          'hope',
          'inspirational',
          'peace'],
 'text': "“You may say I'm a dreamer, but I'm not the only o..."}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'W.C. Fields',
 'tags': ['humor', 'sinister'],
 'text': '“I am free of all prejudice. I hate everyone equal...'}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'Ayn Rand',
 'tags': [],
 'text': "“The question isn't who is going to let me; it's w..."}
2019-04-21 16:33:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/8/>
{'author': 'Mark Twain',
 'tags': ['books', 'classic', 'reading'],
 'text': "“′Classic′ - a book which people praise and don't..."}
2019-04-21 16:33:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/9/> (referer: http://quotes.toscrape.com/page/8/)
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'Albert Einstein',
 'tags': ['mistakes'],
 'text': '“Anyone who has never made a mistake has never tri...'}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'Jane Austen',
 'tags': ['humor', 'love', 'romantic', 'women'],
 'text': "“A lady's imagination is very rapid; it jumps from..."}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'J.K. Rowling',
 'tags': ['integrity'],
 'text': '“Remember, if the time should come when you have t...'}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'Jane Austen',
 'tags': ['books', 'library', 'reading'],
 'text': '“I declare after all there is no enjoyment like re...'}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'Jane Austen',
 'tags': ['elizabeth-bennet', 'jane-austen'],
 'text': '“There are few people whom I really love, and stil...'}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'C.S. Lewis',
 'tags': ['age', 'fairytales', 'growing-up'],
 'text': '“Some day you will be old enough to start reading...'}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'C.S. Lewis',
 'tags': ['god'],
 'text': '“We are not necessarily doubting that God will do...'}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'Mark Twain',
 'tags': ['death', 'life'],
 'text': '“The fear of death follows from the fear of life....'}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'Mark Twain',
 'tags': ['misattributed-mark-twain', 'truth'],
 'text': '“A lie can travel half way around the world while...'}
2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/9/>
{'author': 'C.S. Lewis',
 'tags': ['christianity', 'faith', 'religion', 'sun'],
 'text': '“I believe in Christianity as I believe that the s...'}
425 2019-04-21 16:33:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/10/> (referer: http://quotes.toscrape.com/page/9/)
426 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
427 {author: J.K. Rowling,
428  tags: [truth],
429  text: “The truth." Dumbledore sighed. "It is a beautiful...}
430 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
431 {author: Jimi Hendrix,
432  tags: [death, life],
433  text: "“I‘m the one that‘s got to die when it‘s time for..."}
434 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
435 {author: J.M. Barrie,
436  tags: [adventure, love],
437  text: “To die will be an awfully big adventure.”}
438 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
439 {author: E.E. Cummings,
440  tags: [courage],
441  text: “It takes courage to grow up and become who you re...}
442 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
443 {author: Khaled Hosseini,
444  tags: [life],
445  text: “But better to get hurt by the truth than comforte...}
446 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
447 {author: Harper Lee,
448  tags: [better-life-empathy],
449  text: “You never really understand a person until you co...}
450 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
451 {author: "Madeleine L‘Engle",
452  tags: [books,
453           children,
454           difficult,
455           grown-ups,
456           write,
457           writers,
458           writing],
459  text: “You have to write the book that wants to be writt...}
460 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
461 {author: Mark Twain,
462  tags: [truth],
463  text: “Never tell the truth to people who are not worthy...}
464 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
465 {author: Dr. Seuss,
466  tags: [inspirational],
467  text: "“A person‘s a person, no matter how small.”"}
468 2019-04-21 16:33:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
469 {author: George R.R. Martin,
470  tags: [books, mind],
471  text: “... a mind needs books as a sword needs a whetsto...}
472 2019-04-21 16:33:37 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET http://quotes.toscrape.com/page/10/> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplica
473 tes)
474 2019-04-21 16:33:37 [scrapy.core.engine] INFO: Closing spider (finished)
475 2019-04-21 16:33:37 [scrapy.extensions.feedexport] INFO: Stored marshal feed (100 items) in: quotes.marshal
476 2019-04-21 16:33:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
477 {downloader/request_bytes: 3402,
478  downloader/request_count: 10,
479  downloader/request_method_count/GET: 10,
480  downloader/response_bytes: 24444,
481  downloader/response_count: 10,
482  downloader/response_status_count/200: 10,
483  dupefilter/filtered: 1,
484  finish_reason: finished,
485  finish_time: datetime.datetime(2019, 4, 21, 8, 33, 37, 793480),
486  item_scraped_count: 100,
487  log_count/DEBUG: 111,
488  log_count/INFO: 10,
489  request_depth_max: 10,
490  response_received_count: 10,
491  scheduler/dequeued: 10,
492  scheduler/dequeued/memory: 10,
493  scheduler/enqueued: 10,
494  scheduler/enqueued/memory: 10,
495  start_time: datetime.datetime(2019, 4, 21, 8, 33, 12, 225321)}
496 2019-04-21 16:33:37 [scrapy.core.engine] INFO: Spider closed (finished)
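The stats dump at the end of the log is enough to sanity-check the run. Here is a minimal sketch that does so, with the figures copied by hand from the log. The dictionary and variable names are my own for illustration, not something Scrapy hands you in this form.

```python
from datetime import datetime

# Values copied by hand from the "Dumping Scrapy stats" output in the log.
stats = {
    "start_time": datetime(2019, 4, 21, 8, 33, 12, 225321),
    "finish_time": datetime(2019, 4, 21, 8, 33, 37, 793480),
    "item_scraped_count": 100,
    "response_received_count": 10,
    "dupefilter/filtered": 1,
}

# Wall-clock duration of the crawl in seconds.
elapsed = (stats["finish_time"] - stats["start_time"]).total_seconds()

# Each listing page should contribute the same number of quotes.
items_per_page = stats["item_scraped_count"] / stats["response_received_count"]

# Rough throughput for the whole run.
items_per_second = stats["item_scraped_count"] / elapsed

print(f"elapsed: {elapsed:.2f} s")
print(f"items per page: {items_per_page:.0f}")
print(f"items per second: {items_per_second:.2f}")
```

Ten responses and 100 items means every page yielded 10 quotes, so nothing was dropped along the way. Note that start_time and finish_time read 08:33 while the log lines read 16:33. The stats appear to be recorded in UTC, whereas the log timestamps are local time (UTC+8 on this machine).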
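The log reports that all 100 items were stored in quotes.marshal. As far as I know, Scrapy's marshal exporter writes each item as a separate marshal.dump of a plain dict, so the file can be read back by calling marshal.load repeatedly until the data runs out. The sketch below works under that assumption. read_marshal_feed is a hypothetical helper name, and the demo writes a small stand-in file instead of touching the real feed.

```python
import marshal
import os
import tempfile


def read_marshal_feed(path):
    """Return every object stored back-to-back in a marshal file."""
    items = []
    with open(path, "rb") as f:
        while True:
            try:
                items.append(marshal.load(f))
            except EOFError:
                # marshal.load raises EOFError once no data is left.
                break
    return items


# Stand-in data shaped like the items in the log (not the real feed).
sample = [
    {"author": "J.M. Barrie", "tags": ["adventure", "love"],
     "text": "“To die will be an awfully big adventure.”"},
    {"author": "Mark Twain", "tags": ["truth"],
     "text": "“Never tell the truth to people who are not worthy..."},
]

# Write the stand-in items the same way: one marshal.dump per item.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.marshal")
with open(demo_path, "wb") as f:
    for item in sample:
        marshal.dump(item, f)

loaded = read_marshal_feed(demo_path)
print(len(loaded), loaded[0]["author"])
```

For the real crawl, read_marshal_feed("quotes.marshal") should return 100 dicts, matching item_scraped_count. The marshal format is tied to the Python version, so read the file with the same interpreter that wrote it. For anything you want to keep, exporting to JSON or JSON lines is the safer choice.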

 


Original article: https://www.cnblogs.com/jackzz/p/10745795.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!