Website Scraping with Python: Reading Notes

Date: 2019-01-09 00:09:41

Chapter 1

　　Basic tools used in a scraping project: requests, Beautiful Soup, Scrapy.

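　　Legal and technical ground rules:
　　　　Read the Terms & Conditions and the Privacy Policy of the website. Is scraping allowed at all?
　　　　See the robots.txt file. Which parts may be scraped?
　　　　Look at the website's HTML code. Which technologies does the target page use?
　　　　Consider the task and the website's structure. Which tool should you choose?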
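
The robots.txt check can be automated with the standard library's urllib.robotparser. The following is a minimal sketch, not something taken from the book: the helper name is_allowed, the user agent "my-scraper", and the sample robots.txt content are all made up for illustration. It parses the rules from a string so it runs without network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no HTTP request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/index.html"))
print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/private/data"))
```

Against a live site, the same parser can load the file itself with set_url() followed by read(). Passing this check does not replace reading the Terms & Conditions.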

　　Keywords to look for in the Terms and in robots.txt: scraper/scraping, crawler/crawling, bot, spider, program.
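
Scanning a long Terms page for these keywords can be done with a few lines of Python. This is only a rough sketch: the function name find_keyword_lines and the sample text are hypothetical, and a hit merely marks a line worth reading by hand, not a legal conclusion.

```python
import re

# The keywords from the notes above.
KEYWORDS = ["scraper", "scraping", "crawler", "crawling", "bot", "spider", "program"]


def find_keyword_lines(text, keywords=KEYWORDS):
    """Return (line_number, line) pairs for lines that mention any keyword."""
    # Whole-word, case-insensitive match; an optional trailing "s" covers plurals.
    alternatives = "|".join(re.escape(word) for word in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")s?\b", re.IGNORECASE)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical excerpt from a Terms page, invented for this example.
SAMPLE_TERMS = """\
You may view pages for personal use.
Automated access by bots or spiders is prohibited.
Contact us for questions.
"""

for number, line in find_keyword_lines(SAMPLE_TERMS):
    print(number, line)
```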

　　Web technologies: use Python's builtwith library to find out which technologies a page uses.
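
A minimal sketch of how builtwith might be used, assuming the third-party package is installed (pip install builtwith). The wrapper detect_technologies and its parse parameter are hypothetical names added here so the logic can be tried without network access; builtwith.parse(url) is the library call, which returns a dictionary mapping categories to lists of technology names.

```python
def detect_technologies(url, parse=None):
    """Return {category: sorted technology names} for the page at url."""
    if parse is None:
        # Imported lazily so the sketch loads even where builtwith is missing.
        import builtwith  # third-party package
        parse = builtwith.parse
    report = parse(url)
    return {category: sorted(names) for category, names in report.items()}


# Usage against a real site (performs an HTTP request), e.g.:
# print(detect_technologies("https://www.example.com"))
```

The result depends on the site and on the library's detection rules, so treat it as a hint to confirm in the browser's developer tools.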

　　Chrome DevTools: use them to inspect the page.

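　　Choosing tools:
　　　　Small project (simple pages, no JavaScript involved): Beautiful Soup + requests, or Scrapy.
　　　　Large amounts of data, or when performance matters: Scrapy + Beautiful Soup.
　　　　Pages that rely on AJAX: time to call in reinforcements, Selenium and Portia.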
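
The rule of thumb above can be written down as a small helper. This is just an illustrative sketch of the notes, not code from the book; the function name suggest_tools and its two flags are hypothetical.

```python
def suggest_tools(uses_ajax, large_or_performance_critical):
    """Map the two questions from the notes to a suggested tool set."""
    if uses_ajax:
        # Content loaded via AJAX needs something that can render the page.
        return ["Selenium", "Portia"]
    if large_or_performance_critical:
        return ["Scrapy", "Beautiful Soup"]
    # Small project with simple pages; per the notes, Scrapy would also work.
    return ["Beautiful Soup", "requests"]


print(suggest_tools(uses_ajax=False, large_or_performance_critical=False))
print(suggest_tools(uses_ajax=False, large_or_performance_critical=True))
print(suggest_tools(uses_ajax=True, large_or_performance_critical=False))
```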


Source: https://www.cnblogs.com/roygood/p/10242010.html
