WebCollector 2.x official site and mirror:
Official site: https://github.com/CrawlScript/WebCollector
Mirror: http://git.oschina.net/webcollector/WebCollector
WebCollector 2.x tutorials:
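WebCollector 2.x tutorial 2 (BreadthCrawler tutorial in Chinese)
WebCollector 2.x: automatic main-text extraction algorithm for news pages
WebCollector 2.x extractors (Extractor and MultiExtractorCrawler)
Crawling JavaScript-generated data with WebCollector
Crawling Sogou search results (with pagination) with WebCollector
Crawling JSON data with WebCollector
Using SoupLang scripts to manage crawling of multiple pages at once
Crawling Sina Weibo with WebCollector 2.x (no need to obtain cookies manually)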
WebCollector 2.x tutorials (mirror):
WebCollector
Original article: http://my.oschina.net/u/1579617/blog/520524