
36. Scraping the Collins Dictionary

Posted: 2020-04-14 21:33:19

Scraping the Collins Dictionary:

# On using threads and processes
# https://www.cnblogs.com/dylan9/p/9207366.html
# On using process pools
# https://www.cnblogs.com/huchong/p/7459324.html#_lab2_1_0
import time

import requests
from lxml import etree
from multiprocessing.dummy import Pool  # thread-based pool with the multiprocessing.Pool API
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
}

# url = "https://www.collinsdictionary.com/zh/browse/english/"
#
# page_text = requests.get(url=url, headers=headers).text
#
# tree = etree.HTML(page_text)
#
# li_list = tree.xpath("//ul[@class='bLtr']/li/a/@href")[1:]
pool = Pool(20)

# One index page per initial letter, a through z.
li_list = [
    "https://www.collinsdictionary.com/zh/browse/english/words-starting-with-" + letter
    for letter in "abcdefghijklmnopqrstuvwxyz"
]

# li_list = ["https://www.collinsdictionary.com/zh/browse/english/words-starting-with-a"]

deep_url_list = []

start = time.time()

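def get_urls(url):
    # Fetch one letter index page and collect the links to its sub-pages.
    page_text2 = requests.get(url=url, headers=headers).text
    tree2 = etree.HTML(page_text2)
    url_list = tree2.xpath("//ul[@class='columns2 bL']/li/a/@href")
    deep_url_list.extend(url_list)


def get_data(url):
    # Fetch one sub-page and append every word listed on it to word2.txt.
    page_text3 = requests.get(url=url, headers=headers).text
    tree3 = etree.HTML(page_text3)
    data_li_list = tree3.xpath("//ul[@class='columns2 bL']/li")
    words = []
    for li in data_li_list:
        text = li.xpath('./a/text()')
        if text:  # skip entries without link text instead of raising IndexError
            words.append(text[0])
    # Open the file once per page and write all of its words in one call.
    with open("word2.txt", "a", encoding="utf-8") as f:
        f.write(''.join(word + '\n' for word in words))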


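# Stage 1: collect the sub-page URLs from the letter index pages (blocks until done).
pool.map(get_urls, li_list)
# Stage 2: fetch every sub-page. wait() blocks until all tasks finish but does not
# re-raise worker exceptions; call result.get() instead to surface them.
result = pool.map_async(get_data, deep_url_list)
result.wait()
print("Done")
print("Elapsed:", time.time()-start)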
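The script shares `deep_url_list` between worker threads and appends to `word2.txt` from every worker. A possible alternative, sketched here under assumptions, is to have each worker return its results and let the main thread flatten and write them once. The names `collect`, `save_words`, and `fake_worker` are hypothetical and not part of the original post; `fake_worker` only stands in for a function like `get_data` that would return the words of one page instead of writing them.

```python
from itertools import chain
from multiprocessing.dummy import Pool  # thread-based Pool, same class of pool as in the script


def collect(worker, inputs, threads=20):
    """Run worker over inputs in a thread pool and flatten the returned lists."""
    with Pool(threads) as pool:
        # pool.map blocks until every task is done, keeps the input order,
        # and re-raises an exception raised inside a worker.
        nested = pool.map(worker, inputs)
    return list(chain.from_iterable(nested))


def save_words(words, path="word2.txt"):
    """Write all words from the calling thread, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(word + "\n" for word in words)


def fake_worker(url):
    """Hypothetical stand-in: a real worker would download and parse one page."""
    return [url + "-1", url + "-2"]
```

With workers written this way, the two stages would become something like `urls = collect(get_urls, li_list)` followed by `save_words(collect(get_data, urls))`, assuming `get_urls` and `get_data` are changed to return lists. Only one thread touches the output file, and the word order follows the input order.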

Original post: https://www.cnblogs.com/liuzhanghao/p/12700889.html
