# Flask demo server: three routes that each block for ~2 s, used as a
# slow backend to demonstrate the asynchronous clients further below.
from flask import Flask
import time

app = Flask(__name__)


@app.route('/index1')
def index_bobo():
    """Simulate a slow endpoint (2 s delay) and return a greeting."""
    time.sleep(2)
    return 'Hello index1'


@app.route('/index2')
def index_jay():
    """Simulate a slow endpoint (2 s delay) and return a greeting."""
    time.sleep(2)
    return 'Hello index2'


@app.route('/index3')
def index_tom():
    """Simulate a slow endpoint (2 s delay) and return a greeting."""
    time.sleep(2)
    return 'Hello index3'


if __name__ == '__main__':
    # threaded=True lets the dev server handle concurrent requests, so
    # the concurrent clients below can overlap the 2 s sleeps.
    app.run(threaded=True)
# asyncio + aiohttp client: fetches the three slow endpoints concurrently,
# so total wall time is ~2 s instead of ~6 s (sequential).
import time
import asyncio

import aiohttp  # was referenced but never imported in the original

s = time.time()
urls = [
    'http://127.0.0.1:5000/index1',
    'http://127.0.0.1:5000/index2',
    'http://127.0.0.1:5000/index3',
]


# NOTE: inside a coroutine, do not call modules that block without async
# support (e.g. requests) — they would stall the whole event loop.
async def get_request(url):
    """Fetch *url* asynchronously, print and return the response body."""
    # aiohttp supports asynchronous HTTP requests.
    # (renamed the session alias so it no longer shadows the start time `s`)
    async with aiohttp.ClientSession() as session:
        async with session.get(url=url) as response:
            page_text = await response.text()
            print(page_text)
            return page_text


tasks = []
for url in urls:
    c = get_request(url)                 # coroutine object
    task = asyncio.ensure_future(c)      # wrap into a Task (schedulable coroutine)
    tasks.append(task)

loop = asyncio.get_event_loop()          # event loop object
# Note: suspension points (await) must be written explicitly.
loop.run_until_complete(asyncio.wait(tasks))  # run all tasks to completion
print(time.time() - s)
# 基于线程池实现异步爬虫 (asynchronous crawler based on a thread pool):
# Thread-pool client: multiprocessing.dummy.Pool is a thread pool, so the
# blocking requests.get calls run concurrently (requests itself is not
# async-capable, hence the pool instead of asyncio here).
from multiprocessing.dummy import Pool
import requests
import time

start = time.time()
urls = [
    'http://localhost:5000/index1',
    # original had '/index', which matches no route above — assumed typo
    'http://localhost:5000/index2',
]


def get_request(url):
    """Fetch *url* synchronously and print its body (runs in a pool thread)."""
    page_text = requests.get(url).text
    print(page_text)


pool = Pool(5)
pool.map(get_request, urls)  # blocks until every URL has been fetched
pool.close()                 # original leaked the pool; release its threads
pool.join()
print('总耗时:', time.time() - start)
# 原文 (source): https://www.cnblogs.com/dylan123/p/12694711.html