Python web scraping - getting the NIPS oral paper list with Python

Date: 2019-12-13 14:29:11

from lxml import html
import requests

# Fetch the NIPS 2019 schedule page and parse it into an element tree.
page = requests.get('https://nips.cc/Conferences/2019/Schedule')
tree = html.fromstring(page.content)

# Select every card whose class attribute starts with the given prefix.
# For oral papers, use 'maincard narrower Oral' instead.
btags = tree.xpath("//*[starts-with(@class, 'maincard narrower Spotlight')]")

with open('nips_paperlist_spotlight.txt', 'w', encoding='utf-8') as f:
    for idx, b in enumerate(btags, start=1):
        # Child divs 1, 3 and 5 of each card hold the type, title and authors.
        paper_type = b.xpath("div[1]")[0].text
        title = b.xpath("div[3]")[0].text
        author = b.xpath("div[5]")[0].text
        out_str = "%d, %s, %s, %s\n" % (idx, paper_type, title, author)
        print(out_str)
        f.write(out_str)
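
Paper titles and author lists often contain commas, so the comma-joined lines written by the script above can be ambiguous to read back. Below is a minimal sketch of one alternative, using the standard-library csv module to quote such fields. The helper name `write_rows`, the column names, and the sample row are all hypothetical, chosen for illustration rather than taken from the original script.

```python
import csv
import io


def write_rows(rows, out):
    """Write (type, title, authors) rows as CSV with a running index.

    Fields that contain commas are quoted by csv.writer, so each
    record can be parsed back unambiguously.
    """
    writer = csv.writer(out)
    writer.writerow(["idx", "type", "title", "authors"])
    for idx, row in enumerate(rows, start=1):
        writer.writerow([idx, *row])


# Made-up sample row, for illustration only.
sample = [("Spotlight", "A Title, With a Comma", "A. Author, B. Author")]
buf = io.StringIO()
write_rows(sample, buf)
print(buf.getvalue())
```

To write to a real file instead of the in-memory buffer, open it with `newline=''` (as the csv module documentation recommends) and pass the file object as `out`.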

Uses XPath, with the lxml and requests libraries.
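
A prefix match on the whole class attribute depends on the exact order of the class names. As a hedged alternative, the sketch below matches a single class token with the common `contains(concat(' ', normalize-space(@class), ' '), ...)` idiom and passes the token as an XPath variable through lxml. The markup in `SAMPLE` is made up to imitate the layout the script assumes, and `cards_of_kind` is a hypothetical helper name. The real schedule page may be structured differently.

```python
from lxml import html

# Made-up markup that imitates the class layout the script relies on.
SAMPLE = """
<div>
  <div class="maincard narrower Oral"><div>Oral</div><div>t</div><div>Title A</div></div>
  <div class="maincard narrower Spotlight"><div>Spotlight</div><div>t</div><div>Title B</div></div>
  <div class="maincard narrower Poster"><div>Poster</div><div>t</div><div>Title C</div></div>
</div>
"""


def cards_of_kind(tree, kind):
    """Return div elements whose class list contains `kind` as a whole token."""
    # Padding both sides with spaces makes 'Spot' fail to match 'Spotlight'.
    token_test = (
        "contains(concat(' ', normalize-space(@class), ' '), "
        "concat(' ', $kind, ' '))"
    )
    # lxml binds $kind from the keyword argument of the same name.
    return tree.xpath("//div[%s]" % token_test, kind=kind)


tree = html.fromstring(SAMPLE)
for card in cards_of_kind(tree, "Oral"):
    # text_content() returns the element's text even when it has child tags.
    print(card.xpath("div[3]")[0].text_content().strip())
```

Passing the token as a variable avoids building the XPath string by hand, which sidesteps quoting problems if the token ever contains a quote character.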

https://docs.python-guide.org/scenarios/scrape/

https://stackoverflow.com/questions/12393858/xpath-using-contains-with-a-wildcard


Original: https://www.cnblogs.com/imoon22/p/12034855.html
