scrapy Selector

Date: 2020-09-26 18:10:26

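Selector is a module that can be used on its own, outside of a running spider. We can instantiate the Selector class directly to build a selector object, then call its methods such as xpath and css to extract data.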

For example, given a piece of HTML, we can build a Selector object and extract data from it as follows:

from scrapy import Selector

body = '<html><head><title>Hello World</title></head><body></body></html>'
selector = Selector(text=body)
title = selector.xpath('//title/text()').extract_first()
print(title)
Output:

Hello World

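Note that the response object cannot call the re and re_first methods directly. To run a regular expression over the whole document, first call the xpath method and then apply the regular expression to its result, as shown below: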

>>> response.re(r'Name:\s(.*)')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'HtmlResponse' object has no attribute 're'
>>> response.xpath('.').re(r'Name:\s(.*)<br>')
['My image 1 ', 'My image 2 ', 'My image 3 ', 'My image 4 ', 'My image 5 ']
>>> response.xpath('.').re_first(r'Name:\s(.*)<br>')
'My image 1 '
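As the example shows, calling re directly on the response raises an AttributeError because the response has no re attribute. Calling xpath('.') first selects the whole document and returns a selector list, on which re and re_first are available, so the regular expression match works.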

Source: https://www.cnblogs.com/angdh/p/13734791.html
