Python text processing

Posted: 2019-03-12 21:40:44

1. Extracting URLs from text

This technique is mainly used in web crawling:

Save the crawled HTML page as a string, then extract the URLs from that string.
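The article extracts URLs with a regular expression over raw text. When the saved string is HTML, one alternative (not the author's method) is to let Python's standard-library html.parser read the href attribute of each <a> tag, so URLs are taken from links rather than from arbitrary text. A minimal sketch; the LinkCollector class, the extract_links function, and the sample page are hypothetical names and data for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for a bare attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in an HTML string, in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, standing in for a crawled page saved as a string.
page = '<p>Visit <a href="http://www.google.com">Google</a> or <a href="https://www.codingdict.com">codingdict</a>.</p>'
print(extract_links(page))
# ['http://www.google.com', 'https://www.codingdict.com']
```

Relative links such as "/about" come back exactly as written; they would need to be combined with the page's own URL (for example with urllib.parse.urljoin) before being fetched.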

For example, save the following text to a file:

Nowadays you can learn almost anything by just visiting http://www.google.com. But if you are completely new to computers or internet then first you need to learn those fundamentals. Next
you can visit a good e-learning site like - https://www.codingdict.com to learn further on a variety of subjects.
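To reproduce the example without creating the file by hand, the sample text can be written from Python. This sketch assumes the file goes in the current directory as url_example.txt (the file name used in the extraction code; adjust the directory to match); SAMPLE_TEXT and write_sample are hypothetical names:

```python
from pathlib import Path

# Hypothetical constant: the two-line sample text, one "\n" per line.
SAMPLE_TEXT = (
    "Nowadays you can learn almost anything by just visiting http://www.google.com. "
    "But if you are completely new to computers or internet then first you need "
    "to learn those fundamentals. Next\n"
    "you can visit a good e-learning site like - https://www.codingdict.com "
    "to learn further on a variety of subjects.\n"
)


def write_sample(path):
    """Hypothetical helper: write SAMPLE_TEXT to path as UTF-8, return the Path."""
    target = Path(path)
    target.write_text(SAMPLE_TEXT, encoding="utf-8")
    return target


# Assumes the current directory; change it to match the path used when reading.
write_sample("url_example.txt")
```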

Then use re.findall() with a regular expression to find every matching URL:
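import re

# Print the URLs found on each line of the saved text. The pattern matches
# "http://" or "https://" plus letters, digits, "_", "-", "." or %XX escapes,
# so it stops at the first "/" and keeps a sentence-ending "." if present.
with open(r"path\url_example.txt", encoding="utf-8") as file:
    for line in file:
        urls = re.findall(r"https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+", line)
        print(urls)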
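The loop prints one list per line, and because the character class allows ".", a URL at the end of a sentence keeps its period (on the first sample line the match is 'http://www.google.com.'). A minimal refinement, assuming you want one de-duplicated list for the whole file, compiles the same pattern once and strips trailing periods. extract_urls and extract_urls_from_file are hypothetical helper names, and the rstrip(".") step assumes no extracted host legitimately ends with a dot:

```python
import re

# The article's pattern, compiled once. It matches "http://" or "https://"
# plus letters, digits, "_", "-", "." and %XX escapes, so it stops at the
# first "/" and captures only the scheme and host.
URL_RE = re.compile(r"https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+")


def extract_urls(lines):
    """Return the unique URLs found in an iterable of lines, in first-seen order."""
    seen = {}  # dicts keep insertion order, so this de-duplicates stably
    for line in lines:
        for url in URL_RE.findall(line):
            # Drop a sentence-ending period swallowed by the [-\w.] class.
            seen.setdefault(url.rstrip("."), None)
    return list(seen)


def extract_urls_from_file(path):
    """Open a UTF-8 text file and run extract_urls over its lines."""
    with open(path, encoding="utf-8") as file:
        return extract_urls(file)


print(extract_urls([
    "Visit http://www.google.com. Or http://www.google.com again.",
    "Then https://www.codingdict.com to learn more.",
]))
# ['http://www.google.com', 'https://www.codingdict.com']
```

Because extract_urls accepts any iterable of strings, it works the same on an open file, on page_text.splitlines(), or on a plain list, which keeps it easy to test without touching the disk.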


Original article: https://www.cnblogs.com/qiujichu/p/10519802.html
