lxml

Published: 2021-04-27 19:50:15

Overview: lxml is a Python library for processing XML and HTML files. It can also be used to extract data when scraping web pages.

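Installation:

Using pip: pip install lxml

On Debian/Ubuntu Linux: sudo apt-get install python3-lxml (apt-get is not available on macOS; use pip there)

If neither works, the original fallback was: easy_install lxml. Note that easy_install is deprecated, so python -m pip install lxml is usually the more reliable retry.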
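
After installing, a quick way to confirm that the library is importable is to print its version. This is only a small sanity check, not part of the original article; the variable name lxml_version is chosen here for illustration.

```python
# Quick sanity check that lxml imports and reports its version.
from lxml import etree

# LXML_VERSION is a tuple of integers describing the installed lxml release
lxml_version = etree.LXML_VERSION
print("lxml version:", ".".join(str(part) for part in lxml_version))
```

If the import fails with ModuleNotFoundError, the package was most likely installed for a different Python interpreter than the one running the script.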

Usage:

First, import the module

# Import the etree module from the lxml library
from lxml import etree

HTML parsing:

The etree.HTML function parses HTML text into an element tree and automatically repairs malformed markup.
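
The repair behaviour can be seen with a small sketch. The input string below is a made-up example with no html/body wrapper and unclosed li tags; the names broken_html, html_root and items are illustrative only.

```python
from lxml import etree

# Deliberately malformed HTML: no <html>/<body> wrapper, unclosed <li> tags
broken_html = "<ul><li>first<li>second</ul>"

# etree.HTML returns the root <html> element of the repaired tree
html_root = etree.HTML(broken_html)

# Collect the text of every <li> in document order
items = [li.text for li in html_root.iter("li")]

print(html_root.tag)
print(items)
# Serialize the repaired tree to see the wrappers and closing tags that were added
print(etree.tostring(html_root, pretty_print=True).decode("utf-8"))
```

The parser wraps the fragment in html and body elements and closes each li, so the two list items end up as siblings.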

Creating HTML/XML documents

Elements can be created with the Element factory function in the etree module

# First create the root element
root = etree.Element('html', version="5.0")

# Create child elements under the root
etree.SubElement(root, 'head')
etree.SubElement(root, 'title', bgcolor='red', fontsize='22')
etree.SubElement(root, 'body', fontsize='15')

# Create nested elements: a p tag inside body, then an a tag inside p
etree.SubElement(root[2], 'p', bgcolor="red")
etree.SubElement(root[2][0], 'a')

# Print the result; pretty_print=True indents the output for readability
print(etree.tostring(root, pretty_print=True).decode('utf-8'))
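
Indexing such as root[2][0] works, but it depends on the order in which children were added. Since SubElement returns the element it creates, one alternative is to keep those return values in variables. The sketch below is a variation, not the article's layout: it places title under head, as a conventional HTML document would, and all variable names are illustrative.

```python
from lxml import etree

root = etree.Element("html")

# SubElement returns the new child, so keep a reference instead of indexing
head = etree.SubElement(root, "head")
title = etree.SubElement(head, "title")  # nested under head in this variation
title.text = "demo"

body = etree.SubElement(root, "body")
p = etree.SubElement(body, "p", bgcolor="red")
a = etree.SubElement(p, "a")
a.text = "link"

# Without pretty_print the output is a single line
serialized = etree.tostring(root).decode("utf-8")
print(serialized)
```

Holding references keeps the code valid if another child is later inserted before body, whereas a positional index like root[2] would then point at a different element.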

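Parsing HTML/XML documents

The previous section created elements and added attributes. To walk through an existing HTML/XML tree and extract its content, you can do the following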

# Iterate over the root's child elements and print each tag
for t in root:
    # Print the tag name
    print(t.tag)

# Add an attribute to an element
root.set('newAttribute', 'attributeValue')
# Read an element's attribute values
root.get('newAttribute')
root[1].get('bgcolor')
# Set the text content of elements
root[0].text = "this is the head"
root[1].text = "this is the title"
root[2][0].text = "this is the subtag p the body"
root[2][0][0].text = "this is the subtag a the p"

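# Check whether an object is an element
etree.iselement(root[0])  # True
# Get an element's parent
root.getparent()  # None: the root has no parent
root[0].getparent()  # the html root element
# Get sibling elements
root.getnext()  # None: the root has no siblings
root[0].getnext()  # title: the next sibling of head
root[1].getprevious()  # head: the previous sibling of title
# Find an element
root.find('head')  # the first child whose tag is 'head'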
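
The calls above operate on a tree built in memory. For a document that already exists as text, the same navigation applies once the text is parsed. A minimal sketch follows, assuming a small made-up XML string (the library/book structure and all variable names are hypothetical, not from the article).

```python
from lxml import etree

# Hypothetical sample document used only for illustration
xml_text = (
    '<library>'
    '<book id="b1"><title>First</title></book>'
    '<book id="b2"><title>Second</title></book>'
    '</library>'
)

# fromstring parses XML text and returns the root element
doc = etree.fromstring(xml_text)

# find returns the first matching child, findall returns all of them
first_book = doc.find("book")
ids = [book.get("id") for book in doc.findall("book")]

# findtext returns the text of the first matching child element
second_title = doc[1].findtext("title")

# xpath can collect text nodes from anywhere in the tree
titles = [str(text) for text in doc.xpath("//title/text()")]

print(first_book.get("id"), ids, second_title, titles)
```

For HTML text, etree.HTML can be used in place of etree.fromstring; the navigation methods are the same.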

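Summary:

This article covered what lxml is for, how to use it to create elements and add attributes when building HTML/XML documents, and how to navigate an existing HTML/XML tree to retrieve its content.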

Reference:

https://python.freelycode.com/contribution/detail/1532

Original article: https://www.cnblogs.com/bcCai/p/14709898.html
