
Parsing XML with Python SAX

Date: 2016-01-16 12:01:38
#books.xml
<catalog>
  <book isbn="0-596-00128-2">
    <title>Python &amp; XML</title>
    <title>Python &amp; HTML</title>
    <date>December 2001</date>
    <author>Jones, Drake</author>
  </book>
  <book isbn="0-596-15810-6">
    <title>Programming Python, 4th Edition</title>
    <date>October 2010</date>
    <author>Lutz</author>
  </book>
  <book isbn="0-596-15806-8">
    <title>Learning Python, 4th Edition</title>
    <date>September 2009</date>
    <author>Lutz</author>
  </book>
  <book isbn="0-596-15808-4">
    <title>Python Pocket Reference, 4th Edition</title>
    <date>October 2009</date>
    <author>Lutz</author>
  </book>
  <book isbn="0-596-00797-3">
    <title>Python Cookbook, 2nd Edition</title>
    <date>March 2005</date>
    <author>Martelli, Ravenscroft, Ascher</author>
  </book>
  <book isbn="0-596-10046-9">
    <title>Python in a Nutshell, 2nd Edition</title>
    <date>July 2006</date>
    <author>Martelli</author>
  </book>
  <!-- plus many more Python books that should appear here -->
</catalog>

 

 

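# -*- coding:utf-8 -*-
__author__ = "hdfs"
'''
In general, SAX parsing of XML proceeds in 3 stages, handled here by
startElement, characters, and endElement.
SAX parses linearly (as a stream), so it is efficient for large XML files.
'''
import pprint
import xml.sax
import xml.sax.handler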
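class BookHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.inTitle = False
        self.mapping = {}

    def startElement(self, name, attrs):
        # A <book> element starts: reset the buffer and remember its ISBN
        if name == "book":
            self.buffer = ""
            self.isbn = attrs["isbn"]
        # A <title> element starts
        elif name == "title":
            self.inTitle = True

    def characters(self, data):
        # Inside a <title>, append the text to the buffer. The buffer is reset
        # only per <book>, so multiple <title> children are concatenated.
        if self.inTitle:
            self.buffer += data

    # An element ends
    def endElement(self, name):
        if name == "title":
            self.inTitle = False
            self.mapping[self.isbn] = self.buffer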

parser=xml.sax.make_parser()
handler=BookHandler()
parser.setContentHandler(handler)
parser.parse("books.xml")
pprint.pprint(handler.mapping)

 

result:

{u'0-596-00128-2': u'Python & XMLPython & HTML',
 u'0-596-00797-3': u'Python Cookbook, 2nd Edition',
 u'0-596-10046-9': u'Python in a Nutshell, 2nd Edition',
 u'0-596-15806-8': u'Learning Python, 4th Edition',
 u'0-596-15808-4': u'Python Pocket Reference, 4th Edition',
 u'0-596-15810-6': u'Programming Python, 4th Edition'}
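
The first entry in the result runs two titles together ("Python & XMLPython & HTML") because the buffer is reset once per book rather than once per title. If separate titles are wanted, one possible variant is sketched below. It is only an illustration under a few assumptions: it targets Python 3, the names TitleListHandler, parse_titles and SAMPLE are made up for this example, and it parses a short inline document with xml.sax.parseString instead of reading books.xml.

```python
import xml.sax
import xml.sax.handler

# Hypothetical two-book sample; the first book has two <title> children,
# like the first entry in books.xml.
SAMPLE = """<catalog>
  <book isbn="0-596-00128-2">
    <title>Python &amp; XML</title>
    <title>Python &amp; HTML</title>
  </book>
  <book isbn="0-596-15810-6">
    <title>Programming Python, 4th Edition</title>
  </book>
</catalog>"""


class TitleListHandler(xml.sax.handler.ContentHandler):
    """Collect every <title> of a <book> as a separate list entry."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.isbn = None
        self.chunks = []    # text pieces of the title currently being read
        self.mapping = {}   # isbn -> list of titles

    def startElement(self, name, attrs):
        if name == "book":
            self.isbn = attrs["isbn"]
            self.mapping[self.isbn] = []
        elif name == "title":
            # Start a fresh buffer for each title, not for each book
            self.in_title = True
            self.chunks = []

    def characters(self, data):
        # characters() may be called several times for one text node,
        # so the pieces are gathered here and joined in endElement()
        if self.in_title:
            self.chunks.append(data)

    def endElement(self, name):
        if name == "title":
            self.in_title = False
            self.mapping[self.isbn].append("".join(self.chunks))


def parse_titles(xml_text):
    """Parse an XML string and return a dict of isbn -> list of titles."""
    handler = TitleListHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.mapping


if __name__ == "__main__":
    print(parse_titles(SAMPLE))
```

With this variant the first ISBN maps to a list of two titles instead of one concatenated string. Whether a list or a single string is the better shape depends on what the mapping is used for afterwards.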

 


Original: http://www.cnblogs.com/similarface/p/5135161.html
