Processing XML Files in Python (Part 1): Traversing XML with ElementTree

Stella981

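Python offers quite a few ways to work with XML, and there is no shortage of third-party frameworks either, but none of them has the kind of reputation that dom4j enjoys. So let's play it safe and look at what Python ships with.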

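Today we start with ElementTree, which is comparatively pleasant to read. The detailed explanations are in the code comments, and the output follows the listing.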

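#coding:utf-8
'''
Created on 2015-01-12

@author: neil
'''
# Note: this listing uses Python 2 syntax (print statements).
from xml.etree import ElementTree

# Option 1: pass the XML in as a file
#filepath=''
# with open(filepath,'rt') as f:
#     tree=ElementTree.parse(f)

# Option 2: pass the XML in as a string (fromstring() returns the root Element)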
tree=ElementTree.fromstring('''<?xml version="1.0" encoding="utf-8"?>
<catalog>
       <maxid id='0'>4</maxid>尾巴
       <login id='1' username="pytest" passwd='123456'>
              <caption>Python</caption>
             <item id="4">
                    <caption>测试</caption>
            </item>
    </login>
    <item id="2">
            <caption>Zope</caption>
            <caption>JJJ</caption>
    </item>
<mux>sdfsdfsdfs</mux>
</catalog>''')

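for node in tree.iter():  # depth-first, pre-order traversal of the tree
    print type(node)
    print 'tag:',node.tag  # the element's tag name
    print 'tail:',node.tail  # the text after the closing </XXX> tag, usually just a newline and indentation. Nothing follows the root's closing tag, so the root's tail is None.
    print 'attrib:',node.attrib  # a dict: each entry is one attribute, keyed by attribute name, with the attribute value as the dict value
    print 'attrib-id-default:',node.attrib.get('id','NULL')  # dict lookup by key; if the key is missing, the default given as the second argument is returned
    print 'attrib-id:',node.attrib.get('id')  # dict lookup by key; if the key is missing, None is returned and no error is raised
    print 'attrib.items:',node.attrib.items()  # a list of 2-tuples: (attribute name, attribute value)
    print 'attrib.keys:',node.attrib.keys()  # a list of all attribute names of the element
    print 'find-caption:',node.find('caption')  # the first direct child whose tag matches (grandchildren are not searched), or None
    print 'find-item:',node.find('item')
    print 'findall:',node.findall('caption')  # all direct children whose tag matches, returned as a list
    print 'findtext-caption:',node.findtext('caption')  # the text of the first direct child whose tag matches, or None if there is none
    print 'getchildren:',node.getchildren()  # all direct children as a list (deprecated since 2.7; use list(node))
    print 'iter:',node.iter()
    for n in node.iter():  # depth-first, pre-order traversal of the subtree rooted at this node (the node itself included)
        print '++++',n.tag
    print '- - - - - - - - - - - - - - - - '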

The output is as follows:

<class 'xml.etree.ElementTree.Element'>
tag: catalog
tail: None
attrib: {}
attrib-id-default: NULL
attrib-id: None
attrib.items: []
attrib.keys: []
find-caption: None
find-item: <Element 'item' at 0x7f738e1f1c10>
findall: []
findtext-caption: None
getchildren: [<Element 'maxid' at 0x7f738e1f1850>, <Element 'login' at 0x7f738e1f1890>, <Element 'item' at 0x7f738e1f1c10>, <Element 'mux' at 0x7f738e1f1cd0>]
iter: <generator object iter at 0x7f738e1e72d0>
++++ catalog
++++ maxid
++++ login
++++ caption
++++ item
++++ caption
++++ item
++++ caption
++++ caption
++++ mux
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: maxid
tail: 尾巴
       
attrib: {'id': '0'}
attrib-id-default: 0
attrib-id: 0
attrib.items: [('id', '0')]
attrib.keys: ['id']
find-caption: None
find-item: None
findall: []
findtext-caption: None
getchildren: []
iter: <generator object iter at 0x7f738e1e7370>
++++ maxid
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: login
tail: 
    
attrib: {'username': 'pytest', 'passwd': '123456', 'id': '1'}
attrib-id-default: 1
attrib-id: 1
attrib.items: [('username', 'pytest'), ('passwd', '123456'), ('id', '1')]
attrib.keys: ['username', 'passwd', 'id']
find-caption: <Element 'caption' at 0x7f738e1f1950>
find-item: <Element 'item' at 0x7f738e1f1a10>
findall: [<Element 'caption' at 0x7f738e1f1950>]
findtext-caption: Python
getchildren: [<Element 'caption' at 0x7f738e1f1950>, <Element 'item' at 0x7f738e1f1a10>]
iter: <generator object iter at 0x7f738e1e7370>
++++ login
++++ caption
++++ item
++++ caption
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: caption
tail: 
             
attrib: {}
attrib-id-default: NULL
attrib-id: None
attrib.items: []
attrib.keys: []
find-caption: None
find-item: None
findall: []
findtext-caption: None
getchildren: []
iter: <generator object iter at 0x7f738e1e73c0>
++++ caption
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: item
tail: 
    
attrib: {'id': '4'}
attrib-id-default: 4
attrib-id: 4
attrib.items: [('id', '4')]
attrib.keys: ['id']
find-caption: <Element 'caption' at 0x7f738e1f1bd0>
find-item: None
findall: [<Element 'caption' at 0x7f738e1f1bd0>]
findtext-caption: 测试
getchildren: [<Element 'caption' at 0x7f738e1f1bd0>]
iter: <generator object iter at 0x7f738e1e73c0>
++++ item
++++ caption
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: caption
tail: 
            
attrib: {}
attrib-id-default: NULL
attrib-id: None
attrib.items: []
attrib.keys: []
find-caption: None
find-item: None
findall: []
findtext-caption: None
getchildren: []
iter: <generator object iter at 0x7f738e1e7410>
++++ caption
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: item
tail: 

attrib: {'id': '2'}
attrib-id-default: 2
attrib-id: 2
attrib.items: [('id', '2')]
attrib.keys: ['id']
find-caption: <Element 'caption' at 0x7f738e1f1c50>
find-item: None
findall: [<Element 'caption' at 0x7f738e1f1c50>, <Element 'caption' at 0x7f738e1f1c90>]
findtext-caption: Zope
getchildren: [<Element 'caption' at 0x7f738e1f1c50>, <Element 'caption' at 0x7f738e1f1c90>]
iter: <generator object iter at 0x7f738e1e7370>
++++ item
++++ caption
++++ caption
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: caption
tail: 
            
attrib: {}
attrib-id-default: NULL
attrib-id: None
attrib.items: []
attrib.keys: []
find-caption: None
find-item: None
findall: []
findtext-caption: None
getchildren: []
iter: <generator object iter at 0x7f738e1e73c0>
++++ caption
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: caption
tail: 
    
attrib: {}
attrib-id-default: NULL
attrib-id: None
attrib.items: []
attrib.keys: []
find-caption: None
find-item: None
findall: []
findtext-caption: None
getchildren: []
iter: <generator object iter at 0x7f738e1e73c0>
++++ caption
- - - - - - - - - - - - - - - - 
<class 'xml.etree.ElementTree.Element'>
tag: mux
tail: 

attrib: {}
attrib-id-default: NULL
attrib-id: None
attrib.items: []
attrib.keys: []
find-caption: None
find-item: None
findall: []
findtext-caption: None
getchildren: []
iter: <generator object iter at 0x7f738e1e7370>
++++ mux
- - - - - - - - - - - - - - - -
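
For readers on Python 3, the traversal above can be sketched roughly as follows. This is an adaptation rather than the original author's code: `print` becomes a function, `getchildren()` (removed in Python 3.9) is replaced by `list(node)`, the sample XML is a shortened stand-in for the document above, and the helper `describe` is a hypothetical name introduced only for this illustration.

```python
from xml.etree import ElementTree

# Shortened sample document (an assumption for illustration, not the full XML above).
XML = """<catalog>
    <maxid id="0">4</maxid>tail-text
    <login id="1" username="pytest">
        <caption>Python</caption>
        <item id="4"><caption>Test</caption></item>
    </login>
</catalog>"""

root = ElementTree.fromstring(XML)  # fromstring() returns the root Element


def describe(node):
    """Collect, as a dict, a subset of the facts the Python 2 loop prints."""
    return {
        "tag": node.tag,
        "id": node.attrib.get("id", "NULL"),               # default when 'id' is absent
        "children": [child.tag for child in list(node)],   # replaces getchildren()
        "first_caption": node.findtext("caption"),         # direct children only
        "subtree": [n.tag for n in node.iter()],           # includes the node itself
    }


for node in root.iter():  # depth-first, pre-order traversal
    print(describe(node))
```

Returning a dict instead of printing each field keeps the per-node facts easy to inspect or compare in one place.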

Attributes and methods of Element:

  • tag

  • A string identifying what kind of data this element represents (the element type, in other words).

  • text

  • The text attribute can be used to hold additional data associated with the element. As the name implies this attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found between the element tags.

  • tail

  • The tail attribute can be used to hold additional data associated with the element. This attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found after the element’s end tag and before the next tag.

  • attrib

  • A dictionary containing the element’s attributes. Note that while the attrib value is always a real mutable Python dictionary, an ElementTree implementation may choose to use another internal representation, and create the dictionary only if someone asks for it. To take advantage of such implementations, use the dictionary methods below whenever possible.
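
The difference between `text` and `tail` is easiest to see on a small mixed-content element. The snippet below is a minimal sketch in Python 3; the `<p>`/`<b>` document is made up for illustration.

```python
from xml.etree import ElementTree

# A made-up element with mixed content: text, a child, then more text.
root = ElementTree.fromstring('<p id="1">Hello <b>bold</b> world</p>')
b = root.find("b")

print(root.tag)     # the element type: p
print(root.text)    # text between <p ...> and the first child: 'Hello '
print(b.text)       # text inside <b>...</b>: 'bold'
print(b.tail)       # text after </b> and before the next tag: ' world'
print(root.tail)    # nothing follows the root's end tag: None
print(root.attrib)  # attributes as a plain dict: {'id': '1'}
```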

The following dictionary-like methods work on the element attributes.

  • clear()

  • Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.

  • get(key,default=None)

  • Gets the element attribute named key.

    Returns the attribute value, or default if the attribute was not found.

  • items()

  • Returns the element attributes as a sequence of (name, value) pairs. The attributes are returned in an arbitrary order.

  • keys()

  • Returns the element's attribute names as a list. The names are returned in an arbitrary order.

  • set(key,value)

  • Set the attribute key on the element to value.
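
A short sketch of the dictionary-like methods, written for Python 3. The `<login>` element mirrors the one in the sample XML; the variable names are only illustrative.

```python
from xml.etree import ElementTree

login = ElementTree.fromstring('<login id="1" username="pytest"/>')

# get() returns the default instead of raising when the attribute is absent.
missing = login.get("passwd", "NULL")

# set() adds or overwrites an attribute.
login.set("passwd", "123456")

# keys() and items() expose attribute names and (name, value) pairs;
# sorting makes the result independent of attribute order.
names = sorted(login.keys())
pairs = sorted(login.items())
print(missing, names, pairs)

# clear() drops all attributes and subelements and resets text and tail to None.
login.clear()
print(login.attrib, login.text, len(login))
```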

The following methods work on the element’s children (subelements).

  • append(subelement)

  • Adds the element subelement to the end of this element's internal list of subelements.

  • extend(subelements)

  • Appends subelements from a sequence object with zero or more elements. Raises AssertionError if a subelement is not a valid object.

    New in version 2.7.

  • find(match)

  • Finds the first subelement matching match. match may be a tag name or path. Returns an element instance or None.

  • findall(match)

  • Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.

  • findtext(match,default=None)

  • Finds text for the first subelement matching match. match may be a tag name or path. Returns the text content of the first matching element, or default if no element was found. Note that if the matching element has no text content an empty string is returned.

  • getchildren()

  • Deprecated since version 2.7: Use list(elem) or iteration.

  • getiterator(tag=None)

  • Deprecated since version 2.7: Use method Element.iter() instead.

  • insert(index,element)

  • Inserts a subelement at the given position in this element.

  • iter(tag=None)

  • Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If tag is not None or '*', only elements whose tag equals tag are returned from the iterator. If the tree structure is modified during iteration, the result is undefined.

  • iterfind(match)

  • Finds all matching subelements, by tag name or path. Returns an iterable yielding all matching elements in document order.

    New in version 2.7.

  • itertext()

  • Creates a text iterator. The iterator loops over this element and all subelements, in document order, and returns all inner text.

    New in version 2.7.

  • makeelement(tag,attrib)

  • Creates a new element object of the same type as this element. Do not call this method, use the SubElement() factory function instead.

  • remove(subelement)

  • Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.
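
The child-related methods can be compared side by side in a small sketch (Python 3). The document is a reduced, made-up variant of the sample XML, and all variable names are illustrative.

```python
from xml.etree import ElementTree
from xml.etree.ElementTree import SubElement

catalog = ElementTree.fromstring(
    "<catalog>"
    "<item id='2'><caption>Zope</caption><caption>JJJ</caption></item>"
    "<mux>text</mux>"
    "</catalog>"
)
item = catalog.find("item")

# find/findall with a plain tag name look at direct children only ...
direct = catalog.findall("caption")                  # no <caption> directly under <catalog>
# ... while a path can reach deeper levels.
first = catalog.find("item/caption")
by_path = [c.text for c in catalog.findall("item/caption")]
# iter(tag) walks the whole subtree in document order.
anywhere = [c.text for c in catalog.iter("caption")]
# findtext returns the default when nothing matches.
missing_text = catalog.findtext("nope", default="n/a")

# SubElement() creates a child and appends it to the parent.
new_caption = SubElement(item, "caption")
new_caption.text = "New"
# remove() compares by identity, so pass the exact element object.
item.remove(first)
remaining = [c.text for c in item]

# itertext() yields all inner text in document order.
all_text = "".join(catalog.itertext())
print(direct, by_path, anywhere, missing_text, remaining, all_text)
```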

Attributes and methods of ElementTree:

  • _setroot(element)

  • Replaces the root element for this tree. This discards the current contents of the tree, and replaces it with the given element. Use with care. element is an element instance.

  • find(match)

  • Finds the first toplevel element matching match. match may be a tag name or path. Same as getroot().find(match). Returns the first matching element, or None if no element was found.

  • findall(match)

  • Finds all matching subelements, by tag name or path. Same as getroot().findall(match). match may be a tag name or path. Returns a list containing all matching elements, in document order.

  • findtext(match,default=None)

  • Finds the element text for the first toplevel element with given tag. Same as getroot().findtext(match). match may be a tag name or path. default is the value to return if the element was not found. Returns the text content of the first matching element, or the default value if no element was found. Note that if the element is found, but has no text content, this method returns an empty string.

  • getiterator(tag=None)

  • Deprecated since version 2.7: Use method ElementTree.iter() instead.

  • getroot()

  • Returns the root element for this tree.

  • iter(tag=None)

  • Creates and returns a tree iterator for the root element. The iterator loops over all elements in this tree, in section order. tag is the tag to look for (default is to return all elements).

  • iterfind(match)

  • Finds all matching subelements, by tag name or path. Same as getroot().iterfind(match). Returns an iterable yielding all matching elements in document order.

    New in version 2.7.

  • parse(source,parser=None)

  • Loads an external XML section into this element tree. source is a file name or file object. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns the section root element.

  • write(file,encoding="us-ascii",xml_declaration=None,method="xml")

  • Writes the element tree to a file, as XML. file is a file name, or a file object opened for writing. encoding is the output encoding (default is US-ASCII). xml_declaration controls if an XML declaration should be added to the file. Use False for never, True for always, None for only if not US-ASCII or UTF-8 (default is None). method is either "xml", "html" or "text" (default is "xml"). Returns an encoded string.
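
A minimal round trip through parse(), getroot() and write() might look like the sketch below (Python 3). It uses in-memory `io.BytesIO` objects in place of real files, which is an assumption made to keep the example self-contained; the module-level `ElementTree.parse()` function is used here to build the tree object.

```python
import io
from xml.etree import ElementTree

# An in-memory stand-in for an XML file on disk (illustrative content).
source = io.BytesIO(b"<catalog><mux>abc</mux></catalog>")

tree = ElementTree.parse(source)   # module-level parse() returns an ElementTree object
root = tree.getroot()              # the root Element, here <catalog>

# tree.find(...) is the same as tree.getroot().find(...)
mux = tree.find("mux")
mux.set("checked", "yes")

# Serialize back to bytes, asking explicitly for an XML declaration.
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
data = out.getvalue()
print(data.decode("utf-8"))
```

With a real file, `source` and `out` would simply be file names or file objects opened in binary mode.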
