This is a tutorial on XML processing with lxml.etree. It briefly overviews the main concepts of the ElementTree API, and some simple enhancements that make your life as a programmer easier.

If your code only uses the ElementTree API and does not rely on any functionality that is specific to lxml.etree, you can also fall back to the original ElementTree through an import chain that tries lxml.etree first.
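
A minimal sketch of such a fall-back chain, assuming the standard library's xml.etree.ElementTree is the only alternative you care about. The BACKEND variable is a hypothetical name added only for illustration:

```python
# Prefer lxml.etree and fall back to the standard library if lxml is missing.
# Only safe when the rest of the code sticks to the plain ElementTree API.
try:
    from lxml import etree
    BACKEND = "lxml.etree"
except ImportError:
    import xml.etree.ElementTree as etree
    BACKEND = "xml.etree.ElementTree"

root = etree.Element("root")
print(f"running with {BACKEND}, root tag is {root.tag!r}")
```

The remaining examples assume that lxml is installed and imported as `from lxml import etree`.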

To aid in writing portable code, this tutorial makes it clear in the examples which part of the presented API is an extension of lxml.etree over the original ElementTree API, as defined by Fredrik Lundh's ElementTree library.

An Element is the main container object for the ElementTree API. Most of the XML tree functionality is accessed through this class. Elements are easily created through the Element factory:

>>> root = etree.Element("root")

The XML tag name of elements is accessed through the tag property.
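
A short illustration, assuming lxml is installed. Renaming the element to "document" is only an example showing that the tag property is writable:

```python
from lxml import etree

root = etree.Element("root")
print(root.tag)  # root

# The tag property can also be assigned to rename the element in place.
root.tag = "document"
print(root.tag)  # document
```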

Elements are organised in an XML tree structure. To create child elements and add them to a parent element, you can use the append() method:

>>> root.append( etree.Element("child1") )

However, this is so common that there is a shorter and much more efficient way to do this: the SubElement factory. It accepts the same arguments as the Element factory, but additionally requires the parent as first argument:

>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")

To see that this is really XML, you can serialise the tree you have created:

>>> print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root>
  <child1/>
  <child2/>
  <child3/>
</root>

Elements are lists

To make access to these subelements easy and straightforward, elements mimic the behaviour of normal Python lists as closely as possible:

>>> child = root[0]
>>> print(child.tag)
child1

>>> print(len(root))
3

>>> root.index(root[1]) # lxml.etree only!
1

>>> children = list(root)

>>> for child in root:
...     print(child.tag)
child1
child2
child3

>>> root.insert(0, etree.Element("child0"))
>>> start = root[:1]
>>> end   = root[-1:]

>>> print(start[0].tag)
child0
>>> print(end[0].tag)
child3

Prior to ElementTree 1.3 and lxml 2.0, you could also check the truth value of an Element to see if it has children, i.e. if the list of children is empty:

if root:   # this no longer works!
    print("The root element has children")

This is no longer supported, because people tend to expect a "something" to evaluate to True, and they expect Elements to be "something", whether they have children or not. So many users find it surprising that any Element would evaluate to False in an if-statement like the one above. Instead, use len(element), which is both more explicit and less error-prone:

>>> print(etree.iselement(root))  # test if it's some kind of Element
True
>>> if len(root):                 # test if it has children
...     print("The root element has children")
The root element has children

There is another important case where the behaviour of Elements in lxml (in 2.0 and later) deviates from that of lists and from that of the original ElementTree (prior to version 1.3 or Python 2.7/3.2):

>>> for child in root:
...     print(child.tag)
child0
child1
child2
child3
>>> root[0] = root[-1]  # this moves the element in lxml.etree!
>>> for child in root:
...     print(child.tag)
child3
child1
child2

In this example, the last element is moved to a different position instead of being copied, i.e. it is automatically removed from its previous position when it is put in a different place. In lists, objects can appear in multiple positions at the same time, and the above assignment would just copy the item reference into the first position, so that both contain the exact same item.
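
The difference can be seen side by side in the following sketch, which compares a plain Python list with an lxml tree built from illustrative child names:

```python
from lxml import etree

# Plain list: the assignment copies the reference, so 3 now appears twice.
numbers = [0, 1, 2, 3]
numbers[0] = numbers[-1]
print(numbers)  # [3, 1, 2, 3]

# lxml.etree: the same assignment moves the element, so one child disappears.
root = etree.Element("root")
for name in ("child0", "child1", "child2", "child3"):
    etree.SubElement(root, name)
root[0] = root[-1]
print([child.tag for child in root])  # ['child3', 'child1', 'child2']
```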

Note that in the original ElementTree, a single Element object can sit in any number of places in any number of trees, which allows for the same copy operation as with lists. The obvious drawback is that modifications to such an Element will apply to all places where it appears in a tree, which may or may not be intended.

The upside of this difference is that an Element in lxml.etree always has at most one parent (the root has none), which can be queried through the getparent() method. This is not supported in the original ElementTree:

>>> root is root[0].getparent()  # lxml.etree only!
True

If you want to copy an element to a different position in lxml.etree, consider creating an independent deep copy using the copy module from Python's standard library.
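
A minimal sketch using copy.deepcopy(), assuming a root whose children are child3, child1 and child2 (the state reached by the move above). The new parent "neu" is an illustrative name:

```python
from copy import deepcopy
from lxml import etree

root = etree.Element("root")
for name in ("child3", "child1", "child2"):
    etree.SubElement(root, name)

# deepcopy() creates an independent subtree, so root keeps its own child1.
element = etree.Element("neu")
element.append(deepcopy(root[1]))

print(element[0].tag)         # child1
print([c.tag for c in root])  # ['child3', 'child1', 'child2']
print(element[0] is root[1])  # False
```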

The siblings (or neighbours) of an element are accessed as next and previous elements, through the getnext() and getprevious() methods (lxml.etree only).
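
A small sketch with an illustrative three-child tree. getnext() and getprevious() return None when there is no sibling in that direction:

```python
from lxml import etree

root = etree.Element("root")
for name in ("child0", "child1", "child2"):
    etree.SubElement(root, name)

print(root[0].getnext().tag)      # child1
print(root[1].getprevious().tag)  # child0
print(root[0].getprevious())      # None, child0 is the first child
print(root[-1].getnext())         # None, child2 is the last child
```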

Elements carry attributes as a dict

XML elements support attributes. You can create them directly in the Element factory.
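
For instance, assuming lxml is installed, keyword arguments passed to the factory become attributes (the name "interesting" is illustrative):

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
print(etree.tostring(root))  # b'<root interesting="totally"/>'
```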

Attributes are just unordered name-value pairs, so a very convenient way of dealing with them is through the dictionary-like interface of Elements.
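
A sketch of the most common dict-like calls on an element, get(), set(), keys() and items(). The attribute names and values are illustrative:

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
print(root.get("interesting"))  # totally
print(root.get("hello"))        # None, the attribute is not present

root.set("hello", "Huhu")
print(root.get("hello"))        # Huhu
print(etree.tostring(root))     # b'<root interesting="totally" hello="Huhu"/>'

print(sorted(root.keys()))      # ['hello', 'interesting']
for name, value in sorted(root.items()):
    print(f"{name} = {value!r}")
```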

For the cases where you want to do item lookup or have other reasons for getting a 'real' dictionary-like object, e.g. for passing it around, you can use the attrib property.
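
A minimal sketch of item lookup through attrib, with illustrative attribute names. Because attrib is backed by the element, writing through it changes the element too:

```python
from lxml import etree

root = etree.Element("root", interesting="totally", hello="Huhu")
attributes = root.attrib

print(attributes["interesting"])       # totally
print(attributes.get("no-such-attr"))  # None

# Writing through attrib changes the element itself.
attributes["hello"] = "Guten Tag"
print(root.get("hello"))               # Guten Tag
```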

Note that attrib is a dict-like object backed by the Element itself. This means that any changes to the Element are reflected in attrib and vice versa. It also means that the XML tree stays alive in memory as long as the attrib of one of its Elements is in use. To get an independent snapshot of the attributes that does not depend on the XML tree, copy it into a dict.
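
A sketch contrasting a dict snapshot with the live attrib view (the attribute name is illustrative). Only the live view sees the later change:

```python
from lxml import etree

root = etree.Element("root", hello="Huhu")

snapshot = dict(root.attrib)  # plain dict, independent of the tree
live = root.attrib            # dict-like view backed by the element

root.set("hello", "Guten Tag")
print(snapshot["hello"])      # Huhu
print(live["hello"])          # Guten Tag
```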

Elements contain text

Elements can contain text.
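
A minimal sketch, assuming lxml is installed, that sets and reads the text property of a single element:

```python
from lxml import etree

root = etree.Element("root")
root.text = "TEXT"

print(root.text)             # TEXT
print(etree.tostring(root))  # b'<root>TEXT</root>'
```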

In many XML documents (data-centric documents), this is the only place where text can be found. It is encapsulated by a leaf tag at the very bottom of the tree hierarchy.

However, if XML is used for tagged text documents such as (X)HTML, text can also appear between different elements, right in the middle of the tree.
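
A sketch that builds a tiny (X)HTML-like tree; the element names and the "TEXT" string are illustrative. Once the <br/> element is added, the text sits in front of it inside <body>:

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
print(etree.tostring(html))  # b'<html><body>TEXT</body></html>'

br = etree.SubElement(body, "br")
print(etree.tostring(html))  # b'<html><body>TEXT<br/></body></html>'
```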

In such documents, an element (for example a <br/> tag inside a paragraph) can be surrounded by text. This is often referred to as document-style or mixed-content XML. Elements support this through their tail property, which contains the text that directly follows the element, up to the next element in the XML tree.
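
A minimal sketch of the tail property, reusing an illustrative html/body/br tree. The string "TAIL" belongs to the <br/> element, not to <body>:

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
br = etree.SubElement(body, "br")

# The tail of <br/> is the text after it, up to the closing </body>.
br.tail = "TAIL"
print(etree.tostring(html))  # b'<html><body>TEXT<br/>TAIL</body></html>'
print(body.text, br.tail)    # TEXT TAIL
```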

The two properties .text and .tail are enough to represent any text content in an XML document. This way, the ElementTree API does not require any special text nodes in addition to the Element class, which tend to get in the way fairly often (as you might know from classic DOM APIs).

However, there are cases where the tail text also gets in the way. For example, when you serialise an Element from within the tree, you do not always want its tail text in the result (although you would still want the tail text of its children). For this purpose, the tostring() function accepts the keyword argument with_tail.
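
A short sketch of with_tail on the same illustrative tree; the default (with_tail=True) includes the tail:

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
br = etree.SubElement(body, "br")
br.tail = "TAIL"

print(etree.tostring(br))                   # b'<br/>TAIL'
print(etree.tostring(br, with_tail=False))  # b'<br/>'
```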

If you want to read only the text, i.e. without any intermediate tags, you have to recursively concatenate all text and tail attributes in the correct order. Again, the tostring() function comes to the rescue, this time using the method keyword.
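
A minimal sketch of text-only serialisation, assuming lxml is installed. With method="text", tostring() concatenates every .text and .tail in document order:

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
br = etree.SubElement(body, "br")
br.tail = "TAIL"

# No tags in the output, only the concatenated text chunks.
print(etree.tostring(html, method="text"))  # b'TEXTTAIL'
```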

Using XPath to find text

Another way to extract the text content of a tree is XPath, which also allows you to extract the separate text chunks into a list.
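
A sketch using the xpath() method on an illustrative document parsed with etree.fromstring(). The XPath expression string() returns all text as one string, while //text() returns each chunk separately:

```python
from lxml import etree

html = etree.fromstring("<html><body>TEXT<br/>TAIL</body></html>")

# string(): the whole text content as a single string.
print(html.xpath("string()"))  # TEXTTAIL

# //text(): every text chunk separately, in document order.
print(html.xpath("//text()"))  # ['TEXT', 'TAIL']
```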

If you want to use this more often, you can wrap it in a function.
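
One way to do this is etree.XPath, which compiles an expression once into a reusable callable. The name build_text_list is a hypothetical choice for this sketch:

```python
from lxml import etree

# Compile the expression once; the result is called like a function.
build_text_list = etree.XPath("//text()")

html = etree.fromstring("<html><body>TEXT<br/>TAIL</body></html>")
print(build_text_list(html))  # ['TEXT', 'TAIL']
```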

Note that a string result returned by XPath is a special 'smart' object that knows about its origins. You can ask it where it came from through its getparent() method, just as you would with Elements.
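
A sketch on the same illustrative document. Note that a tail string reports the element it follows as its parent:

```python
from lxml import etree

html = etree.fromstring("<html><body>TEXT<br/>TAIL</body></html>")
texts = html.xpath("//text()")

print(texts[0].getparent().tag)  # body  ("TEXT" is the text of <body>)
print(texts[1].getparent().tag)  # br    ("TAIL" is the tail of <br/>)
```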

You can also find out whether it is normal text content or tail text.
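
A minimal sketch using the is_text and is_tail flags of the smart strings, again on an illustrative document:

```python
from lxml import etree

html = etree.fromstring("<html><body>TEXT<br/>TAIL</body></html>")
text, tail = html.xpath("//text()")

print(text.is_text, text.is_tail)  # True False
print(tail.is_text, tail.is_tail)  # False True
```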

While this works for the results of the text() function, lxml will not tell you the origin of a string value that was constructed by the XPath functions string() or concat().
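
A sketch with the XPath string() function, compiled through etree.XPath under the illustrative name stringify. The result still offers getparent(), but it returns None because the value was computed rather than taken from a single text node:

```python
from lxml import etree

html = etree.fromstring("<html><body>TEXT<br/>TAIL</body></html>")
stringify = etree.XPath("string()")

result = stringify(html)
print(result)              # TEXTTAIL
print(result.getparent())  # None
```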

Tree iteration

For problems like the above, where you want to recursively traverse the tree and do something with its elements, tree iteration is a very convenient solution. Elements provide a tree iterator for this purpose, the iter() method. It yields elements in document order, i.e. in the order their tags would appear if you serialised the tree to XML.
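
A minimal sketch with an illustrative tree of three children. iter() yields the element itself first and then its whole subtree:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
etree.SubElement(root, "child").text = "Child 2"
etree.SubElement(root, "another").text = "Child 3"

for element in root.iter():
    print(f"{element.tag} - {element.text}")
# root - None
# child - Child 1
# child - Child 2
# another - Child 3
```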

If you know you are only interested in a single tag, you can pass its name to iter() to have it filter for you. Starting with lxml 3.0, you can also pass more than one tag to intercept on multiple tags during iteration.
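
A sketch of tag filtering on the same illustrative tree. With several tags, results still come back in document order, not in the order the tags were passed:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
etree.SubElement(root, "child").text = "Child 2"
etree.SubElement(root, "another").text = "Child 3"

print([e.text for e in root.iter("child")])            # ['Child 1', 'Child 2']
print([e.tag for e in root.iter("another", "child")])  # ['child', 'child', 'another']
```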

By default, iteration yields all nodes in the tree, including ProcessingInstructions, Comments and Entity instances. If you want to make sure only Element objects are returned, you can pass the Element factory as tag parameter.

Note that passing a wildcard "*" tag name will also yield all Element nodes (and only elements).
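
A sketch of these filters on an illustrative tree that mixes an element, an entity reference and a comment. It also passes the Comment factory, which works the same way as the Element factory:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
root.append(etree.Entity("#234"))           # character reference &#234;
root.append(etree.Comment("some comment"))

print(len(list(root.iter())))                          # 4, all node types
print([e.tag for e in root.iter(tag=etree.Element)])   # ['root', 'child']
print([e.tag for e in root.iter("*")])                 # ['root', 'child']
print([c.text for c in root.iter(tag=etree.Comment)])  # ['some comment']
```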

In lxml.etree, elements provide further iterators for all directions in the tree: children, parents (or rather ancestors) and siblings.
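
A sketch of the lxml-specific iterators iterchildren(), iterdescendants(), iterancestors() and itersiblings(), on an illustrative nested tree:

```python
from lxml import etree

root = etree.XML("<root><a/><b><c/></b><d/></root>")
b = root[1]
c = b[0]

print([e.tag for e in root.iterchildren()])             # ['a', 'b', 'd']
print([e.tag for e in root.iterdescendants()])          # ['a', 'b', 'c', 'd']
print([e.tag for e in c.iterancestors()])               # ['b', 'root']
print([e.tag for e in b.itersiblings()])                # ['d']
print([e.tag for e in b.itersiblings(preceding=True)])  # ['a']
```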

Serialisation

Serialisation commonly uses the tostring() function, which returns a byte string, or the ElementTree.write() method, which writes to a file, a file-like object, or a URL (via FTP PUT or HTTP POST). Both calls accept the same keyword arguments, like pretty_print for formatted output or encoding to select a specific output encoding other than plain ASCII.
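
A sketch of both calls on an illustrative tree, assuming lxml is installed. An io.BytesIO buffer stands in for a real file:

```python
import io
from lxml import etree

root = etree.XML("<root><a><b/></a></root>")

# tostring() returns bytes; without an explicit encoding the output is ASCII.
print(etree.tostring(root))  # b'<root><a><b/></a></root>'

# pretty_print indents the output and appends a final newline.
pretty = etree.tostring(root, pretty_print=True)
print(pretty)  # b'<root>\n  <a>\n    <b/>\n  </a>\n</root>\n'

# A non-default encoding such as ISO-8859-1 also gets an XML declaration.
latin1 = etree.tostring(root, encoding="iso-8859-1")
print(latin1.splitlines()[0])

# ElementTree.write() takes the same keywords but writes to a file-like object.
buffer = io.BytesIO()
etree.ElementTree(root).write(buffer, pretty_print=True)
print(buffer.getvalue())
```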

Note that pretty printing appends a newline at the end.

For more fine-grained control over the pretty-printing, you can add whitespace indentation to the tree before serialising it, using the indent() function (added in lxml 4.5).
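
A sketch of indent() on an illustrative tree with irregular whitespace. indent() modifies the tree in place, and its space parameter sets the indentation unit. It replaces whitespace-only text, so it can be run again with a different unit:

```python
from lxml import etree

root = etree.XML("<root><a><b/>\n</a></root>")
print(etree.tostring(root, encoding="unicode"))  # irregular whitespace

etree.indent(root)  # default: two spaces per level
two_spaces = etree.tostring(root, encoding="unicode")
print(two_spaces)
# <root>
#   <a>
#     <b/>
#   </a>
# </root>

# Re-indent with tabs; the old whitespace-only text is overwritten.
etree.indent(root, space="\t")
tabs = etree.tostring(root, encoding="unicode")
print(repr(root.text))  # '\n\t'
```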

In lxml 2.0 and later (as well as ElementTree 1.3), the serialisation functions can do more than XML serialisation. You can serialise to HTML or extract the text content by passing the method keyword.
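
A sketch comparing the three methods on an illustrative XHTML-like document. The HTML serialiser writes void elements such as <br> without the closing slash:

```python
from lxml import etree

root = etree.XML("<html><head/><body><p>Hello<br/>World</p></body></html>")

as_xml = etree.tostring(root)                  # default, same as method="xml"
as_html = etree.tostring(root, method="html")  # HTML rules for empty elements
as_text = etree.tostring(root, method="text")  # text content only

print(as_xml)   # b'<html><head/><body><p>Hello<br/>World</p></body></html>'
print(as_html)
print(as_text)  # b'HelloWorld'
```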

As for XML serialisation, the default encoding for plain text serialisation is ASCII.
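
A sketch of what this means for non-ASCII text, using an illustrative document whose tail text contains "ö". Without an explicit encoding the text output fails, and passing encoding="UTF-8" fixes it:

```python
from lxml import etree

root = etree.XML("<html><head/><body><p>Hello<br/>World</p></body></html>")
br = next(root.iter("br"))  # first <br> in document order
br.tail = "W\xf6rld"        # non-ASCII text: "Wörld"

# With no encoding given, text output is ASCII and cannot hold "ö".
try:
    etree.tostring(root, method="text")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print("ASCII text serialisation failed:", ascii_failed)

utf8_text = etree.tostring(root, method="text", encoding="UTF-8")
print(utf8_text)  # b'HelloW\xc3\xb6rld'
```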

Here, serialising to a Python unicode string instead of a byte string might come in handy. Just pass the name 'unicode' as encoding.
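
A minimal sketch with an illustrative document containing "ö". With encoding="unicode", tostring() returns a Python str for both text and markup output:

```python
from lxml import etree

root = etree.XML("<html><body><p>Hello<br/>W\xf6rld</p></body></html>")

text = etree.tostring(root, method="text", encoding="unicode")
print(type(text).__name__, text)  # str HelloWörld

markup = etree.tostring(root, encoding="unicode")
print(markup)
```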