When it comes to web scraping with Python, not many libraries can surpass BeautifulSoup in terms of features and ease of use. It can save you hours, or even days, of work with just a few lines of code.
BeautifulSoup parses an HTML document into a tree made up of tags and text. In this article, we will show you how to remove an HTML tag in BeautifulSoup.
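To make the idea of a tree of tags and text concrete, here is a minimal sketch. The sample markup is an assumption chosen for illustration, not something from a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, chosen only for illustration.
markup = '<p>Hello <b>world</b>!</p>'
soup = BeautifulSoup(markup, 'html.parser')

# Each child of <p> is either a Tag or a NavigableString (a piece of text).
children = [(type(node).__name__, str(node)) for node in soup.p.contents]
print(children)
# [('NavigableString', 'Hello '), ('Tag', '<b>world</b>'), ('NavigableString', '!')]
```

Every removal method discussed below operates on nodes like these.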
BeautifulSoup has a built-in method called extract() that allows you to remove a tag or string from the tree. Once you’ve located the element you want to get rid of, let’s say it’s named i_tag, calling i_tag.extract() will remove the element and return it at the same time.
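Code language: Python (python)
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag = soup.a

i_tag = soup.i.extract()  # remove <i> from the tree and keep a reference to it

a_tag
# <a href="http://example.com/">I linked to </a>
i_tag
# <i>example.com</i>
print(i_tag.parent)
# None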
Note that the returned element is a bs4.Tag or bs4.NavigableString, not a plain Python string; call str() on it if you need one. The original BeautifulSoup object is modified in place, so if you try to find() the original i_tag in the soup afterwards, it won’t be found.
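The short sketch below checks both points: the type of the value returned by extract() and the fact that the soup no longer contains the element. The markup is an assumed sample (a link with an i tag inside it), and the variable names are illustrative.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Assumed sample markup: a link with an <i> tag inside it.
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')

removed_tag = soup.i.extract()       # remove the <i> tag from the tree
print(isinstance(removed_tag, Tag))  # True
print(soup.find('i'))                # None: the tag is gone from the soup
print(str(removed_tag))              # <i>example.com</i>

# Strings can be extracted too; the result is a NavigableString.
removed_text = soup.a.string.extract()
print(isinstance(removed_text, NavigableString))  # True
print(soup)                          # <a href="http://example.com/"></a>
```

Because the extracted element is still a live object, it can be inspected, modified, or inserted somewhere else in a tree later.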
If you don’t care about the content of the tag and just want to destroy it completely, use BeautifulSoup’s decompose() method. Once called, i_tag.decompose() removes i_tag and its contents from the BeautifulSoup tree completely, without returning anything.
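Code language: Python (python)
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag = soup.a
i_tag = soup.i

i_tag.decompose()  # destroy <i> and everything inside it; returns None

a_tag
# <a href="http://example.com/">I linked to </a>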
If you want to be really sure that the tag is decomposed, you can check its .decomposed property (available since Beautiful Soup 4.9.0).
Code language: Python (python)
i_tag.decomposed
# True
a_tag.decomposed
# False
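A common use of decompose() is stripping several unwanted tags at once. The following is a minimal sketch under assumptions: the HTML fragment and the choice of script and style tags are hypothetical, and the .decomposed check requires Beautiful Soup 4.9.0 or later.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, used only for illustration.
html = '<div><script>track()</script><p>Keep me</p><style>p{}</style></div>'
soup = BeautifulSoup(html, 'html.parser')

# find_all() collects the matches up front, so decomposing inside the loop is safe.
unwanted = soup.find_all(['script', 'style'])
for tag in unwanted:
    tag.decompose()

print(soup)                                  # <div><p>Keep me</p></div>
print([tag.decomposed for tag in unwanted])  # [True, True]
print(soup.p.decomposed)                     # False
```

Apart from checking .decomposed, a decomposed tag should not be used for anything further.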
BeautifulSoup’s unwrap() method replaces a tag with the contents inside that tag and returns the tag that was replaced. If you want to remove a parent HTML tag from the BeautifulSoup tree while keeping its children and descendants, this is the method you’re looking for.
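Code language: Python (python)
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag = soup.a

a_tag.i.unwrap()  # drop the <i> wrapper but keep its text in place

a_tag
# <a href="http://example.com/">I linked to example.com</a>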
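As a further illustration, the sketch below unwraps every occurrence of a tag, including one that has a nested child, and shows what unwrap() returns. The markup and the choice of the b tag are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <b> tags, one of which wraps another tag.
html = '<p>Some <b>bold</b> and <b><i>nested</i></b> text</p>'
soup = BeautifulSoup(html, 'html.parser')

for bold in soup.find_all('b'):
    unwrapped = bold.unwrap()  # returns the <b> tag, now emptied of its contents

print(soup)       # <p>Some bold and <i>nested</i> text</p>
print(unwrapped)  # <b></b>
```

Only the b wrappers are removed; the nested i tag and all of the text stay where they were.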
Conclusion
We hope the information above helped you find the method for removing a tag from HTML that best suits your case. If you’re interested in more basic BeautifulSoup tutorials, check out our guides on how to find an element by class, how to get text from web pages, and how to get attributes of elements in BeautifulSoup.
If you have any questions, then please feel free to ask in the comments below.