What is Simple HTML DOM Parser?

PHP Simple HTML DOM Parser

A fast, simple and reliable HTML document parser for PHP.

Created by S.C. Chen, based on HTML Parser for PHP 4 by Jose Solorzano.

Parse any HTML document

PHP Simple HTML DOM Parser handles any HTML document, even ones that are considered invalid by the HTML specification.

Select elements using CSS selectors

PHP Simple HTML DOM Parser supports CSS-style selectors for navigating the DOM, similar to jQuery.

Download

  • Download the latest version from SourceForge

Contributing

  • Request features on the Feature Request Tracker
  • Report bugs on the Bug Tracker
  • Get involved with the community on the Discussions Board

License

PHP Simple HTML DOM Parser is Free Software licensed under the MIT License.


A simple HTML DOM parser written in PHP 5+. It supports invalid HTML and provides a very easy way to find, extract and modify the HTML elements of the DOM. Its jQuery-like selector syntax allows sophisticated ways of locating the elements you care about.


User Ratings

4.7 out of 5 stars

  • Ease: 4 / 5
  • Features: 4 / 5
  • Design: 4 / 5
  • Support: 4 / 5

User Reviews

  • Nice project. Very easy to use. Some examples are not updated; I will post an update for the Slashdot example. Otherwise I would review with 5 stars. Thanks for all the great work!

  • Works perfectly with PHP 5.6, 7.0, 7.1, 7.2, 7.3 and 7.4. Thanks!

  • Yes it is simple! Big tnx!

  • Great code and excellent support!

  • This is wonderful script. Very easy to use, it helps me to make many magics. I just have a problem after upgrade server to PHP7. Some maintenance update is needed. I am getting warnings FATAL ERROR syntax error, unexpected '"', expecting ',' or ';' on line number 163 and Trying to get property of non-object.


Additional Project Details

Intended Audience

Developers

Programming Language

PHP
2008-02-19

Index

  • Quick Start
  • How to create HTML DOM object?
  • How to find HTML elements?
  • How to access the HTML element's attributes?
  • How to traverse the DOM tree?
  • How to dump contents of DOM object?
  • How to customize the parsing behavior?
  • API Reference
  • FAQ

Quick Start


  • Get HTML elements
  • Modify HTML elements
  • Extract contents from HTML
  • Scraping Slashdot!


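// Create a DOM object from a URL or file
$html = file_get_html('http://www.google.com/');

// Find all images and print their src attributes
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links and print their href attributes
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';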


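// Create a DOM object from a string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

// Add a class to the second <div> (indexes are zero based)
$html->find('div', 1)->class = 'bar';

// Replace the contents of the <div> whose id is "hello"
$html->find('div[id=hello]', 0)->innertext = 'foo';

// Output: <div id="hello">foo</div><div id="world" class="bar">World</div>
echo $html;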


// Dump the plain-text contents of a page
echo file_get_html('http://www.google.com/')->plaintext;


// Create a DOM object from a URL
$html = file_get_html('http://slashdot.org/');

// Collect the title, intro and details of every article block
$articles = array();
foreach ($html->find('div.article') as $article) {
    $item = array();
    $item['title']   = $article->find('div.title', 0)->plaintext;
    $item['intro']   = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);
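
The scraping loop above fails with "Trying to get property of non-object" when a selector matches nothing, because find() with an index then returns null. A minimal sketch of a null-safe variant follows. It assumes simple_html_dom.php sits next to the script; the helper name first_text and the inline sample markup are hypothetical and stand in for a live page.

```php
<?php
// Assumption: the library file is available next to this script.
require_once 'simple_html_dom.php';

// Hypothetical helper: plain text of the first match, or a default when
// the selector matches nothing (find() with an index returns null then).
function first_text($node, $selector, $default = '')
{
    $match = $node->find($selector, 0);
    return $match ? trim($match->plaintext) : $default;
}

// Inline sample markup keeps the result reproducible.
$html = str_get_html(
    '<div class="article"><div class="title">First</div><div class="intro">Intro text</div></div>'
    . '<div class="article"><div class="title">Second</div></div>'
);

$articles = array();
foreach ($html->find('div.article') as $article) {
    $articles[] = array(
        'title'   => first_text($article, 'div.title'),
        'intro'   => first_text($article, 'div.intro'),
        'details' => first_text($article, 'div.details', '(none)'),
    );
}

print_r($articles);
```

The second sample article has no intro or details block, so those fields fall back to the defaults instead of raising an error.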

How to create HTML DOM object?


  • Quick way
  • Object-oriented way


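// Quick way: create a DOM object from a string
$html = str_get_html('Hello!');

// Quick way: create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Quick way: create a DOM object from an HTML file
$html = file_get_html('test.htm');

// Object-oriented way: create an empty DOM object
$html = new simple_html_dom();

// Load HTML from a string
$html->load('Hello!');

// Load HTML from a URL
$html->load_file('http://www.google.com/');

// Load HTML from an HTML file
$html->load_file('test.htm');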
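
Loading can fail, so it may be worth checking the result before calling find(). The sketch below assumes a release in which str_get_html() returns false for empty or oversized input (my understanding of the 1.5 and later releases; older ones may behave differently). The wrapper name load_html_or_fail is hypothetical.

```php
<?php
// Assumption: the library file is available next to this script.
require_once 'simple_html_dom.php';

// Hypothetical wrapper: turn a failed load into an exception so that
// later find() calls never run against a non-object.
function load_html_or_fail($markup)
{
    $html = str_get_html($markup);
    if (!$html) {
        throw new RuntimeException('Could not parse HTML (empty or oversized input?).');
    }
    return $html;
}

$html = load_html_or_fail('<p>Hello!</p>');
echo $html->find('p', 0)->plaintext;
```

The same check applies to file_get_html(), which can also fail when the URL or file cannot be read.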

How to find HTML elements?


  • Basics
  • Advanced
  • Descendant selectors
  • Nested selectors
  • Attribute Filters
  • Text & Comments


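// Find all anchors; returns an array of element objects
$ret = $html->find('a');

// Find the Nth anchor (zero based); returns an element object, or null if not found
$ret = $html->find('a', 0);

// Find the last anchor; returns an element object, or null if not found
$ret = $html->find('a', -1);

// Find all <div> elements that have an id attribute
$ret = $html->find('div[id]');

// Find all <div> elements whose id attribute equals "foo"
$ret = $html->find('div[id=foo]');

// Find all elements whose id is "foo"
$ret = $html->find('#foo');

// Find all elements whose class is "foo"
$ret = $html->find('.foo');

// Find all elements that have an id attribute
$ret = $html->find('*[id]');

// Find all anchors and images
$ret = $html->find('a, img');

// Find all anchors and images that have a title attribute
$ret = $html->find('a[title], img[title]');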

The following operators are supported in attribute selectors:

Filter                Description
[attribute]           Matches elements that have the specified attribute.
[!attribute]          Matches elements that don't have the specified attribute.
[attribute=value]     Matches elements whose specified attribute has exactly the given value.
[attribute!=value]    Matches elements whose specified attribute does not have the given value.
[attribute^=value]    Matches elements whose specified attribute starts with the given value.
[attribute$=value]    Matches elements whose specified attribute ends with the given value.
[attribute*=value]    Matches elements whose specified attribute contains the given value.
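
The operator semantics in the table can be sketched in plain PHP. This is an illustration only, not the library's internal code: the function name attribute_filter_matches and the name => value array standing in for an element's attributes are hypothetical. The sketch also assumes that a value comparison never matches an element lacking the attribute, which the table leaves open for `!=`.

```php
<?php
// Illustrative only: mirrors the operator table, not the library's implementation.
// $attributes is a hypothetical name => value map for a single element.
function attribute_filter_matches(array $attributes, $name, $operator = null, $value = null)
{
    $exists = array_key_exists($name, $attributes);

    if ($operator === null) {      // [attribute]
        return $exists;
    }
    if ($operator === '!') {       // [!attribute]
        return !$exists;
    }
    if (!$exists) {
        // Assumption of this sketch: value comparisons need the attribute to exist.
        return false;
    }

    $actual = (string) $attributes[$name];
    $value  = (string) $value;

    switch ($operator) {
        case '=':  // exact match
            return $actual === $value;
        case '!=': // present, but with a different value
            return $actual !== $value;
        case '^=': // starts with
            return strncmp($actual, $value, strlen($value)) === 0;
        case '$=': // ends with
            return $value === '' || substr($actual, -strlen($value)) === $value;
        case '*=': // contains
            return $value === '' || strpos($actual, $value) !== false;
    }
    return false; // unknown operator
}

$link = array('href' => 'https://example.com/report.pdf', 'title' => 'Report');
var_dump(attribute_filter_matches($link, 'href', '$=', '.pdf'));
var_dump(attribute_filter_matches($link, 'rel'));
```

With the library itself, the first check corresponds to a selector such as `a[href$=.pdf]`.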


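// Find all <li> elements inside <ul> elements
$es = $html->find('ul li');

// Find <div> elements nested three levels deep
$es = $html->find('div div div');

// Find all <td> elements inside <table> elements whose class is "hello"
$es = $html->find('table.hello td');

// Find all <td> elements with align=center inside <table> elements
$es = $html->find('table td[align=center]');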


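// Find all <li> elements inside each <ul>
foreach ($html->find('ul') as $ul)
{
    foreach ($ul->find('li') as $li)
    {
        // do something with $li
    }
}

// Find the first <li> inside the first <ul>
$e = $html->find('ul', 0)->find('li', 0);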

How to access the HTML element's attributes?


  • Get, Set and Remove attributes
  • Magic attributes
  • Tips


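// Get an attribute
$value = $e->href;

// Set an attribute
$e->href = 'my link';

// Remove an attribute by setting it to null
$e->href = null;

// Check whether an attribute exists
if (isset($e->href))
    echo 'href exists!';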


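// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag;       // "div"
echo $e->outertext; // "<div>foo <b>bar</b></div>"
echo $e->innertext; // "foo <b>bar</b>"
echo $e->plaintext; // "foo bar"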

Attribute Name    Usage
$e->tag           Read or write the tag name of the element.
$e->outertext     Read or write the outer HTML text of the element.
$e->innertext     Read or write the inner HTML text of the element.
$e->plaintext     Read or write the plain text of the element.


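// Extract the plain-text contents of the document
echo $html->plaintext;

// Wrap an element
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';

// Remove an element by setting its outertext to an empty string
$e->outertext = '';

// Append an element after this one
$e->outertext = $e->outertext . '<div>foo</div>';

// Insert an element before this one
$e->outertext = '<div>foo</div>' . $e->outertext;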

How to traverse the DOM tree?


  • Background Knowledge
  • Traverse the DOM tree


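// Walk down from the element with id "div1" and print a descendant's id
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;

// The same traversal using the camel-case method names
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');

You can also call the methods using camel-case naming conventions.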

Method                                   Description
mixed $e->children([int $index])         Returns the Nth child object if index is set; otherwise returns an array of children.
element $e->parent()                     Returns the parent of the element.
element $e->first_child()                Returns the first child of the element, or null if not found.
element $e->last_child()                 Returns the last child of the element, or null if not found.
element $e->next_sibling()               Returns the next sibling of the element, or null if not found.
element $e->prev_sibling()               Returns the previous sibling of the element, or null if not found.
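
The traversal methods listed above can be tried on a small list. This is a minimal sketch that assumes simple_html_dom.php sits next to the script; the sample markup and its ids are made up for illustration.

```php
<?php
// Assumption: the library file is available next to this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup with no whitespace between the items.
$html = str_get_html('<ul id="menu"><li id="a">A</li><li id="b">B</li><li id="c">C</li></ul>');

$ul    = $html->find('#menu', 0);
$first = $ul->first_child();              // <li id="a">

echo count($ul->children()), "\n";        // number of child elements of the list
echo $ul->children(1)->id, "\n";          // second child (zero-based index)
echo $first->next_sibling()->id, "\n";    // sibling after the first item
echo $ul->last_child()->id, "\n";         // last item
echo $first->parent()->id, "\n";          // back up to the list

// The first item has no previous sibling, so null is returned.
var_dump($first->prev_sibling() === null);
```

Because the sibling and child methods return null at the edges of the tree, it is safer to check their result before chaining another call.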

How to dump contents of DOM object?

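
A minimal sketch of dumping a DOM object back to markup, assuming simple_html_dom.php sits next to the script. The output path under the system temp directory is an arbitrary choice for the example.

```php
<?php
// Assumption: the library file is available next to this script.
require_once 'simple_html_dom.php';

$html = str_get_html('<p>Hello</p>');
$html->find('p', 0)->innertext = 'Goodbye';

// Converting the DOM object to a string serialises the tree.
$asString = (string) $html;

// save() without arguments returns the same markup.
$saved = $html->save();

// save() with a path also writes the markup to that file.
$target = sys_get_temp_dir() . '/result.htm';
$html->save($target);

echo $saved;
```

Echoing the object directly, as the other examples in this manual do, relies on the same string conversion.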

How to customize the parsing behavior?


  • Callback function


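// Write a function that takes an $element parameter
function my_callback($element) {
    // Hide all <b> tags
    if ($element->tag == 'b')
        $element->outertext = '';
}

// Register the callback function by name
$html->set_callback('my_callback');

// The callback is invoked for each element while the DOM is dumped
echo $html;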

What is HTML DOM parser?

In web browsers, the DOMParser interface parses XML or HTML source code from a string into a DOM Document. The opposite operation, converting a DOM tree into XML or HTML source, is done with the XMLSerializer interface.

What is simple HTML DOM?

The HTML DOM is an object model for HTML. It defines HTML elements as objects, together with the properties and methods of all HTML elements.

What is simple HTML DOM parser PHP?

It is a simple HTML DOM parser written in PHP 5+. It supports invalid HTML and provides a very easy way to find, extract and modify the HTML elements of the DOM. Its jQuery-like selector syntax allows sophisticated ways of locating the elements you care about.

Is HTML a DOM?

No. HTML is a markup language, while the Document Object Model (DOM) is a programming API for HTML and XML documents. The DOM defines the logical structure of documents and the way a document is accessed and manipulated.