How to parse HTML in PHP?

I know we can use PHP DOM to parse HTML in PHP, and I found many questions about it here on Stack Overflow. But I have a specific requirement. I have HTML content like the following:

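    <span class="Heading1-H">Chapter 1</span>
    <span class="Normal-H">This is chapter 1</span>
    <span class="Heading1-H">Chapter 2</span>
    <span class="Normal-H">This is chapter 2</span>
    <span class="Heading1-H">Chapter 3</span>
    <span class="Normal-H">This is chapter 3</span>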

I want to parse the above HTML and save the content into two arrays, $heading and $content, like this:

$heading = array('Chapter 1','Chapter 2','Chapter 3');
$content = array('This is chapter 1','This is chapter 2','This is chapter 3');

I can achieve this easily with jQuery, but I am not sure that's the right way. It would be great if someone could point me in the right direction. Thanks in advance.

hatef

asked Aug 21, 2013 at 4:55

I used DOMDocument and DOMXPath to get the solution:


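    <?php
    // Sample markup: headings sit in span.Heading1-H, body text in span.Normal-H.
    $test = '
    <span class="Heading1-H">Chapter 1</span>
    <span class="Normal-H">This is chapter 1</span>
    <span class="Heading1-H">Chapter 2</span>
    <span class="Normal-H">This is chapter 2</span>
    <span class="Heading1-H">Chapter 3</span>
    <span class="Normal-H">This is chapter 3</span>
    ';

    $dom = new DOMDocument();
    $dom->loadHTML($test);
    $xpath = new DOMXPath($dom);

    $heading = parseToArray($xpath, 'Heading1-H');
    $content = parseToArray($xpath, 'Normal-H');

    var_dump($heading);
    echo "\n";
    var_dump($content);
    echo "\n";

    // Return the text of every <span> whose class attribute equals $class.
    function parseToArray($xpath, $class)
    {
        $resultarray = array();
        $elements = $xpath->query("//span[@class='" . $class . "']");
        if ($elements === false) {
            // query() returns false when the expression is invalid
            return $resultarray;
        }
        foreach ($elements as $element) {
            foreach ($element->childNodes as $node) {
                $resultarray[] = $node->nodeValue;
            }
        }
        return $resultarray;
    }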

Live result: http://saji89.codepad.org/2TyOAibZ
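
An exact @class='...' comparison, as in the XPath query above, only matches when the attribute holds that single class and nothing else. The following is a minimal sketch, not the answer author's code, of a token-based match that also finds spans carrying additional classes. The helper name textsByClass() and the sample markup (including the extra class "first") are assumptions made for illustration, and the class name is assumed to contain no quote characters.

```php
<?php
// Hypothetical helper: collect the text of every <span> that carries the
// given class token, even when the attribute lists several classes.
function textsByClass(DOMXPath $xpath, string $class): array
{
    // Assumes $class contains no quote characters.
    $query = sprintf(
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $span) {
        $texts[] = trim($span->textContent);
    }
    return $texts;
}

// Illustrative sample markup; the second heading has an extra class.
$html = '<span class="Heading1-H">Chapter 1</span>'
      . '<span class="Normal-H">This is chapter 1</span>'
      . '<span class="Heading1-H first">Chapter 2</span>'
      . '<span class="Normal-H">This is chapter 2</span>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath   = new DOMXPath($dom);
$heading = textsByClass($xpath, 'Heading1-H');
$content = textsByClass($xpath, 'Normal-H');

print_r($heading);
print_r($content);
```

Reading textContent from the span itself, rather than looping over its child nodes, yields one array entry per span even if the span contains nested inline tags.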

answered Aug 21, 2013 at 5:45

saji89

Take a look at PHP Simple HTML DOM Parser.

Its syntax is similar to jQuery's, so you can easily select any element you want by ID or class:

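    // include/require the simple html dom parser file (adjust the path as needed)
    require_once 'simple_html_dom.php';

    $html_string = '
    <span class="Heading1-H">Chapter 1</span>
    <span class="Normal-H">This is chapter 1</span>
    <span class="Heading1-H">Chapter 2</span>
    <span class="Normal-H">This is chapter 2</span>
    <span class="Heading1-H">Chapter 3</span>
    <span class="Normal-H">This is chapter 3</span>
    ';

    $html = str_get_html($html_string);

    // Initialize both arrays so they exist even when nothing matches
    $heading = array();
    $content = array();

    foreach ($html->find('span') as $element) {
        if ($element->class === 'Heading1-H') {
            $heading[] = $element->innertext;
        } elseif ($element->class === 'Normal-H') {
            $content[] = $element->innertext;
        }
    }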

answered Aug 21, 2013 at 4:58

Here's an alternative way to parse the HTML using DiDOM, which offers significantly better performance in terms of speed and memory footprint.

composer require imangazaliev/didom

    <?php
    use DiDom\Document;

    require 'vendor/autoload.php';

    $html = <<<HTML
    <span class="Heading1-H">Chapter 1</span>
    <span class="Normal-H">This is chapter 1</span>
    <span class="Heading1-H">Chapter 2</span>
    <span class="Normal-H">This is chapter 2</span>
    <span class="Heading1-H">Chapter 3</span>
    <span class="Normal-H">This is chapter 3</span>
    HTML;

    $document = new Document($html);

    // find chapter headings
    $elements = $document->find('.Heading1-H');
    $headings = [];
    foreach ($elements as $element) {
        $headings[] = $element->text();
    }

    // find chapter texts
    $elements = $document->find('.Normal-H');
    $chapters = [];
    foreach ($elements as $element) {
        $chapters[] = $element->text();
    }

    echo("Headings\n");
    foreach ($headings as $heading) {
        echo("- {$heading}\n");
    }

    echo("Chapter texts\n");
    foreach ($chapters as $chapter) {
        echo("- {$chapter}\n");
    }

answered Dec 25, 2020 at 6:11

8ctopus

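    // Requires the PHP Simple HTML DOM Parser library (adjust the path as needed)
    require_once 'simple_html_dom.php';

    // Create DOM from URL or file
    $html = file_get_html('http://www.google.com/');

    // Find all images
    foreach ($html->find('img') as $element) {
        echo $element->src . "\n";
    }

    // Find all links
    foreach ($html->find('a') as $element) {
        echo $element->href . "\n";
    }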

Chen-Tsu Lin

answered Mar 5, 2014 at 7:55

jfraber

Can PHP read an HTML file?

If you want your HTML files to run as PHP, you can configure the server to process .html files as PHP, but it is a much better idea to put mixed PHP and HTML code in a file with the .php extension.

How do you parse HTML?

HTML parsing involves tokenization and tree construction. HTML tokens include start and end tags, as well as attribute names and values. If the document is well-formed, parsing it is straightforward and fast. The parser consumes the token stream and builds up the document tree.
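
To make the resulting document tree concrete, here is a minimal sketch that parses a small fragment with PHP's built-in DOMDocument and lists the element nodes by nesting depth. The helper name describeTree() and the sample markup are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: return one indented line per element node,
// showing the tree the parser built.
function describeTree(DOMNode $node, int $depth = 0): array
{
    $lines = [];
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $lines[] = str_repeat('  ', $depth) . $node->nodeName;
        $depth++;
    }
    // Text, comment and doctype nodes report no children, so skip them.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $lines = array_merge($lines, describeTree($child, $depth));
        }
    }
    return $lines;
}

// Illustrative sample markup.
$dom = new DOMDocument();
$dom->loadHTML('<p><span class="Heading1-H">Chapter 1</span></p>');

$tree = describeTree($dom);
echo implode("\n", $tree), "\n";
```

Because loadHTML() is called without extra options, the underlying libxml parser wraps the fragment in html and body elements, so the printed tree nests p and span beneath them.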

Can we parse HTML?

HTML is a markup language with a simple structure, so it is fairly easy to build an HTML parser with a parser generator. If you choose a popular parser generator such as ANTLR, you may not even need to write the grammar yourself, because ready-made HTML grammars are already available.

What is parsing in PHP?

parse_str() is a built-in PHP function that parses a URL-style query string into variables. Syntax: parse_str($string, $array). The parsed values are stored in $array; as of PHP 8.0 this second argument is required.
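
As a short illustration of parse_str(), the sketch below decodes a query string into an array. The query string and the variable names are hypothetical, made up for this example.

```php
<?php
// Hypothetical query string, chosen only for illustration.
$query = 'chapter=1&title=Chapter+1&tags[]=php&tags[]=html';

// Pass the second argument so the decoded pairs land in $params
// instead of being injected into the current scope.
parse_str($query, $params);

echo $params['chapter'], "\n";             // 1
echo $params['title'], "\n";               // Chapter 1
echo implode(',', $params['tags']), "\n";  // php,html
```

Note that parse_str() works on query strings only; it does not parse HTML, which is what DOMDocument and the libraries above are for.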