
Get Data Only From an HTML Table Using preg_match_all in PHP

I have an HTML table like this:

Solution 1:

PHP has a native extension to parse HTML and XML with DOM:

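// $htmlContent holds the HTML table from the question.
$dom = new DOMDocument;
// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML( $htmlContent );
$rows = array();
// Walk every <tr>, collecting the text of each <td> in that row.
foreach( $dom->getElementsByTagName( 'tr' ) as $tr ) {
    $cells = array();
    foreach( $tr->getElementsByTagName( 'td' ) as $td ) {
        $cells[] = $td->nodeValue;
    }
    $rows[] = $cells;
}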

Adjust to your liking. Search Stack Overflow, have a look at the PHP Manual, or go through some of my answers to learn more about its usage.
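As one possible adjustment, here is a sketch that wraps the same DOMDocument loop in a function. The function name `tableToArray()` is hypothetical. It trims the whitespace that `nodeValue` keeps around the cell text and skips rows that contain no `<td>` cells, such as header rows built from `<th>`.

```php
<?php
// Sketch only: tableToArray() is a hypothetical helper, not part of PHP or any library.
function tableToArray(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            // nodeValue keeps the indentation around the text, so trim it.
            $cells[] = trim($td->nodeValue);
        }
        // Skip rows without <td> cells, e.g. a header row made of <th>.
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$html = '<table><tr><th>A</th><th>B</th></tr>'
      . '<tr><td> data0 </td><td>data1</td></tr></table>';
print_r(tableToArray($html));
```

Restoring the previous `libxml_use_internal_errors()` setting keeps the function from changing global error handling for the rest of your script.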

Solution 2:

You absolutely do NOT want to parse HTML with regex.

For one, there are far too many ways to write the same markup (attribute order, quoting, whitespace, optional closing tags). More importantly, regex is not very good with the hierarchical nature of HTML. It's best to use an XML parser, or better yet an HTML-specific parser.
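To make the hierarchy problem concrete, here is a small sketch using made-up markup (not from the question) that nests a table inside a cell. A lazy `<td>...</td>` pattern of the kind you might pass to `preg_match_all()` stops at the first `</td>` it meets, while DOMDocument keeps the two cells apart.

```php
<?php
// Made-up markup for illustration: a table nested inside a cell.
$html = '<table><tr><td>outer<table><tr><td>inner</td></tr></table></td></tr></table>';

// Naive regex: lazily match everything between <td ...> and the next </td>.
preg_match_all('#<td[^>]*>(.*?)</td>#s', $html, $matches);
print_r($matches[1]);   // one bogus match: "outer<table><tr><td>inner"

// DOM parser: understands that the outer cell contains a whole table.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$cells = [];
foreach ($dom->getElementsByTagName('td') as $td) {
    $cells[] = $td->nodeValue;   // text of the cell and all its descendants
}
print_r($cells);        // "outerinner" (outer cell) and "inner"
```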

Whenever I need to scrape HTML, I tend to use the Simple HTML DOM Parser library, which parses an HTML string into a traversable PHP object that you can query much like jQuery.

<?php
    // Path to the Simple HTML DOM library; adjust to where you installed it.
    require 'simplehtmldom/simple_html_dom.php';

    $sHtml = <<<EOS
    <table border="1"><tbody style=""><tr style=""><td style="color:blue;">
                      data0
                  </td><td style="font-size:15px;">
                     data1
                  </td><td style="font-size:15px;">
                      data2
                  </td><td style="color:blue;">
                      data3
                  </td><td style="color:blue;">
                      data4
                  </td></tr><tr style=""><td style="color:blue;">
                      data00
                  </td><td style="font-size:15px;">
                     data11
                  </td><td style="font-size:15px;">
                      data22
                  </td><td style="color:blue;">
                      data33
                  </td><td style="color:blue;">
                      data44
                  </td></tr><tr style="color:black"><td style="color:blue;">
                      data000
                  </td><td style="font-size:15px;">
                     data111
                  </td><td style="font-size:15px;">
                      data222
                  </td><td style="color:blue;">
                      data333
                  </td><td style="color:blue;">
                      data444
                  </td></tr></tbody></table>
EOS;

    // Parse the string, then select every <tr> inside a <table>.
    $oHTML = str_get_html($sHtml);
    $oTRs = $oHTML->find('table tr');
    $aData = array();
    foreach($oTRs as $oTR) {
        $aRow = array();
        $oTDs = $oTR->find('td');

        // plaintext is the cell's text; trim() drops the surrounding indentation.
        foreach($oTDs as $oTD) {
            $aRow[] = trim($oTD->plaintext);
        }

        $aData[] = $aRow;
    }

    var_dump($aData);
?>

And the output:

array
  0 =>
    array
      0 => string 'data0' (length=5)
      1 => string 'data1' (length=5)
      2 => string 'data2' (length=5)
      3 => string 'data3' (length=5)
      4 => string 'data4' (length=5)
  1 =>
    array
      0 => string 'data00' (length=6)
      1 => string 'data11' (length=6)
      2 => string 'data22' (length=6)
      3 => string 'data33' (length=6)
      4 => string 'data44' (length=6)
  2 =>
    array
      0 => string 'data000' (length=7)
      1 => string 'data111' (length=7)
      2 => string 'data222' (length=7)
      3 => string 'data333' (length=7)
      4 => string 'data444' (length=7)
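If you would rather not depend on a third-party library, the same loop can be sketched with PHP's built-in DOMDocument and DOMXPath. This version uses a shortened copy of the sample table (two rows, two cells each); the XPath `//table//tr` plays the role of the CSS selector `table tr`, and `.//td` matches every `<td>` inside the current row, as `find('td')` does.

```php
<?php
// Shortened sample table: two rows with two cells each.
$sHtml = '<table border="1"><tbody>'
       . '<tr><td style="color:blue;"> data0 </td><td style="font-size:15px;"> data1 </td></tr>'
       . '<tr><td style="color:blue;"> data00 </td><td style="font-size:15px;"> data11 </td></tr>'
       . '</tbody></table>';

$oDom = new DOMDocument();
libxml_use_internal_errors(true);   // collect warnings about imperfect markup
$oDom->loadHTML($sHtml);
libxml_clear_errors();

$oXPath = new DOMXPath($oDom);
$aData = [];
// "//table//tr": every <tr> anywhere inside a <table>.
foreach ($oXPath->query('//table//tr') as $oTR) {
    $aRow = [];
    // ".//td" is evaluated relative to the current row.
    foreach ($oXPath->query('.//td', $oTR) as $oTD) {
        $aRow[] = trim($oTD->textContent);
    }
    $aData[] = $aRow;
}

var_dump($aData);
```

The resulting array has the same shape as the Simple HTML DOM output above, just with fewer rows and cells.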
