  1. #1
    Join Date
    Aug 2002
    Location
    Brazil
    Posts
    730

    [RESOLVED] Need help on data extract script

    I need help extracting info from an HTML table.

    The table has 5 columns and many rows. I want to extract the table information into an array so I can save it in a database.

    This is the HTML code I'm dealing with:

    Code:
    	 <tr>
    	  <td bgcolor=#FFFFFF><b>
    		Data1 </b></td>
    	  <td bgcolor=#FFFFFF>Data2</td>
    	  <td bgcolor=#FFFFFF colspan="2">Data3</td>
    	  <td bgcolor=#FFFFFF>Data4</td>
    	  <td bgcolor=#FFFFFF>Data5</td>
    	 </tr>
    	 
    	 <tr>
    		Data1 </b></td>
    	  <td bgcolor=#FFFFFF>Data2</td>
    	  <td bgcolor=#FFFFFF colspan="2">Data3</td>
    	  <td bgcolor=#FFFFFF>Data4</td>
    	  <td bgcolor=#FFFFFF>Data5</td>
    	 </tr>
    PHP should be able to handle this easily with preg_match_all(), but I'm unable to write a regular expression for this one. Please help!

    Thanks for your attention!
    Last edited by bubu; October 16th, 2008 at 04:38 PM.
    All consequences are eternal in some way.

  2. #2

    Re: Need help on data extract script

    This type of thing gets *really* ugly as you go along, since slight changes to the page will often break the extraction code.

    Basically, you'll need to match across each set of <tr>...</tr>'s (using a multi-line match) and then iterate over the internal contents as needed.
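
    A minimal sketch of that two-step approach, assuming a hypothetical sample string in place of the real page (the contents of $html and the variable names are illustrative only). The "s" modifier lets "." match newlines, so a row can span several lines, and the lazy ".*?" stops at the first closing tag:

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = implode("\n", array(
    '<tr>',
    '  <td bgcolor=#FFFFFF><b>',
    '    Data1 </b></td>',
    '  <td bgcolor=#FFFFFF>Data2</td>',
    '</tr>',
    '<tr>',
    '  <td bgcolor=#FFFFFF>Data3</td>',
    '  <td bgcolor=#FFFFFF>Data4</td>',
    '</tr>',
));

$rows = array();

// Step 1: capture the inside of every <tr>...</tr>, even across line breaks.
preg_match_all('/<tr\b[^>]*>(.*?)<\/tr>/is', $html, $trMatches);

foreach ($trMatches[1] as $rowHtml) {
    // Step 2: capture every cell inside this one row.
    preg_match_all('/<td\b[^>]*>(.*?)<\/td>/is', $rowHtml, $tdMatches);

    // Drop inner tags such as <b> and surrounding whitespace.
    $cells = array();
    foreach ($tdMatches[1] as $cell) {
        $cells[] = trim(strip_tags($cell));
    }
    $rows[] = $cells;
}

print_r($rows);
```

    Matching row by row means the number of columns does not need to be known in advance, though it still assumes the tables are not nested.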

    How structured is the page? The version shown below doesn't make it particularly easy to parse.

  3. #3
    Join Date
    Aug 2002
    Location
    Brazil
    Posts
    730

    Re: Need help on data extract script

    Thank you so much for your reply! I've been trying all day. I've tried the craziest regular expressions from books, Google, the PHP manual, and my own head, with no success. =(

    I'm trying to get a list of items from this page:
    http://www.ittf.com/ittf_equipment/R...Company=ANDRO&

    There's a table with items; each row in the HTML table would be a row in the database table. So I need to get the data in the TDs. But I can't even get the TRs!


    I've been trying something like:
    Code:
    $html = file_get_contents('http://www.ittf.com/ittf_equipment/Racket_Coverings1.asp?s_Company=ANDRO');
    
    // code to clean  unused HTML to leave only table TRs:
    $start = strpos($html, '<tr', strpos($html, 'Rubber ID Stamp'));
    $stop = strpos($html, '<td colspan="6" bgcolor="#CCCCCC">', $start);
    if (!$stop)
    {
       $stop = strpos($html, '</table>', $start);
       if (!$stop)
       {
    	  echo 'Error while parsing HTML.';
    	  exit;
       }
       else
       {
    	  $stop = $stop - 10;
       }
    }
    else
    {
       $stop = $stop - 20;
    }
    $html = substr($html, $start, $stop - $start);
    $html = str_replace("\r\n", "", $html);
    
    // finished cleaning HTML. now $html has only important data
    
    // OUR REGULAR EXPRESSION DOESN'T WORK (insert desperate screams here)
    $pattern = '|<tr>(.*)</tr>|i';
    		
    if (!preg_match_all($pattern, $html, $matches))
    {
       echo '<br>Error: No item found.';
       exit;
    }
    		
    print_r($matches);
    =(
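
    One possible reason a pattern like this finds nothing: without the "s" modifier, "." does not match a newline, and the cleanup above only removes "\r\n" pairs, so bare "\n" line breaks would survive. The sketch below assumes a hypothetical two-line row (not the real page) to show the difference:

```php
<?php
// Hypothetical row using "\n" line endings, which
// str_replace("\r\n", "", ...) would leave untouched.
$html = "<tr>\n<td>Data1</td>\n</tr>";

// Pattern from the post: "." stops at the newline, so nothing matches.
$without = preg_match_all('|<tr>(.*)</tr>|i', $html, $m1);

// Same pattern with "s" (dot matches newline) and a lazy quantifier.
$with = preg_match_all('|<tr>(.*?)</tr>|is', $html, $m2);

var_dump($without); // int(0)
var_dump($with);    // int(1)
```

    A literal "<tr>" would also miss any row whose tag carries attributes, which is a separate thing to check on the real page.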
    Last edited by bubu; October 16th, 2008 at 07:24 PM.

  4. #4
    Join Date
    May 2002
    Posts
    10,943

    Re: Need help on data extract script

    Since you know that it is always 5 across, that makes it very simple. The following script should get you more than started.

    PHP Code:
    $contents = str_replace("\n", '', str_replace("\r", '', $contents));
    preg_match_all('/<td\b[^>]*>(.*?)<\/td>/i', $contents, $matches);

    $row = 1;
    $column = 1;
    foreach ($matches[1] as $match) {
      $match = trim(strip_tags($match));

      echo $row . '-' . $column . ': ' . $match . '<br />';

      $column++;
      if ($column == 6) {
        $column = 1;
        $row++;
      }
    }

    If the post was helpful...Rate it! Remember to use [code] or [php] tags.

  5. #5
    Join Date
    Aug 2002
    Location
    Brazil
    Posts
    730

    Re: Need help on data extract script

    PeejAvery! You probably saved me hours of head bashing! Thank you!

    I changed a few small things so I get the whole table in one array, like I wanted:

    Code:
    		$row = 1;
    		$column = 1;
    		$items = array();
    		foreach ($matches[1] as $match)
    		{
    			$items[$row][$column] = trim($match);
    			$column++;
    			if ($column == 6) {
    				$column = 1;
    				$row++;
    			}
    		}
    		print_r($items);
    Thank you very much PeejAvery! Thank you also mmetzger for your attention.
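
    For comparison, the same grouping into rows of five can be sketched with PHP's built-in array_chunk(). The $cells list here is hypothetical sample data standing in for $matches[1], and note that the resulting indexes start at 0 rather than 1:

```php
<?php
// Hypothetical flat list of cell values, standing in for $matches[1].
$cells = array(
    ' A1 ', 'A2', 'A3', 'A4', 'A5',
    'B1', 'B2', 'B3', 'B4', 'B5',
);

// Trim every cell, then split the flat list into rows of five columns.
$items = array_chunk(array_map('trim', $cells), 5);

print_r($items);
```

    If the number of cells is not a multiple of five, the last row simply comes out shorter, which can be a useful sign that the page layout has changed.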

  6. #6
    Join Date
    May 2002
    Posts
    10,943

    Re: Need help on data extract script

    You're most welcome. Glad I could save you so much stress.
