  1. #1
    Join Date
    May 2002
    Posts
    10,943

    Preg_replace all but something.

    Okay, I am parsing some HTML and it has a bunch of <div class=g>...</div>. I am not all there on preg_replace so I can't do this on my own. Anyone know how to eliminate all text on the outside of these <DIV> tags? There are multiples of these <div class=g>...</div> per page.
    If the post was helpful...Rate it! Remember to use [code] or [php] tags.

  2. #2
    Join Date
    Dec 2006
    Location
    Atlanta, GA
    Posts
    41

    Re: Preg_replace all but something.

    If it's always going to be <div class="g">....</div>, the way I would do it (because regexes are ebil) is to write a function like this:

    PHP Code:
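    function cleanDiv(&$html) { // pass the arg by reference so you don't have to return anything
        $lookFor = "<div class=\"g\">";
        // strpos() returns false when not found, and int 0 for a match at the very start
        while (strpos($html, $lookFor) > 0 || strpos($html, $lookFor) === 0) {
            $pos = strpos($html, $lookFor);
            $endPos = strpos($html, "</div>", $pos);
            if ($endPos === false) {
                break; // no closing tag left, stop instead of looping forever
            }
            // replace the closing tag first so $pos still points at the opening tag
            $html = substr_replace($html, " ", $endPos, 6);
            // escape slashes don't count in strlen(), so no "- 2" is needed
            $html = substr_replace($html, " ", $pos, strlen($lookFor));
        }
    }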

    Note: I didn't actually run this code, so it may not work, but it would be my first attempt before ultimately using regexes. It should work; however, it assumes that there are as many closing divs as opening ones (proper HTML), so if there aren't, you may error out.

    Notice the "===" on the strpos function; this must be used whenever you test its result for 0. Because of PHP's loose typing, 0 and boolean false compare as equal under "==". The strpos function returns boolean false if the string is not found, but int 0 if the string is found and just happens to start at the first character. The "===" operator matches type as well as value, so "=== 0" means "equals 0 and is also of type int"; boolean false will not erroneously evaluate as true in this scenario. It's rare that you have to use the identical operator, but it's a godsend when you need it. Sometimes I wish there was a <== or >==, but oh well, I guess I'm just getting comfortable with the C and Java typing.
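
    To make the strpos point concrete, here is a minimal sketch. The two sample strings are made up for illustration and are not from the page being parsed.

    ```php
    <?php
    // Hypothetical sample strings, chosen only to show both cases.
    $atStart = strpos("<div>text", "<div>");  // int 0: found at the first character
    $missing = strpos("plain text", "<div>"); // bool false: not found at all

    var_dump($atStart == false);  // loose comparison treats the match at 0 like "not found"
    var_dump($atStart === false); // strict comparison tells the two apart
    var_dump($missing === false);
    var_dump($missing === 0);
    ```

    The first var_dump is the trap: with "==" a match at position 0 looks the same as no match.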

    Let me know if that worked; if not, I'll write you a regex that should help.

    -Dave

  3. #3
    Join Date
    May 2002
    Posts
    10,943

    Re: Preg_replace all but something.

    Thanks. The problem is that there are many different <div class=g> matches within the page. I used explode() to parse it. All is well. Thanks for the suggestion though.
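
    The post does not show the explode() code, so the following is only a guess at what such an approach might look like. The input string and the variable names ($open, $close, $kept) are assumptions, and it assumes the divs are not nested. It sticks to functions that already exist in PHP 4.

    ```php
    <?php
    // Hypothetical input; the thread does not show the real page.
    $html  = 'junk<div class=g>one</div>more junk<div class=g>two</div>tail';
    $open  = '<div class=g>';
    $close = '</div>';

    $kept   = array();
    $chunks = explode($open, $html);
    array_shift($chunks);                 // drop the text before the first opening tag
    foreach ($chunks as $chunk) {
        $parts = explode($close, $chunk); // $parts[0] is the content of one div
        if (count($parts) > 1) {          // skip a chunk that has no closing tag
            $kept[] = $open . $parts[0] . $close;
        }
    }
    echo implode("\n", $kept), "\n";
    ```

    Everything between a closing tag and the next opening tag is thrown away, which is the "eliminate all text outside the divs" part of the question.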

  4. #4
    Join Date
    May 2004
    Location
    Germany
    Posts
    655

    Re: Preg_replace all but something.

    If you have PHP5 at hand, have a look at the DOM functions/classes.

    With them you could build a DOM tree and extract whatever you want using XPath expressions. Way more elegant than exploding the whole bunch.

    Code:
    $oDomDocument = new DOMDocument();
    $oDomDocument->preserveWhiteSpace = false;
    $bLoaded = $oDomDocument->loadHTML(  $sMyHtml  );
    
    // select all div tags with class="g"
    $oDomXPath = new DOMXPath($oDomDocument);
    $oResult   = $oDomXPath->query('//div[@class="g"]');
    
    // iterate through selected nodes
    for( $i=0; $i < $oResult->length; $i++ ) {
         // for example, get the value of an attribute: $oResult->item($i)->getAttribute('name')
    }
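
    For the original question (keep only the divs, drop everything else), the loop could serialize each selected node instead of reading an attribute. This is a sketch under assumptions: the input string and the $aKept array are made up, it needs PHP 5 with the DOM extension, and nested div class="g" elements would each be returned separately.

    ```php
    <?php
    // Hypothetical input; the thread does not show the real page.
    $sMyHtml = '<html><body>junk<div class="g">one</div>more<div class="g">two</div></body></html>';

    $oDomDocument = new DOMDocument();
    // Real-world HTML is often invalid; suppress parser warnings for this sketch.
    @$oDomDocument->loadHTML($sMyHtml);

    $oDomXPath = new DOMXPath($oDomDocument);
    $aKept = array();
    foreach ($oDomXPath->query('//div[@class="g"]') as $oNode) {
        // saveXML() with a node argument serializes just that element and its children.
        $aKept[] = $oDomDocument->saveXML($oNode);
    }
    echo implode("\n", $aKept), "\n";
    ```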

  5. #5
    Join Date
    May 2002
    Posts
    10,943

    Re: Preg_replace all but something.

    Quote Originally Posted by bigBA
    if you have PHP5 at your hands...
    Unfortunately, we are still on 4.1 because we work with a custom Novell+Tomcat install. Thanks though.

  6. #6
    Join Date
    Jan 2003
    Location
    7,107 Islands
    Posts
    2,487

    Re: Preg_replace all but something.

    Have you tried preg_match_all? Here is my sample code; you may want to refine the pattern.

    Code:
    preg_match_all("|<div[^>]* class=\"g\"[^>]*>.*?</div>|", "<div class=\"g\" jj=\"w\">asasf</div><div class=\"g\">eet</div>", $matches, PREG_SET_ORDER);
    
    echo "<textarea>\n";
    foreach ($matches as $value) {
      echo "$value[0]\n";
    }
    echo "</textarea>\n";
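
    One possible refinement of the pattern, as a sketch only: the input string is made up, and the pattern assumes the class attribute holds exactly "g" and that the divs are not nested (a nested div would be cut off at the first </div>). It accepts the unquoted class=g from the first post, upper-case tags, and content that spans several lines.

    ```php
    <?php
    // Hypothetical input; the thread does not show the real page.
    $html = "junk<div class=g>one</div>more\n<DIV class=\"g\" id=\"x\">two\nlines</DIV>tail";

    // i: match <DIV> as well as <div>; s: let . match newlines too.
    // (["']?) ... \1 accepts class=g, class="g" and class='g';
    // (?=[\s>]) keeps values such as class=green from matching.
    $pattern = '#<div\b[^>]*\bclass=(["\']?)g\1(?=[\s>])[^>]*>.*?</div>#is';

    preg_match_all($pattern, $html, $matches);
    $result = implode("\n", $matches[0]); // $matches[0] holds the full matches
    echo $result, "\n";
    ```

    Joining $matches[0] gives the page with everything outside the matching divs removed.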
