-
January 31st, 2007, 01:33 PM
#1
Preg_replace all but something.
Okay, I am parsing some HTML and it has a bunch of <div class=g>...</div>. I am not all there on preg_replace so I can't do this on my own. Anyone know how to eliminate all text on the outside of these <DIV> tags? There are multiples of these <div class=g>...</div> per page.
If the post was helpful...Rate it! Remember to use [code] or [php] tags.
-
February 4th, 2007, 10:27 AM
#2
Re: Preg_replace all but something.
If it's always going to be <div class="g">....</div>, the way I would do it (because regexes are ebil) is to write a function like this:
PHP Code:
function cleanDiv(&$html) { // pass the arg by reference so you don't have to return anything
    $lookFor = "<div class=\"g\">";
    // strpos() returns false when nothing is found, so test with !== false
    while (($pos = strpos($html, $lookFor)) !== false) {
        $endPos = strpos($html, "</div>", $pos);
        if ($endPos === false) {
            break; // unbalanced HTML: no closing tag left for this div
        }
        // replace the closing tag first so that $pos is still valid afterwards
        $html = substr_replace($html, " ", $endPos, 6);
        // no "minus 2" needed: the escape slashes are not counted by strlen
        $html = substr_replace($html, " ", $pos, strlen($lookFor));
    }
}
Note: I didn't actually run this code, so it may not work, but it would be my first attempt before ultimately resorting to regexes. It assumes there are as many closing div tags as opening ones (proper HTML), so if there aren't, it may misbehave.
Notice the strict comparison ("===" or "!==") on the strpos result. You need it whenever 0 is a valid return value; in practice I mostly need it with strpos. Because of PHP's weak typing, 0 and false compare as equal with "==". strpos returns boolean false if the string is not found, but int 0 if the string is found and just happens to start at the first character. "===" compares type as well as value, so "=== 0" means "equals 0 and is also of type int"; a boolean false will not erroneously evaluate as true in this scenario. It's rare that you have to use the identical operator, but it's a godsend when you need it. Sometimes I wish there were a <== and a >==, but oh well, I guess I'm just getting comfortable with C and Java typing.
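A quick sketch of the difference described above, in case it helps. The sample string in $html is made up for illustration.

```php
<?php
// Illustrative only: $html is a made-up sample string.
$html = '<div class="g">first</div> trailing text';

$pos     = strpos($html, '<div class="g">'); // found at the very start, so int 0
$missing = strpos($html, '<span>');          // not found, so boolean false

var_dump($pos === 0);         // bool(true): found at offset 0
var_dump($pos == false);      // bool(true): loose comparison treats 0 like false
var_dump($pos === false);     // bool(false): strict comparison tells them apart
var_dump($missing === false); // bool(true): really not found
```

The second line of output is the trap: with "==" a match at offset 0 looks the same as "not found".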
Let me know if that worked. If not, I'll write you a regex that should help.
-Dave
-
February 4th, 2007, 11:55 PM
#3
Re: Preg_replace all but something.
Thanks. The problem is that there are many different <div class=g> matches within the page. I used explode() to parse it. All is well. Thanks for the suggestion though.
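The explode() code itself was not posted, so the following is only a guess at how such an approach might look. keepDivs() and the sample $page are hypothetical names, and the sketch assumes the divs are not nested and the opening tag is written exactly as <div class=g>.

```php
<?php
// Hypothetical helper: keepDivs() and $page are illustrative, not from the thread.
function keepDivs($html) {
    $open  = '<div class=g>';
    $close = '</div>';
    $kept  = array();

    $chunks = explode($open, $html);
    array_shift($chunks); // everything before the first opening tag is outside any div

    foreach ($chunks as $chunk) {
        // each chunk starts right after an opening tag; cut it at the first closing tag
        $parts = explode($close, $chunk);
        if (count($parts) > 1) {
            $kept[] = $open . $parts[0] . $close;
        }
    }
    return implode("\n", $kept);
}

$page = 'header<div class=g>one</div>middle<div class=g>two</div>footer';
echo keepDivs($page);
```

This uses only functions that are available in PHP 4, so it should also run on an older install.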
-
February 6th, 2007, 02:17 AM
#4
Re: Preg_replace all but something.
If you have PHP5 at hand, have a look at the DOM functions/classes.
With them you can build a DOM tree and extract whatever you want using XPath expressions. That's way more elegant than exploding the whole bunch.
Code:
$oDomDocument = new DOMDocument();
$oDomDocument->preserveWhiteSpace = false;
$bLoaded = $oDomDocument->loadHTML( $sMyHtml );
// select all div tags with class="g"
$oDomXPath = new DOMXPath($oDomDocument);
$oResult = $oDomXPath->query('//div[@class="g"]');
// iterate through selected nodes
for( $i=0; $i < $oResult->length; $i++ ) {
    // for example, get the value of an attribute: $oResult->item($i)->getAttribute('name')
}
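To actually keep only the matched divs, the loop body could serialize each node. This is a sketch assuming PHP 5 with the DOM extension enabled; $sMyHtml is a made-up sample and $aKept is an illustrative name.

```php
<?php
// Sketch only: $sMyHtml is a made-up sample; requires PHP 5 with the DOM extension.
$sMyHtml = '<html><body><p>intro</p><div class="g">one</div>'
         . '<p>between</p><div class="g">two</div></body></html>';

$oDomDocument = new DOMDocument();
$oDomDocument->loadHTML($sMyHtml);

$oDomXPath = new DOMXPath($oDomDocument);
$oResult = $oDomXPath->query('//div[@class="g"]');

$aKept = array();
foreach ($oResult as $oNode) {
    // saveXML() with a node argument serializes just that node and its children
    $aKept[] = $oDomDocument->saveXML($oNode);
}

echo implode("\n", $aKept);
```

Unlike the string-based approaches, this also copes with nested divs and extra attributes, because the parser does the tag matching.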
there are 10 kinds of people. those who understand binary and those who don't...
-
February 6th, 2007, 08:32 AM
#5
Re: Preg_replace all but something.
Originally Posted by bigBA
if you have PHP5 at your hands...
Unfortunately, we still have 4.1 because of working with Novell+Tomcat custom install. Thanks though.
-
February 8th, 2007, 01:57 AM
#6
Re: Preg_replace all but something.
Have you tried preg_match_all? Here is my sample code; you may want to refine the pattern.
Code:
// the "s" modifier lets "." match newlines, so divs spanning several lines are found too
preg_match_all("|<div[^>]* class=\"g\"[^>]*>.*?</div>|s", "<div class=\"g\" jj=\"w\">asasf</div><div class=\"g\">eet</div>", $matches, PREG_SET_ORDER);
echo "<textarea>\n";
foreach ($matches as $value) {
    echo "$value[0]\n";
}
echo "</textarea>\n";
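One possible refinement, since the original question used the unquoted form <div class=g>: make the quotes optional and join the matches, so that everything outside the divs is dropped. The sample $html is hypothetical, and the pattern still assumes the divs are not nested.

```php
<?php
// Hypothetical sample input; the pattern is one possible refinement, not the only one.
$html = "junk<div class=g>one</div>more junk"
      . "<div class=\"g\" id=\"x\">two\nlines</div>tail";

// ["']? makes the quotes optional, (?=[\s>]) stops class=green from matching,
// "i" ignores case, and "s" lets "." match newlines
$pattern = '|<div[^>]*\sclass=["\']?g["\']?(?=[\s>])[^>]*>.*?</div>|is';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches in document order
$result = implode("\n", $matches[0]);
echo $result;
```

With the default flag (PREG_PATTERN_ORDER) all full matches end up in $matches[0], which makes the implode() a one-liner.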