Hi, I'm using simplexml_load_file() on an RSS feed to get the titles. The problem is that when, for example, a ' appears in a title, strange characters show up instead of the '. How do I fix this :S? Thanks.
That means the file you're loading is not UTF-8. SimpleXML was built to read and write UTF-8 (all text it returns is UTF-8), so the file you're reading needs to be in that character encoding.
An alternative, which will be slower, is to read the file with file_get_contents(), convert it to UTF-8, and then parse it with simplexml_load_string().
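A minimal sketch of that alternative, assuming you already know the feed's real encoding (the helper name `load_xml_as_utf8` and the ISO-8859-1 example are hypothetical). One detail the recipe glosses over: if the XML declaration still names the old encoding after conversion, libxml will decode the already-converted bytes a second time, so the declaration is rewritten too.

```php
<?php
// Sketch: parse a feed that is NOT in UTF-8 with SimpleXML.
// Assumption: the caller knows the feed's real encoding ($fromEncoding).

/**
 * Hypothetical helper: convert raw XML bytes to UTF-8 and parse them.
 *
 * @return SimpleXMLElement|false
 */
function load_xml_as_utf8(string $raw, string $fromEncoding)
{
    // Re-encode the bytes from the feed's encoding to UTF-8.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $fromEncoding);

    // If the declaration still says e.g. encoding="ISO-8859-1", libxml would
    // decode the (now UTF-8) bytes again, so point it at UTF-8 instead.
    $utf8 = preg_replace(
        '/^(<\?xml[^>]*encoding=["\'])[^"\']*(["\'])/',
        '${1}UTF-8${2}',
        $utf8,
        1
    );

    return simplexml_load_string($utf8);
}

// Usage ($feedUrl is a placeholder):
// $feed = load_xml_as_utf8(file_get_contents($feedUrl), 'ISO-8859-1');
// echo $feed->channel->item[0]->title;
```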
I tried that, but there are still strange characters instead of '. The code is:
Code:
$temp = mb_convert_encoding( file_get_contents("feed_url"), 'UTF-8' );
$feed = simplexml_load_string($temp);
I also tried utf8_encode() instead of mb_convert_encoding() and the same thing happened.
After getting the title, I send it by e-mail and read it there. Could that be the problem?
What is the character encoding of the e-mail?
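The e-mail's encoding matters in two places: the body's charset is declared by the Content-Type header, but header fields such as Subject may only contain ASCII, so a UTF-8 title containing a typographic apostrophe has to be MIME-encoded (RFC 2047). A minimal sketch, assuming the title and body are already UTF-8; the helper name `utf8_mail_parts` is hypothetical.

```php
<?php
// Sketch: prepare a UTF-8 subject and headers for PHP's mail().
// Assumes the subject and HTML body are already UTF-8 strings.
mb_internal_encoding('UTF-8');

/**
 * Hypothetical helper: returns [encoded subject, additional headers].
 */
function utf8_mail_parts(string $subject): array
{
    // Raw non-ASCII bytes are not allowed in mail headers, so encode the
    // subject as an RFC 2047 encoded-word (Base64 "B" encoding by default).
    $encodedSubject = mb_encode_mimeheader($subject, 'UTF-8');

    // Declare a MIME message whose body is UTF-8 HTML; header lines use CRLF.
    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: text/html; charset=UTF-8";

    return [$encodedSubject, $headers];
}

// Usage inside a feed loop (not executed here):
// [$subject, $headers] = utf8_mail_parts((string) $item->title);
// mail($to, $subject, $body, $headers);
```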
Save the file and upload it here.
Ok, here it is:
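Code:
<html>
<body>
<?php
$max_news = 5;
$i = 0;
$str = file_get_contents("http://feeds.feedburner.com/PokerNewsDaily?format=xml");
// With no source encoding given, mb_convert_encoding() converts from mb_internal_encoding().
$temp = mb_convert_encoding( $str, "UTF-8" );
$feed = simplexml_load_string($temp);
foreach ($feed -> channel -> item as $item )
{
    // Stop after $max_news items (">" would let one extra item through).
    if ($i >= $max_news)
        break;
    sleep(10); // pause between e-mails
    // SimpleXML returns elements as objects; cast them to (UTF-8) strings.
    $subject = (string) $item -> title;
    $link = (string) $item -> link;
    $description = (string) $item -> description;
    // Quote the href so URLs containing & or spaces don't break the link.
    $body = "<html><body>" . $description . "<br /><br /><i>PokerDailyNews</i>: <a href=\"" . htmlspecialchars($link) . "\">Read the full report</a></body></html>";
    $to = "[email protected]";
    $headers = "Content-type: text/html; charset=UTF-8\r\n";
    if ( mail($to, $subject, $body, $headers) )
    {
        echo "<p>" . htmlspecialchars($subject) . " - sent</p>";
    }
    else
    {
        echo "<p>" . htmlspecialchars($subject) . " - NOT sent</p>";
    }
    $i++;
}
?>
</body>
</html>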
I'm not seeing any invalid characters.
I'd doubt it.
Is your PHP document saved in UTF-8? Are you also outputting a UTF-8 header?
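To see why a missing or wrong charset produces "strange characters" rather than just dropping the apostrophe, here is a small demonstration. It assumes the feed's titles use the typographic apostrophe ’ (U+2019), which is common in news feeds but not confirmed in this thread.

```php
<?php
// Demonstration: a correct UTF-8 apostrophe turns into "strange characters"
// when the browser or mail client is not told the text is UTF-8.
$apostrophe = "\u{2019}";  // RIGHT SINGLE QUOTATION MARK (’)

// In UTF-8 it is three bytes.
echo bin2hex($apostrophe), "\n";  // e28099

// A reader that assumes Windows-1252 maps each byte to its own character:
// 0xE2 -> â, 0x80 -> €, 0x99 -> ™. Simulate that by decoding the bytes as
// Windows-1252 and re-encoding the result as UTF-8 for display.
$garbled = mb_convert_encoding($apostrophe, 'UTF-8', 'Windows-1252');
echo $garbled, "\n";  // ’
```

Sending `Content-Type: text/html; charset=UTF-8` tells the reader to decode those three bytes as one character, which is what makes the apostrophe display correctly.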
It's good now :D The problem was that the page's encoding wasn't set to UTF-8. I added this at the beginning of the file:
Code:
header('Content-Type: text/html; charset=UTF-8');
And I didn't have to convert the string, because the feeds are already in UTF-8.
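If you can't be sure a source is UTF-8, a guard like the one below converts only when the bytes are not already valid UTF-8, so UTF-8 input like this feed is left untouched instead of being re-encoded. The helper name `ensure_utf8` and the Windows-1252 fallback are assumptions. Applied to a whole XML document, it is only safe when the document has no encoding declaration or declares UTF-8; otherwise the declaration would also need updating.

```php
<?php
// Sketch: convert to UTF-8 only when needed.
// Assumption: non-UTF-8 input is Windows-1252 unless told otherwise.

/**
 * Hypothetical helper: return $s unchanged if it is already valid UTF-8,
 * otherwise convert it from $fallback. Skipping valid UTF-8 avoids
 * double-encoding text that was fine to begin with.
 */
function ensure_utf8(string $s, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $fallback);
}

// At the top of the page, before any output (header() fails after output):
// header('Content-Type: text/html; charset=UTF-8');
// $text = ensure_utf8($rawText);
```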
Thanks a lot for your help :)