PHP How to change encoding with simplexml_load_file()?


Toshioo
April 20th, 2011, 03:34 PM
Hi, I'm using simplexml_load_file() on an RSS feed to get the titles. The problem is that when, for example, a ' appears in a title, strange characters show up instead of the '. How do I fix this :S? Thanks.

PeejAvery
April 20th, 2011, 10:43 PM
That probably means the file you're loading is not UTF-8. Since SimpleXML was built to read and write UTF-8, the file you're reading needs to be in that character encoding.

An alternative, which will be slower to process, is to read the file using file_get_contents() (http://ar.php.net/manual/en/function.file-get-contents.php), convert it to UTF-8, and then use simplexml_load_string() (http://www.php.net/manual/en/function.simplexml-load-string.php).
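
A minimal sketch of that read, convert, parse sequence, under some assumptions: the helper name load_feed_as_utf8() and the ISO-8859-1 fallback are made up for illustration, the mbstring and SimpleXML extensions are enabled, and an inline string stands in for the downloaded feed. It converts only when the bytes are not already valid UTF-8 and the XML does not declare its own encoding, since the parser handles a declared encoding by itself.

```php
<?php
// Hypothetical helper, not from the thread. Assumes mbstring + SimpleXML.
function load_feed_as_utf8(string $raw, string $assumedSource = 'ISO-8859-1')
{
    // If the XML prolog declares an encoding, leave the bytes alone:
    // the parser converts them itself.
    $declaresEncoding = preg_match('/^\s*<\?xml[^>]*\bencoding\s*=/i', $raw) === 1;

    // Convert only when nothing is declared and the bytes are not valid UTF-8.
    // The source encoding is a guess ($assumedSource), so it is passed explicitly.
    if (!$declaresEncoding && !mb_check_encoding($raw, 'UTF-8')) {
        $raw = mb_convert_encoding($raw, 'UTF-8', $assumedSource);
    }

    return simplexml_load_string($raw);
}

// Stand-in for file_get_contents($url): Latin-1 bytes, no encoding declaration.
// "\xE9" is the letter e with an acute accent in ISO-8859-1.
$raw   = "<rss><channel><item><title>Caf\xE9</title></item></channel></rss>";
$feed  = load_feed_as_utf8($raw);
$title = (string) $feed->channel->item->title;
```

For a real feed, the result of file_get_contents() would be passed in place of $raw. If the feed is already UTF-8, the helper leaves it untouched.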

Toshioo
April 21st, 2011, 04:43 AM
I tried that, but there are still strange characters instead of '. The code is:

$temp = mb_convert_encoding( file_get_contents("feed_url"), 'UTF-8' );
$feed = simplexml_load_string($temp);

I also tried utf8_encode() instead of mb_convert_encoding and the same happened.

After I get the title I send it by e-mail, and that is where I read it. Could that be the problem?

PeejAvery
April 21st, 2011, 07:57 AM
What is the character encoding of the e-mail?

Toshioo
April 21st, 2011, 09:02 AM
I wrote the email's header like this:

$headers = "Content-type: text/html; charset=UTF-8\r\n";

That shouldn't be the problem though: I just tried using echo and the strange characters still appear.
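
One related point on the e-mail side: the charset in the Content-type header covers only the message body. A Subject line containing non-ASCII characters (a curly apostrophe, for instance) normally has to be MIME-encoded separately, as described in RFC 2047. The sketch below shows one way to do that by hand. The function name encode_subject_utf8() is hypothetical, and it only suits short subjects, because long encoded words have to be split.

```php
<?php
// Sketch only: wrap a UTF-8 subject in an RFC 2047 "encoded-word".
// Each encoded-word may be at most 75 characters long, so a long subject
// would need to be split into several; that is not handled here.
function encode_subject_utf8(string $subject): string
{
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

// "\u{2019}" is the curly apostrophe, three bytes in UTF-8.
$subject = "It\u{2019}s";
$encoded = encode_subject_utf8($subject);

// $encoded would then be passed as the second argument to mail().
```

PHP's mbstring extension also provides mb_encode_mimeheader(), which takes care of the splitting; the manual version is shown only to make the format visible.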

PeejAvery
April 21st, 2011, 10:17 AM
Save the file and upload it here.

Toshioo
April 21st, 2011, 01:50 PM
Ok, here it is:

<html>
<body>

<?php

$max_news = 5;
$i = 0;

$str = file_get_contents("http://feeds.feedburner.com/PokerNewsDaily?format=xml");
$temp = mb_convert_encoding( $str, "UTF-8" );
$feed = simplexml_load_string($temp);

foreach ($feed->channel->item as $item)
{
    // Stop once $max_news items have been processed.
    if ($i >= $max_news)
        break;

    sleep(10);
    // Cast the SimpleXMLElement nodes to plain strings before using them.
    $subject = (string) $item->title;
    $link = (string) $item->link;
    $description = (string) $item->description;
    $body = "<html><body>" . $description . "<br /><br /><i>PokerDailyNews</i>: <a href=\"" . $link . "\">Read the full report</a></body></html>";
    $to = "email@email.com";
    $headers = "Content-type: text/html; charset=UTF-8\r\n";

    if ( mail($to, $subject, $body, $headers) )
    {
        echo "<p>" . $subject . " - sent</p>";
    }
    else
    {
        echo "<p>" . $subject . " - NOT sent</p>";
    }

    $i++;
}

?>

</body>
</html>

PeejAvery
April 21st, 2011, 08:06 PM
I'm not seeing any invalid characters.

Toshioo
April 22nd, 2011, 04:45 AM
Hmm, then could it be because of the server I use?

PeejAvery
April 23rd, 2011, 08:06 AM
I'd doubt it.

Is your PHP document saved in UTF-8? Are you also outputting a UTF-8 header?

Toshioo
April 23rd, 2011, 03:05 PM
It's good now :D The problem was that the page wasn't being sent to the browser as UTF-8. I wrote this at the beginning of the file:

header('Content-Type:text/html; charset=UTF-8');

And I didn't have to convert the string, because the feeds are in UTF-8.
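
To illustrate why the conversion step could not help here: utf8_encode() assumes its input is ISO-8859-1, so running it on text that is already UTF-8 encodes every byte a second time. The snippet below is only a sketch. It assumes the mbstring extension and uses a made-up title containing a curly apostrophe (U+2019), passing the wrong source encoding to mb_convert_encoding() explicitly to reproduce that behaviour.

```php
<?php
// A title that is already UTF-8; "\u{2019}" is the curly apostrophe (3 bytes).
$title = "It\u{2019}s";

// Treat those UTF-8 bytes as ISO-8859-1 and convert "again" to UTF-8.
// This is the same assumption utf8_encode() makes about its input.
$double = mb_convert_encoding($title, 'UTF-8', 'ISO-8859-1');

// The three apostrophe bytes have each become a two-byte sequence,
// so the string has grown and no longer displays as an apostrophe.
$originalLength = strlen($title);
$doubleLength   = strlen($double);

// Reversing the mistaken conversion recovers the original bytes.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
```

When the feed is already UTF-8, the string can go straight to simplexml_load_string(), or the URL straight to simplexml_load_file(), with no conversion at all.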

Thanks a lot for your help :)