Re: MySQL Collation issue
They are different characters, hence they will return different. I would suggest a string replacement. When inserting into the database, convert each letter using PHP's htmlentities() function.
If you must, can you also replace the characters before searching.
PHP Code:
function noAccentString($temp) {
$replace = array('á' => 'a', 'é' => 'e', 'Ã*' => 'i', 'ó' => 'o', 'ú' => 'u', 'ü' => 'u');
foreach ($replace as $char) {
$temp = str_replace(key($replace), $char, $temp);
next($replace);
}
return $temp;
}
Re: MySQL Collation issue
Thank you.
I can not do that. I have to keep the words in database with its special characters.
The problem is in spanish collation 'á' is equivalente to 'a' but MySQL is ignoring it. Why?
Re: MySQL Collation issue
á is not the same as a. It is a completely different character. That is why it will not match. But, you can use the function I already gave you in my previous post.
PHP Code:
$query = "SELECT * FROM table WHERE column = '" . noAccentString('Mas tarde iré a cenar con mi esposa!') . "'";
EDIT: I forget to add the ñ to the array. You can do that yourself.
Re: MySQL Collation issue
do you mean that collation is only useful for sorting and not works for comparisons?
If that's true the only solution I see is to duplicate the entries in DB since I must have the correct (gramatically) text.
Re: MySQL Collation issue
Running MySQL 5.1.11 pajaro and pájaro return "pájaro". This is with both latin1_spanish_ci and latin1_danish_ci collation. As a result I can't have both pajaro and pájaro in my database, since it would duplice a row.
Are you running an older version which might be the issue?
1 Attachment(s)
Re: MySQL Collation issue
I have 5.0.54
I attacha an image of the simplest query I've done: SELECT 'pajaro'='pájaro'
Doing the query using phpMyAdmin 2.9.0.2 returns TRUE. (collation working)
But when doing with php returns FALSE. (collation not working)
Where's the problem? where's the collation lost?
Re: MySQL Collation issue
Okay. I am clear now.
So I did some testing. My first thoughts were the file type and I think I am correct. Even saving the file in UTF-8 does not truly save the accent marks. For example, try the following. See if the á shows correctly in the textbox even before submitting. When the Spanish characters are not statically typed into a file, they return correctly from the query.
PHP Code:
<?php
if (isset($_POST['query'])) {
$connection = mysql_connect('localhost', 'root', 'root');
$db = mysql_select_db('test');
$query = mysql_query($_POST['query']);
$row = mysql_fetch_object($query);
echo 'Query #1: ' . $row->nombre . '<br /><br />';
$query = mysql_query("SELECT * FROM aves WHERE nombre = 'pájaro'");
$row = mysql_fetch_object($query);
echo 'Query #2: ' . $row->nombre . '<br /><br />';
mysql_close($connection);
}
?>
<html>
<body>
<form action="" method="post">
<input type="text" name="query" value="SELECT * FROM aves WHERE nombre = 'pájaro'" size="80" />
</form>
</body>
</html>
Re: MySQL Collation issue
Tried you suggestion.
The á character is seen on the form.
The query for 'pájaro' is successful and gives the same result with both $_POST and hardcoded text.
The query for 'pajaro' doesn't gives any result for both methods.
The server is running Apache and I told the owners to add an AddDefaultCharset UTF-8 to the .htaccess file. This way the server is sending the information as UTF-8. The file is also saved as UTF-8 in the server side. I don't think file type is a problem.
I see that phpMyAdmin shows "MySQL connection collation utf8_general_ci" I'm thinking that connection collation might be the problem. Has connection collation higher priority than database and field collations?. I thought field collation had the highest priority. I don't understand...
1 Attachment(s)
Re: MySQL Collation issue
The problem is the file type, which is what I began to expect last night. I opened up a generic text editor and saved it with Western (Windows Latin 1) encoding. Both queries returned perfectly.
Re: MySQL Collation issue
For me too, (saving them as utf-8 in my case)
The problem is doing the same with 'pajaro' instead of 'pájaro' I don't obtain any result.
Re: MySQL Collation issue
I'm confused. Are you still having the problem? I'm not. If I search for pájaro OR pajaro...they both return pájaro.
Connection: utf8_unicode_ci
Database: utf8_spanish_ci
Table: utf8_spanish_ci
Column: utf8_spanish_ci
Re: MySQL Collation issue
Yes, still having problems.
The problem must be on the data itself.
Re: MySQL Collation issue
Quote:
Originally Posted by Doctor Luz
Yes, still having problems.
The problem must be on the data itself.
First of all, I'd like to appologize for missing a key (I broke my keyboard half an hour ago, and won't be able to fix it for another week (I'm sure you can see what key is missing)).
You can use regular expressions, I guess.
Code:
SELECT *
FROM aves
WHERE LIKE 'p[áa]j[áa]ro;
Just replace all underscores with the character missing.
Re: MySQL Collation issue
Thanks PeejAvery.
I feel sort of stupid now, though. I just realized, I could simply copy a r from somewhere and use find and replace in notepad, to replace the underscores myself.
Re: MySQL Collation issue
Regular expresions don't work.
I've checked everything, collations, caracter sets...
I've seen pájaro is stored in the DB as: 70 C3 A1 6A 61 72 6F which is 7 bytes long. This is the utf8 code. These 7 bytes, viewed as latin1 appear as pájaro
However actually pájaro in latin1_spanish is 6 bytes long 70 E1 6A 61 72 6F
It looks like utf-8 data in DB is being taken byte by byte as latin1, ignoring character sets and collations. When I search for pájaro this word is translated to pájaro and then I find records because there is a match. However when I search for pajaro there is not any match.
I don't understand.
I will try to change character sets and collations to latin1 to see if I have the same problems.
Re: MySQL Collation issue
Quote:
Originally Posted by Doctor Luz
Regular expresions don't work.
I've checked everything, collations, caracter sets...
I've seen pájaro is stored in the DB as: 70 C3 A1 6A 61 72 6F which is 7 bytes long. This is the utf8 code. These 7 bytes, viewed as latin1 appear as pájaro
However actually pájaro in latin1_spanish is 6 bytes long 70 E1 6A 61 72 6F
It looks like utf-8 data in DB is being taken byte by byte as latin1, ignoring character sets and collations. When I search for pájaro this word is translated to pájaro and then I find records because there is a match. However when I search for pajaro there is not any match.
I don't understand.
I will try to change character sets and collations to latin1 to see if I have the same problems.
Why aren't the regular expressions working? And the thing is, if you use regular expressions, you have to do it from the php code so that á gets set as Ãi instead, making it [aÃi].
If you insert the pájaro from php code and it becomes pÃijaro, and then try to search for
"SELECT * FROM aves WHERE nombre RLIKE 'p[áa]j[áa]ro';", it should turn it into "SELECT * FROM aves WHERE nombre RLIKE 'p[áÃi]j[áÃi]ro';"
Alternatively, I guess you could do it yourself with either
Code:
SELECT * FROM aves WHERE nombre RLIKE 'p[áÃi]j[áÃi]ro';
or
Code:
SELECT * FROM aves WHERE nombre RLIKE 'p([á]|[Ã][i])j([á]|[Ã][i])ro';
Re: MySQL Collation issue
In my most recent attachment, you will notice that I was playing with the ASCII values because I suspected that. I also saw the Ãi a while back.
When saved as UTF, the ASCII character is 195. However, when using Latin 1 the ASCII character is 225. Why can't you recurse your database changing the ASCII value to match your character encoding?
Re: MySQL Collation issue
Queries for matching exact values are working now.
Now I find another problem, REGEXP is not working as I expected.
I have something like this:
$sql="select * from aves where nombre REGEXP '[[:<:]]pajaro[[:>:]]'";
Is there a way for making this query to find me the row "el pájaro loco"?
I only can do it this way [[:<:]]pájaro[[:>:]]
REGEXP does not use collation?
Re: MySQL Collation issue
Why don't you use % as the wildcard? Just like *.doc returns all documents, in SQL %.doc would return all documents.
¿O entendÃ* mal?
Re: MySQL Collation issue
with LIKE?
Yes but it is giving me results like "espantapájaros" I would like to obtain only full words...
I would like to work with REGEXP brecuse the work is done, although maybe processing the results to exclude things like "espantapájaros"...