CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 2 of 2 FirstFirst 12
Results 16 to 21 of 21
  1. #16
    Join Date
    Nov 2001
    Location
    Beyond Juslibol
    Posts
    1,688

    Re: MySQL Collation issue

    Regular expresions don't work.

    I've checked everything, collations, caracter sets...

    I've seen pájaro is stored in the DB as: 70 C3 A1 6A 61 72 6F which is 7 bytes long. This is the utf8 code. These 7 bytes, viewed as latin1 appear as pájaro
    However actually pájaro in latin1_spanish is 6 bytes long 70 E1 6A 61 72 6F

    It looks like utf-8 data in DB is being taken byte by byte as latin1, ignoring character sets and collations. When I search for pájaro this word is translated to pájaro and then I find records because there is a match. However when I search for pajaro there is not any match.

    I don't understand.

    I will try to change character sets and collations to latin1 to see if I have the same problems.
    Last edited by Doctor Luz; March 14th, 2008 at 04:26 PM.

  2. #17
    Join Date
    Dec 2006
    Posts
    203

    Re: MySQL Collation issue

    Quote Originally Posted by Doctor Luz
    Regular expresions don't work.

    I've checked everything, collations, caracter sets...

    I've seen pájaro is stored in the DB as: 70 C3 A1 6A 61 72 6F which is 7 bytes long. This is the utf8 code. These 7 bytes, viewed as latin1 appear as pájaro
    However actually pájaro in latin1_spanish is 6 bytes long 70 E1 6A 61 72 6F

    It looks like utf-8 data in DB is being taken byte by byte as latin1, ignoring character sets and collations. When I search for pájaro this word is translated to pájaro and then I find records because there is a match. However when I search for pajaro there is not any match.

    I don't understand.

    I will try to change character sets and collations to latin1 to see if I have the same problems.
    Why aren't the regular expressions working? And the thing is, if you use regular expressions, you have to do it from the php code so that á gets set as Ãi instead, making it [aÃi].
    If you insert the pájaro from php code and it becomes pÃijaro, and then try to search for
    "SELECT * FROM aves WHERE nombre RLIKE 'p[áa]j[áa]ro';", it should turn it into "SELECT * FROM aves WHERE nombre RLIKE 'p[áÃi]j[áÃi]ro';"
    Alternatively, I guess you could do it yourself with either

    Code:
    SELECT * FROM aves WHERE nombre RLIKE 'p[áÃi]j[áÃi]ro';
    or

    Code:
    SELECT * FROM aves WHERE nombre RLIKE 'p([á]|[Ã][i])j([á]|[Ã][i])ro';
    Sincerely,

    Martin Svendsen

  3. #18
    Join Date
    May 2002
    Posts
    10,943

    Re: MySQL Collation issue

    In my most recent attachment, you will notice that I was playing with the ASCII values because I suspected that. I also saw the Ãi a while back.

    When saved as UTF, the ASCII character is 195. However, when using Latin 1 the ASCII character is 225. Why can't you recurse your database changing the ASCII value to match your character encoding?
    If the post was helpful...Rate it! Remember to use [code] or [php] tags.

  4. #19
    Join Date
    Nov 2001
    Location
    Beyond Juslibol
    Posts
    1,688

    Re: MySQL Collation issue

    Queries for matching exact values are working now.

    Now I find another problem, REGEXP is not working as I expected.

    I have something like this:

    $sql="select * from aves where nombre REGEXP '[[:<:]]pajaro[[:>:]]'";

    Is there a way for making this query to find me the row "el pájaro loco"?

    I only can do it this way [[:<:]]pájaro[[:>:]]

    REGEXP does not use collation?

  5. #20
    Join Date
    May 2002
    Posts
    10,943

    Re: MySQL Collation issue

    Why don't you use % as the wildcard? Just like *.doc returns all documents, in SQL %.doc would return all documents.

    ¿O entendÃ* mal?
    If the post was helpful...Rate it! Remember to use [code] or [php] tags.

  6. #21
    Join Date
    Nov 2001
    Location
    Beyond Juslibol
    Posts
    1,688

    Re: MySQL Collation issue

    with LIKE?

    Yes but it is giving me results like "espantapájaros" I would like to obtain only full words...

    I would like to work with REGEXP brecuse the work is done, although maybe processing the results to exclude things like "espantapájaros"...

Page 2 of 2 FirstFirst 12

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured