Page 1 of 2
Results 1 to 15 of 16
  1. #1
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    So how do you compare strings in MFC/Unicode build?

    Hello everyone:


    I just stumbled upon a bug in MFC that I couldn't believe still existed! It's no secret that pretty much every computer program relies heavily on string comparison, and quite a few of those comparisons are case-insensitive. Those of us who use MFC probably rely on the ATL CStringT template class and its CString::CompareNoCase() method to do that work. Well, guess what: it seems to have a nasty bug when comparing non-English strings in a Unicode build.

    I made a small sample project (using a version of MFC, as early as I could find) to demonstrate the issue. This small example will work fine for any English string, but it seems to fail miserably for any foreign characters. (A Russian colleague at work tested it and confirmed that it definitely doesn't work with the Cyrillic alphabet.) I also tried building it with the latest version of VS2010, with the same result.
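    The attached project is not reproduced here. As a portable sketch of the symptom only, here is a case-insensitive comparison that folds nothing but ASCII A-Z, which is roughly what a comparison running in the default "C" locale does (see the _wcsicmp explanation later in this thread). This is a hypothetical model, not MFC's actual source; both function names are made up for illustration.

```cpp
#include <string>

// Hypothetical model of a case-insensitive comparison running in the default
// "C" locale: only ASCII A-Z are folded to a-z; every other character,
// including accented Latin and Cyrillic letters, must match exactly.
wchar_t fold_ascii_only(wchar_t c) {
    return (c >= L'A' && c <= L'Z') ? static_cast<wchar_t>(c - L'A' + L'a') : c;
}

bool equals_no_case_c_locale(const std::wstring& a, const std::wstring& b) {
    if (a.size() != b.size()) return false;
    for (std::wstring::size_type i = 0; i < a.size(); ++i) {
        if (fold_ascii_only(a[i]) != fold_ascii_only(b[i])) return false;
    }
    return true;
}
```

    With this model, L"Hello" and L"hELLO" compare equal, while the French pair L"\u00EAtre humain" / L"\u00CAtre humain" (être / Être) and the Bulgarian pair човек / Човек do not, which is the English-works, everything-else-fails pattern described above.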

    So my question to you is: what APIs do you use to work with strings? And is there at least something (half-way) reliable written by Microsoft these days?
    Attached Files

  2. #2
    Join Date
    Apr 1999
    Posts
    27,449

    Re: So how do you compare strings in MFC/Unicode build?

    Quote Originally Posted by ahmd
    I made a small sample project (using a version of MFC, as early as I could find) to demonstrate the issue. This small example will work fine for any English string, but it seems to fail miserably for any foreign characters.
    Give us an example string, so that people who don't speak a language other than English can test this.

    Regards,

    Paul McKenzie

  3. #3
    Join Date
    Aug 2008
    Location
    Scotland
    Posts
    379

    Re: So how do you compare strings in MFC/Unicode build?

    This link has some useful general information about locale-dependent case comparison:
    http://lafstern.org/matt/col2_new.pdf

    Key point is that the mapping between upper and lower case depends on which locale is in use.

    However, the MSDN documentation states that CompareNoCase "is not affected by locale", so I would not expect it to correctly compare strings that contain non-English characters.
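    To make "the mapping depends on the locale" concrete, here is a minimal standard C++ sketch (the function name is hypothetical) in which the lower-case mapping comes from an explicitly passed std::locale instead of a hidden global setting. It compares one wchar_t at a time, so it does not handle characters whose case forms differ in length (such as German ß) or, on Windows where wchar_t is UTF-16, surrogate pairs.

```cpp
#include <locale>
#include <string>

// Case-insensitive equality in which the lower-case mapping is taken from an
// explicitly supplied std::locale, so the result depends on which locale you pass.
// Compares one wchar_t at a time: characters whose case forms differ in length
// (e.g. German sharp s) and UTF-16 surrogate pairs are not handled.
bool equals_no_case(const std::wstring& a, const std::wstring& b,
                    const std::locale& loc) {
    if (a.size() != b.size()) return false;
    const auto& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    for (std::wstring::size_type i = 0; i < a.size(); ++i) {
        if (ct.tolower(a[i]) != ct.tolower(b[i])) return false;
    }
    return true;
}
```

    With std::locale::classic() (the "C" locale) only ASCII letters are reliably folded. To compare French text you would pass a French locale, whose name is platform-specific (for example "fr_FR.UTF-8" on a typical Linux system, or "French" with the Microsoft CRT); the std::locale constructor throws std::runtime_error for a name the system does not recognize.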

  4. #4
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    Re: So how do you compare strings in MFC/Unicode build?

    Quote Originally Posted by Paul McKenzie
    Give us an example string, so that people who don't speak a language other than English can test this.
    Well, Paul, if I had that option it wouldn't be so upsetting to me. But since you asked, here are some examples from Google Translate:

    French:
    être humain
    Être humain

    Lithuanian:
    žmogus
    Žmogus

    Bulgarian:
    човек
    Човек

    None of these seem to work for me. Do they on your end?

    Quote Originally Posted by alanjhd08
    ... Key point is that the mapping between upper and lower case depends on which locale is in use...
    Am I missing something in the definition of "Unicode"? Shouldn't there be no locale? Isn't that the whole reason we use it?

  5. #5
    Join Date
    Nov 2003
    Posts
    1,902

    Re: So how do you compare strings in MFC/Unicode build?

    If you only care about equality, I believe you could just call "CompareStringW(LOCALE_INVARIANT, NORM_IGNORECASE, ...)".
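    A minimal sketch of that suggestion, assuming a Windows build (the wrapper name is hypothetical). CompareStringW returns CSTR_EQUAL when the strings match and 0 if the call fails. The #else branch is only a stand-in so the snippet compiles on other platforms; it uses the C locale's towlower and is not equivalent to the invariant-locale comparison.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <cwctype>
#endif

// Case-insensitive equality that does not depend on the current locale
// (Windows branch). The non-Windows branch is a stand-in only.
bool equals_invariant_no_case(const std::wstring& a, const std::wstring& b) {
#ifdef _WIN32
    const int result = CompareStringW(LOCALE_INVARIANT, NORM_IGNORECASE,
                                      a.c_str(), static_cast<int>(a.size()),
                                      b.c_str(), static_cast<int>(b.size()));
    return result == CSTR_EQUAL;  // 0 would mean the call itself failed
#else
    // Stand-in: towlower follows the current C locale, so this is NOT
    // equivalent to LOCALE_INVARIANT.
    if (a.size() != b.size()) return false;
    for (std::wstring::size_type i = 0; i < a.size(); ++i) {
        if (std::towlower(a[i]) != std::towlower(b[i])) return false;
    }
    return true;
#endif
}
```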

    Or you could get fancy with this: https://blogs.msdn.com/b/greggm/arch...21/472453.aspx

    gg

  6. #6
    Arjay, Moderator / EX MS MVP, Power Poster
    Join Date
    Aug 2004
    Posts
    13,492

    Re: So how do you compare strings in MFC/Unicode build?

    It tested fine when I converted the VC 2002 project to VC 2008.

    Using a 9-year-old environment couldn't be the cause, could it?

    I used a Sanskrit string (Devanagari script) for the test.

    काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥

    In general, using old compilers/environments is a big waste of time.

    Even VS 2008 is getting long in the tooth.

  7. #7
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    Re: So how do you compare strings in MFC/Unicode build?

    Quote Originally Posted by Codeplug
    If you only care about equality, I believe you could just call "CompareStringW(LOCALE_INVARIANT, NORM_IGNORECASE, ...)".
    Guys, you're kidding me. There's no consensus on how to do it? Yes, Codeplug, I believe CompareStringW might work as well as lstrcmpi().
    Quote Originally Posted by Codeplug
    https://blogs.msdn.com/b/greggm/arch...21/472453.aspx
    That article seems to have been written for an ANSI-type project. (The date on it is 2005.) As Arjay likes to put it, this is a 6-year old solution.

    Quote Originally Posted by Arjay
    It tested fine when I converted the VC 2002 project to VC 2008.

    Using a 9-year-old environment couldn't be the cause, could it?
    If you had read my post above, you'd know that I deliberately built it with VS2002 so that everyone could compile it and see what I'm talking about. I also said that I tried it with the latest version of VS2010 and got the same result.

    Quote Originally Posted by Arjay
    I used a Sanskrit string (Devanagari script) for the test.

    काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥
    It didn't work at all. The string wasn't converted to capitals, and it only matched because the two strings are identical (Devanagari has no upper or lower case to begin with).

    Did you try any examples I posted above? It's a little bit of copy-and-paste magic.

    Quote Originally Posted by Arjay
    In general, using old compilers/environments is a big waste of time.
    In general, using badly written environments is a big waste of time.

  8. #8
    Join Date
    Apr 1999
    Posts
    27,449

    Re: So how do you compare strings in MFC/Unicode build?

    Quote Originally Posted by ahmd
    Well, Paul, if I had that option it wouldn't be so upsetting to me. But since you asked, here are some examples from Google Translate:

    French:
    être humain
    Être humain
    The lower level function that is called is _wcsicmp. From MSDN:
    You will need to call setlocale for _wcsicmp to work with Latin 1 characters. The C locale is in effect by default, so, for example, ä will not compare equal to Ä. Call setlocale with any locale other than the C locale before the call to _wcsicmp.
    The strings do not compare without setting the locale to French. Once you do that, then they do compare correctly:
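    Code:
    #include <locale.h>
    #include <string.h>  // declares _wcsicmp in the Microsoft CRT
    //...
        setlocale(LC_ALL, "French");
        // _wcsicmp now uses the French locale's case mapping; 0 means "equal"
        int result = _wcsicmp(L"être humain", L"Être humain");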
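    Because setlocale changes a process-wide setting, a scope guard that restores the previous locale afterwards can help. A minimal sketch in portable C++; ScopedLocale and equals_no_case_crt are hypothetical helpers, not CRT or MFC functions. std::towlower depends on the current locale's LC_CTYPE category, much as the MSDN text above describes for _wcsicmp.

```cpp
#include <clocale>
#include <cwctype>
#include <string>

// Hypothetical RAII helper: switches the process-wide C locale and restores
// the previous one when it goes out of scope. Not safe to use while other
// threads depend on the locale.
class ScopedLocale {
public:
    explicit ScopedLocale(const char* name) {
        // Copy the current name: the returned pointer may be overwritten later.
        if (const char* old = std::setlocale(LC_ALL, nullptr)) saved_ = old;
        ok_ = std::setlocale(LC_ALL, name) != nullptr;  // null = unknown name
    }
    ~ScopedLocale() {
        if (!saved_.empty()) std::setlocale(LC_ALL, saved_.c_str());
    }
    ScopedLocale(const ScopedLocale&) = delete;
    ScopedLocale& operator=(const ScopedLocale&) = delete;
    bool ok() const { return ok_; }

private:
    std::string saved_;
    bool ok_ = false;
};

// Case-insensitive equality using the current C locale's towlower.
bool equals_no_case_crt(const std::wstring& a, const std::wstring& b) {
    if (a.size() != b.size()) return false;
    for (std::wstring::size_type i = 0; i < a.size(); ++i) {
        if (std::towlower(a[i]) != std::towlower(b[i])) return false;
    }
    return true;
}
```

    For example, construct ScopedLocale fr("French") with the Microsoft CRT (other platforms use different names, such as "fr_FR.UTF-8" on Linux), check fr.ok(), and then compare.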
    Regards,

    Paul McKenzie

  9. #9
    Join Date
    Nov 2003
    Posts
    1,902

    Re: So how do you compare strings in MFC/Unicode build?

    >> That article seems to have been written for an ANSI-type project.
    There is nothing ANSI in it.

    >> As Arjay likes to put it, this is a 6-year old solution.
    And it's still the recommended way of comparing filenames and user names for equality. It's how the OS does it today.

    >> I believe CompareStringW might work as well as lstrcmpi().
    lstrcmpiW() calls CompareStringW() using the thread's locale (LCID). That may not be what you want.

    https://blogs.msdn.com/b/michkap/arc...15/481314.aspx
    So if you want to mimic how the OS compares filenames/usernames, then the method in greggm's article is the way to go.

    If you want native-speaker, linguistic equality, then you'll need to know the language you're dealing with (for the LCID) and pass LINGUISTIC_IGNORECASE as the second parameter (the comparison flags) to CompareStringW().
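    A sketch of both options, assuming a Windows build (wrapper names are hypothetical). Two substitutions to note. For the OS-style comparison it uses CompareStringOrdinal with bIgnoreCase set to TRUE, which is available on Windows Vista and later and compares using the operating system's uppercase table; it is not the code from greggm's article. For the linguistic case it uses CompareStringEx, which takes a locale name such as L"fr-FR" instead of an LCID. The non-Windows branch is a stand-in so the snippet compiles elsewhere.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <cwctype>
#endif

// Filename/user-name style equality: locale-independent, uppercase-based.
bool equals_ordinal_ignore_case(const std::wstring& a, const std::wstring& b) {
#ifdef _WIN32
    return CompareStringOrdinal(a.c_str(), static_cast<int>(a.size()),
                                b.c_str(), static_cast<int>(b.size()),
                                TRUE) == CSTR_EQUAL;
#else
    // Stand-in only: towupper follows the current C locale.
    if (a.size() != b.size()) return false;
    for (std::wstring::size_type i = 0; i < a.size(); ++i) {
        if (std::towupper(a[i]) != std::towupper(b[i])) return false;
    }
    return true;
#endif
}

#ifdef _WIN32
// Native-speaker equality for one known language, e.g. localeName = L"fr-FR".
bool equals_linguistic_ignore_case(const std::wstring& a, const std::wstring& b,
                                   const wchar_t* localeName) {
    return CompareStringEx(localeName, LINGUISTIC_IGNORECASE,
                           a.c_str(), static_cast<int>(a.size()),
                           b.c_str(), static_cast<int>(b.size()),
                           nullptr, nullptr, 0) == CSTR_EQUAL;
}
#endif
```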

    gg

  10. #10
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    Re: So how do you compare strings in MFC/Unicode build?

    Quote Originally Posted by Paul McKenzie
    The strings do not compare without setting the locale to French. Once you do that, then they do compare correctly:
    But Paul, this is a Unicode build. Why do I need to set locale? And how would I know that someone types in a French word?

  11. #11
    Join Date
    Apr 1999
    Posts
    27,449

    Re: So how do you compare strings in MFC/Unicode build?

    Quote Originally Posted by ahmd
    But Paul, this is a Unicode build. Why do I need to set locale?
    Because the documentation says so. I tested with a Unicode build, duplicated your issue, and fixed it by reading the docs and making the change.
    And how would I know that someone types in a French word?
    You won't know. Either code your application to use whatever locale is already set, set the locale yourself from some user configuration setting, or ship a localized version of your app, etc.

    Or you just say "your app only works for English text".
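    A minimal sketch of the "read it from a user configuration setting" option in standard C++. The function name and the configured locale string are hypothetical: it tries the configured name, then the user's environment default (""), then the "C" locale, which always exists.

```cpp
#include <clocale>
#include <string>

// Applies a locale chosen from a (hypothetical) user configuration value.
// Falls back to the user's environment default (""), then to "C".
// Returns the name of the locale actually in effect.
std::string apply_configured_locale(const std::string& configured) {
    if (!configured.empty()) {
        if (const char* name = std::setlocale(LC_ALL, configured.c_str()))
            return name;  // copy before any later setlocale call overwrites it
    }
    if (const char* name = std::setlocale(LC_ALL, ""))
        return name;
    return std::setlocale(LC_ALL, "C");  // "C" is always available
}
```

    Locale names are platform-specific: "French" works with the Microsoft CRT, as in the example above, while a typical Linux system uses names like "fr_FR.UTF-8".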

    Regards,

    Paul McKenzie

  12. #12
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    Re: So how do you compare strings in MFC/Unicode build?

    Well, fellas, that is news to me. I appreciate your insight, though. I guess I've been away from Windows long enough to think that locales were a thing of the past.

    OK, then this article posted by Codeplug is the workaround (thank you for sharing, btw):
    http://blogs.msdn.com/b/greggm/archi...21/472453.aspx

    But still going back to my original question. Say, for people who paid for the version of Visual Studio that comes with the MFC classes, what shall they do with CString? Not everyone can debug into it and see the underlying low-level APIs like you did. Is the answer not to use CString at all? If so, it's one of those pervasive classes that are all over MFC.

  13. #13
    Join Date
    Apr 1999
    Posts
    27,449

    Re: So how do you compare strings in MFC/Unicode build?

    Quote Originally Posted by ahmd
    But still going back to my original question. Say, for people who paid for the version of Visual Studio that comes with the MFC classes, what shall they do with CString? Not everyone can debug into it and see the underlying low-level APIs like you did.
    If they have installed the source code, they should be able to debug into it. That's all I did after I duplicated your issue to see what was being called.
    Is the answer not to use CString at all? If so, it's one of those pervasive classes that are all over MFC.
    I know, but CString isn't a Windows class or a CRT function -- it is a C++ class wrapper for most string management you will find in a Windows app. You just need to be prepared to debug it if it doesn't do what you want it to do, so you can see what lower-level APIs are being called.

    CString can't be all things for all people -- there are a plethora of low-level API string functions, and I would be surprised if CString has wrapped all the different possibilities that can occur.

    Regards,

    Paul McKenzie

  14. #14
    Join Date
    Jun 2011
    Posts
    2

    Re: So how do you compare strings in MFC/Unicode build?

    I've been working on localizing my app for some time, and this is what I found out.
    First, call:
    _configthreadlocale(_DISABLE_PER_THREAD_LOCALE);
    _wsetlocale(LC_ALL,_T(""));

    Then, instead of CString::Compare() and CString::CompareNoCase(), use CString::Collate() and CString::CollateNoCase(). CString::MakeUpper() and MakeLower() will then also behave correctly.

    Passing a specific country name/code page as the second argument of _wsetlocale() changes the ordering of letters outside that language's alphabet. That is fine for an application that handles data in one language at a time, but when you deal with multi-language data, use "" instead.
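    The start-up sequence above as a minimal sketch (function names are hypothetical). The _configthreadlocale/_wsetlocale branch applies only to the Microsoft CRT and uses the wide literal L"" per the correction in the next post; elsewhere, std::setlocale plays the same role. CString::Collate is documented as using _tcscoll, which maps to wcscoll in Unicode builds, so wcscoll stands in for it here. CollateNoCase is not reproduced.

```cpp
#include <clocale>
#include <cwchar>

// Call once at startup, before other threads use locale-dependent functions.
void init_user_locale() {
#ifdef _MSC_VER
    // Microsoft CRT only: make setlocale apply to every thread, then select
    // the user's default locale. L"" compiles in Unicode and non-Unicode builds.
    _configthreadlocale(_DISABLE_PER_THREAD_LOCALE);
    _wsetlocale(LC_ALL, L"");
#else
    // Standard C/C++ equivalent: "" selects the user's default locale.
    std::setlocale(LC_ALL, "");
#endif
}

// Locale-aware ordering: negative, zero or positive, like wcscmp.
int collate(const wchar_t* a, const wchar_t* b) {
    return std::wcscoll(a, b);
}
```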

  15. #15
    Join Date
    Jun 2011
    Posts
    2

    Re: So how do you compare strings in MFC/Unicode build?

    Correction:
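    Quote Originally Posted by maxxemm
    _wsetlocale(LC_ALL,_T(""));
    The call must be _wsetlocale(LC_ALL, L""); otherwise it fails to compile in a non-Unicode build, because _T("") expands to the narrow literal "" there, while _wsetlocale() expects a wchar_t string.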
