  1. #1
    Join Date
    May 2002
    Posts
    1,798

    To Unicode or not to Unicode? That is really the question.

    Since Visual Studio 2005, the default configuration for C++ is to use the Unicode character set. The purpose was to allow a fuller character set for language internationalization. If there is some other purpose, I cannot think of it. Consequently, a whole host of macros and functions has proliferated, usually with a 'w' prefix. Once you descend into the murky depths of 'wide character strings', you will find yourself hopelessly entrapped in a kelp bed of inconsistencies and arcane semi-solutions, longing to return to the Multi-byte character sets of old.

    But alas, it is probably too late. The push is on to use Unicode. Soon legacy code containing only Multi-byte character set configuration will not compile or run.

    With that off my chest, here are a couple of websites that I have found somewhat helpful.

    http://www-ccs.ucsd.edu/c/wchar.html#wcsncpy
    http://members.gamedev.net/sicrane/a...ndStreams.html

    But some burning questions remain.

    1) What is the difference between wchar_t * and wstring? These types can be assigned to one another, but functions like wcscpy_s will not let you copy from one to the other.

    2) Even though the compiler is set to use the Unicode character set, you still can use the multi-byte character set, but not interchangeably. So the two can coexist, which can make things really confusing.

    3) If one is not going to be writing code for languages other than English, is there any other good reason to use Unicode?

    I would be interested in your guruish thoughts on these matters.

    Mike
    mpliam

  2. #2
    Join Date
    Jun 2002
    Location
    Letchworth, UK
    Posts
    1,019

    Re: To Unicode or not to Unicode? That is really the question.

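    1) wchar_t* vs std::wstring is equivalent to char* vs std::string. wchar_t* is a pointer to a character type; wstring is a class (a typedef for the std::basic_string<wchar_t> template instantiation) that owns its characters. That is why you cannot copy from one to the other using wcscpy.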

    2) You can use SBCS, MBCS and DBCS with Unicode. Sometimes you need to output stuff in SBCS.

    3) No, unless you wish to display other characters like blobs.
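
    To make point 2 a little more concrete, here is a minimal sketch of moving text between a wide string and a single-byte one. The helper names narrow_to_ascii and widen_ascii are hypothetical, and the sketch assumes the single-byte side is plain 7-bit ASCII rather than any particular code page; anything outside that range is replaced with a fallback character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: narrows a wide string to 7-bit ASCII, substituting
// `fallback` for every character that has no ASCII equivalent.
std::string narrow_to_ascii(const std::wstring& wide, char fallback = '?') {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        // Cast to an unsigned type so the range check works whether
        // wchar_t is signed or unsigned on this platform.
        const bool is_ascii = static_cast<unsigned long>(wc) < 0x80;
        out.push_back(is_ascii ? static_cast<char>(wc) : fallback);
    }
    return out;
}

// Hypothetical helper: the reverse direction is lossless for ASCII input,
// because each byte widens to exactly one wchar_t.
std::wstring widen_ascii(const std::string& narrow) {
    std::wstring out;
    out.reserve(narrow.size());
    for (char c : narrow) {
        const unsigned char b = static_cast<unsigned char>(c);
        assert(b < 0x80 && "widen_ascii expects 7-bit ASCII input");
        out.push_back(static_cast<wchar_t>(b));
    }
    return out;
}
```

    Real code-page conversions on Windows would normally go through the platform's conversion functions rather than a hand-rolled loop; the loop is only meant to show that narrowing can lose information while widening ASCII cannot.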
    Succinct is verbose for terse

  3. #3
    Join Date
    Apr 1999
    Location
    Altrincham, England
    Posts
    4,470

    Re: To Unicode or not to Unicode? That is really the question.

    wstring has a constructor and assignment operator that accept wchar_t* arguments, which is why you can assign a wchar_t* to a wstring. However, there is no implicit conversion from wstring to wchar_t*, so if your function takes a wchar_t*, you need to use the c_str() function of wstring to pass it as an argument.
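
    A small sketch of both directions, in case it helps. The function names (legacy_length, to_wstring_copy, copy_to_buffer) are made up for illustration; only std::wstring, c_str(), std::wcslen and std::wmemcpy are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical C-style API that only accepts a raw wide-character pointer.
std::size_t legacy_length(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);
}

// wchar_t* -> wstring: the converting constructor copies the characters.
std::wstring to_wstring_copy(const wchar_t* raw) {
    assert(raw != nullptr);
    return std::wstring(raw);
}

// wstring -> caller-owned wchar_t buffer: there is no implicit conversion,
// so the source pointer comes from c_str(). Returns false when the buffer
// cannot hold the text plus its terminating L'\0'.
bool copy_to_buffer(const std::wstring& src, wchar_t* dest, std::size_t dest_count) {
    if (dest == nullptr || src.size() + 1 > dest_count) {
        return false;
    }
    std::wmemcpy(dest, src.c_str(), src.size() + 1);  // +1 copies the terminator
    return true;
}
```

    So a call such as legacy_length(text.c_str()) works where legacy_length(text) does not compile. With the Microsoft secure CRT, the copy inside copy_to_buffer could presumably be written as wcscpy_s(dest, dest_count, src.c_str()) instead; the point is the same, the wstring side has to go through c_str().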
    Correct is better than fast. Simple is better than complex. Clear is better than cute. Safe is better than insecure.
    --
    Sutter and Alexandrescu, C++ Coding Standards

    Programs must be written for people to read, and only incidentally for machines to execute.

    --
    Harold Abelson and Gerald Jay Sussman

    The cheapest, fastest and most reliable components of a computer system are those that aren't there.
    -- Gordon Bell


  4. #4
    Join Date
    May 2002
    Posts
    1,798

    Re: To Unicode or not to Unicode? That is really the question.

    Thank you all for your remarks.

    I think I'll stick with multi-byte applications for now. It's a lot easier.

    Mike
    mpliam

  5. #5
    Lindley
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: To Unicode or not to Unicode? That is really the question.

    That's more or less my view. Unicode is pretty much just a pain.....

  6. #6
    Join Date
    Jun 2002
    Location
    Letchworth, UK
    Posts
    1,019

    Re: To Unicode or not to Unicode? That is really the question.

    I disagree - I think Unicode is a lot simpler than MBCS. Not as simple as SBCS but definitely simpler than MBCS. If by MBCS, you meant SBCS then ignore what I've said.

    For instance, how do you know how many printable characters are in an MBCS string? With a Unicode string, all characters are the same size (surrogate pairs in UTF-16 aside), so it is just the number of Unicode chars.

    With MBCS most of the strings have to be unsigned chars, which means you're down to memcpys of the actual number of unsigned chars: not the number of printable chars. It is very buggy as you have to keep track of both printable chars and actual number of chars in the string. For a relatively big project, this can take almost 3-6 months to get right if you go from SBCS to MBCS. Many people try to get around the warnings by casting - this leads to even more problems as casting hides what would otherwise be compiler checks.

    If you don't have to align stuff, MBCS is OK. If you're relying on string lengths to make your display look pretty, MBCS is an absolute pain.
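
    To illustrate the counting problem, here is a rough sketch using Shift-JIS as the MBCS example. The helper names and the two sample constants are invented for this post, and the sample sticks to characters that fit in a single wchar_t. Shift-JIS lead bytes fall in 0x81-0x9F and 0xE0-0xFC, and each one is followed by a trail byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shift-JIS lead bytes occupy 0x81-0x9F and 0xE0-0xFC.
bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters (not bytes) in a Shift-JIS encoded string.
std::size_t sjis_char_count(const std::string& bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        // A lead byte consumes its trail byte too; a truncated pair at the
        // end of the string is counted as one character.
        i += (is_sjis_lead_byte(b) && i + 1 < bytes.size()) ? 2 : 1;
        ++count;
    }
    return count;
}

// "A" followed by U+65E5 and U+672C, once as Shift-JIS bytes and once wide.
const std::string kSjisSample = "A\x93\xFA\x96\x7B";
const std::wstring kWideSample = L"A\u65E5\u672C";

// Self-check for the samples above; call it from main() or a test.
void check_samples() {
    assert(kSjisSample.size() == 5);            // five bytes of storage
    assert(sjis_char_count(kSjisSample) == 3);  // but only three characters
    assert(kWideSample.size() == 3);            // wide: size() is the count
}
```

    Note that the last trail byte in the sample, 0x7B, is also the ASCII code for '{'. Any code that scans the MBCS string byte by byte looking for a brace would get a false hit there, which is the kind of bug described above.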
    Succinct is verbose for terse

  7. #7
    Join Date
    Jun 2005
    Posts
    1,255

    Re: To Unicode or not to Unicode? That is really the question.

    My vote in this virtual poll is: no to Unicode.

    Unicode is useless for languages using the Roman alphabet, even when they have a few extra characters, e.g. the French language has some special characters: éèùç€..., but these characters are already in the extended ASCII table, and Unicode is not required (I know, I live in France).

    It is not very useful for Japanese and Chinese because other encoding methods are much more widely used, e.g. shift-JIS for Japanese.

    Unicode is not well suited to Arabic, because Arabic has lots of ligatures, and they are not easy to code with Unicode.

    I know of only one font that has more than 20 percent of the characters of the Unicode set. So in the real world, an international application uses several different fonts, whether it is programmed with Unicode or not.

    See other discussions:
    http://www.codeguru.com/forum/showthread.php?t=442517
    http://www.codeguru.com/forum/showthread.php?t=424531
