Visual C++ General: How to use different character sets?


    Q: I have this simple function call:
    Code:
    MessageBox(NULL, "Test message", "Title", MB_OK);
    The compiler raises the following error and I don't understand why.
    error C2664: 'MessageBoxW' : cannot convert parameter 2 from 'const char [13]' to 'LPCWSTR'
    Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
    A: In short, the project is built for Unicode, so MessageBox resolves to MessageBoxW, which expects wide-character (wchar_t) strings rather than char strings.
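
    The type mismatch behind error C2664 can be reproduced without the Windows headers. The following is only a portable sketch: accepted_as_wide is a hypothetical helper (not part of any Microsoft header), and it relies on LPCWSTR being a typedef for const wchar_t*.

```cpp
#include <cassert>
#include <type_traits>

// char and wchar_t are distinct built-in types.
static_assert(!std::is_same<char, wchar_t>::value,
              "char and wchar_t are different types");

// A narrow literal decays to const char*, which has no implicit
// conversion to const wchar_t* (what LPCWSTR stands for).
static_assert(!std::is_convertible<const char*, const wchar_t*>::value,
              "no implicit narrow-to-wide pointer conversion");

// Hypothetical helper: tells whether an argument of type T could be passed
// where a const wchar_t* parameter is expected.
template <typename T>
constexpr bool accepted_as_wide(const T&) {
    return std::is_convertible<typename std::decay<T>::type,
                               const wchar_t*>::value;
}

// Usage sketch: the narrow literal is rejected, the wide one is accepted.
inline void check_literals() {
    assert(!accepted_as_wide("Test message"));
    assert(accepted_as_wide(L"Test message"));
}
```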

    The Microsoft run-time library provides Microsoft-specific generic-text mappings for many data types, routines and other objects. These mappings are defined in TCHAR.h. Three character sets are supported:
    • ASCII (single-byte character set SBCS)
    • MBCS (multi-byte character set)
    • Unicode


    The character set in use is selected by two preprocessor symbols:
    • _UNICODE: if defined, Unicode is the character set used
    • _MBCS: if defined, MBCS is used
    • If neither of the above (mutually-exclusive) is defined, ASCII is the character set used
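
    The selection rules in the list above can be sketched as a small compile-time switch. The function name active_character_set is a hypothetical helper made up for this illustration, and the #error guard only encodes the assumption that the two symbols are never defined together.

```cpp
#include <cassert>
#include <cstring>

// Assumption taken from the list above: the two symbols are mutually
// exclusive, so a build that defines both is flagged as misconfigured.
#if defined(_UNICODE) && defined(_MBCS)
#error "_UNICODE and _MBCS should not be defined together"
#endif

// Hypothetical helper: names the character set chosen by the preprocessor.
inline const char* active_character_set() {
#if defined(_UNICODE)
    return "Unicode";
#elif defined(_MBCS)
    return "MBCS";
#else
    return "SBCS";   // neither symbol defined
#endif
}

// Usage sketch: exactly one of the three names is reported.
inline void check_character_set() {
    const char* name = active_character_set();
    assert(std::strcmp(name, "Unicode") == 0 ||
           std::strcmp(name, "MBCS") == 0 ||
           std::strcmp(name, "SBCS") == 0);
}
```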


    The Windows API provides a separate version of each text-handling function for Unicode and for ASCII/MBCS.



    Q: How do I select the character set?

    A: You have to go to Project Properties > Configuration Properties > General and change the value of the Character Set property. The three available options are:
    • Not Set (neither _UNICODE nor _MBCS is defined)
    • Use Multi-byte Character Set (_MBCS is defined)
    • Use Unicode Character Set (_UNICODE and UNICODE are defined)



    Q: How exactly do the generic-text mapping directives affect the data types and functions that I'm using?

    A: Generic-text C run-time routines, such as _itot, and Windows API functions, such as MessageBox, aren't functions at all; they are macros.

    The C run-time library provides functions for all character sets, plus a macro that resolves to one of these functions depending on the character set in use. For instance, the macro _itot resolves to:
    • _itoa, when _UNICODE is not defined
    • _itow, when _UNICODE is defined


    Similarly, TCHAR resolves to:
    • char, when _UNICODE is not defined
    • wchar_t, when _UNICODE is defined
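
    The same mapping idea can be sketched in portable C++. All names below (my_tchar, my_itot, int_to_string_a, int_to_string_w, format_count) are hypothetical stand-ins, and the standard std::to_string / std::to_wstring replace the Microsoft-specific _itoa / _itow so that the sketch builds with any compiler.

```cpp
#include <cassert>
#include <string>

// Narrow and wide versions of one routine (stand-ins for _itoa / _itow).
inline std::string  int_to_string_a(int value) { return std::to_string(value); }
inline std::wstring int_to_string_w(int value) { return std::to_wstring(value); }

// Generic-text mapping: one character type and one routine name that
// resolve differently depending on _UNICODE.
#ifdef _UNICODE
typedef wchar_t my_tchar;
#define my_itot int_to_string_w
#else
typedef char my_tchar;
#define my_itot int_to_string_a
#endif

typedef std::basic_string<my_tchar> my_tstring;

// Code written against the generic names builds for either character set.
inline my_tstring format_count(int count) {
    return my_itot(count);
}

// Usage sketch: the result always uses the selected character type.
inline void check_mapping() {
    assert(format_count(2007).size() == 4);
    assert(sizeof(format_count(0)[0]) == sizeof(my_tchar));
}
```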


    You can read more about the mappings in MSDN.


    On the other hand, the Windows API comes in two versions: for Unicode and for ASCII/Multi-byte. If you read the MSDN page for MessageBox it says:
    The MessageBox function creates, displays, and operates a message box. The message box contains an application-defined message and title, plus any combination of predefined icons and push buttons.

    Code:
    int MessageBox(
        HWND hWnd,
        LPCTSTR lpText,
        LPCTSTR lpCaption,
        UINT uType);
    Actually, MessageBox is a macro, and LPCTSTR is a type name whose definition likewise depends on the character set. You can see how MessageBox is defined in WinUser.h:

    Code:
    WINUSERAPI
    int
    WINAPI
    MessageBoxA(
        __in_opt HWND hWnd,
        __in_opt LPCSTR lpText,
        __in_opt LPCSTR lpCaption,
        __in UINT uType);
    WINUSERAPI
    int
    WINAPI
    MessageBoxW(
        __in_opt HWND hWnd,
        __in_opt LPCWSTR lpText,
        __in_opt LPCWSTR lpCaption,
        __in UINT uType);
    #ifdef UNICODE
    #define MessageBox  MessageBoxW
    #else
    #define MessageBox  MessageBoxA
    #endif // !UNICODE
    There are actually two versions of the function: MessageBoxA for ASCII and MBCS, and MessageBoxW for Unicode. When UNICODE is defined (the Windows headers test UNICODE, while the C run-time headers test _UNICODE; the Unicode Character Set project option defines both), MessageBox resolves to MessageBoxW and LPCTSTR to LPCWSTR (i.e. const wchar_t*); otherwise MessageBox resolves to MessageBoxA and LPCTSTR to LPCSTR (i.e. const char*).
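
    The A/W pattern from WinUser.h can be mimicked with a pair of ordinary functions. This is a portable sketch only: TextLengthA, TextLengthW and the TextLength macro are hypothetical names, and the #error guard assumes that UNICODE (tested by the Windows headers) and _UNICODE (tested by the C run-time headers) should be defined together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Assumption: a build that defines only one of the two symbols is
// inconsistent, so it is rejected at compile time.
#if defined(UNICODE) != defined(_UNICODE)
#error "Define UNICODE and _UNICODE together, or neither"
#endif

// Hypothetical narrow ("A") and wide ("W") versions of one operation.
inline std::size_t TextLengthA(const char* text)    { return std::strlen(text); }
inline std::size_t TextLengthW(const wchar_t* text) { return std::wcslen(text); }

// The generic name is a macro, exactly like MessageBox above.
#ifdef UNICODE
#define TextLength TextLengthW
#else
#define TextLength TextLengthA
#endif

// Usage sketch: both versions are always available by their explicit names.
inline void check_both_versions() {
    assert(TextLengthA("Title") == 5);
    assert(TextLengthW(L"Title") == 5);
}
```

    Calling the A or W version by its explicit name sidesteps the macro entirely, at the cost of tying that call to one character set.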



    Q: How do I write my program so that it builds for any of these character sets without modifying the code when the character set changes?

    A: In a single-byte or multi-byte character set, string and character literals are not prefixed by anything ("string", 'c'). Unicode string and character literals, however, require the prefix L, such as L"string" and L'c'. To cover both cases, wrap literals in the Microsoft-specific macros _T() or _TEXT(). The pre-processor removes these macros when _UNICODE is not defined, and replaces them with the L prefix when _UNICODE is defined.

    _UNICODE defined:
    • no: _T("string") becomes "string" and _T('c') becomes 'c'
    • yes: _T("string") becomes L"string" and _T('c') becomes L'c'
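
    A macro of this kind can be sketched with token pasting. MY_T, MY_T_IMPL, my_char and literal_length are hypothetical names used only for illustration; the two-level definition is there so that an argument which is itself a macro gets expanded before L is pasted onto it.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _UNICODE
#define MY_T_IMPL(x) L##x      // paste the L prefix onto the literal
typedef wchar_t my_char;
#else
#define MY_T_IMPL(x) x         // leave the literal untouched
typedef char my_char;
#endif
#define MY_T(x) MY_T_IMPL(x)   // extra level: expand macro arguments first

// Number of characters in a literal, not counting the terminator.
// It only accepts literals of the currently selected character type.
template <std::size_t N>
constexpr std::size_t literal_length(const my_char (&)[N]) {
    return N - 1;
}

// Usage sketch: the same source line works for either character set.
inline void check_literal_macro() {
    assert(literal_length(MY_T("string")) == 6);
    assert(sizeof(MY_T('c')) == sizeof(my_char));
}
```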




    Q: How do I fix the line of code mentioned above?

    A: Wrap both string literals in _T():
    Code:
    MessageBox(NULL, _T("Test message"), _T("Title"), MB_OK);

    Last edited by cilu; July 30th, 2007 at 05:12 AM.
