Visual C++ General: How to use different character sets?
Q: I have this simple function call:
MessageBox(NULL, "Test message", "Title", MB_OK);
The compiler raises the following error and I don't understand why.
error C2664: 'MessageBoxW' : cannot convert parameter 2 from 'const char [13]' to 'LPCWSTR'
Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
A: In short, this happens because the project is built for Unicode, so MessageBox expects wide-character strings.
The Microsoft C run-time library provides Microsoft-specific generic-text mappings for many data types, routines, and other objects. These mappings are defined in TCHAR.h. There are three supported character sets:
ASCII (single-byte character set – SBCS)
MBCS (multi-byte character set)
Unicode (wide-character set)
Which character set is used is controlled by two preprocessor symbols:
_UNICODE: if defined, Unicode is the character set used
_MBCS: if defined, MBCS is used
If neither of the above (mutually-exclusive) is defined, ASCII is the character set used
The Windows API provides two versions of each function that takes string arguments: one for Unicode and one for ASCII and MBCS.
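The selection rules above can be sketched as a small compile-time check. This is only an illustration: the function names active_character_set and check_character_set_name are hypothetical, and the code tests the _UNICODE and _MBCS symbols directly rather than relying on anything from TCHAR.h.

```cpp
#include <cassert>
#include <cstring>

// Reports which character set this translation unit is compiled for,
// following the rules listed above. The function name is hypothetical.
inline const char* active_character_set()
{
#if defined(_UNICODE)
    return "Unicode";
#elif defined(_MBCS)
    return "MBCS";
#else
    return "SBCS";   // neither symbol defined: single-byte (ASCII)
#endif
}

// Quick self-check: the reported name must be one of the three sets.
inline void check_character_set_name()
{
    const char* name = active_character_set();
    assert(std::strcmp(name, "Unicode") == 0 ||
           std::strcmp(name, "MBCS") == 0 ||
           std::strcmp(name, "SBCS") == 0);
}
```

Calling active_character_set() from a test program after changing the project's character set is a quick way to confirm which symbols the compiler actually sees.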
Q: How do I select the character set?
A: You have to go to Project Properties > Configuration Properties > General and change the value of the Character Set property. The three available options are:
Not Set (neither _UNICODE nor _MBCS is defined)
Use Multi-byte Character Set (_MBCS is defined)
Use Unicode Character Set (_UNICODE and UNICODE are defined)
Q: How exactly do the generic-text mapping directives affect the data types and functions that I'm using?
A: Generic-text C run-time library routines, such as _itot, and Windows API functions, such as MessageBox, aren't functions at all; they are macros.
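The C run-time library provides functions for all character sets and a macro that resolves to one or another of these functions depending on the character set in use. For instance, the macro _itot resolves to:
_itoa, when neither _UNICODE nor _MBCS is defined
_itoa, when _MBCS is defined
_itow, when _UNICODE is defined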
The same applies to Windows API functions such as MessageBox. There are actually two versions of the function: MessageBoxA for ASCII and MBCS, and MessageBoxW for Unicode. When UNICODE is defined (this is the symbol the Windows headers test; it should always be defined together with _UNICODE), MessageBox resolves to MessageBoxW and LPCTSTR to LPCWSTR (i.e. const wchar_t*); otherwise MessageBox resolves to MessageBoxA and LPCTSTR to LPCSTR (i.e. const char*).
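The macro dispatch described above can be imitated in a few lines. This is a simplified re-creation, not the real headers: show_textA, show_textW, show_text, demo_lpctstr and DEMO_GREETING are hypothetical stand-ins for MessageBoxA, MessageBoxW, MessageBox, LPCTSTR and a string literal, chosen so the sketch compiles without Windows headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical API shipped in two flavours, like MessageBoxA / MessageBoxW.
inline std::string  show_textA(const char* text)    { return std::string("A:") + text; }
inline std::wstring show_textW(const wchar_t* text) { return std::wstring(L"W:") + text; }

// The name callers write is a macro, and so is the matching string type.
#ifdef UNICODE
typedef const wchar_t* demo_lpctstr;   // plays the role of LPCTSTR -> LPCWSTR
#define show_text show_textW
#define DEMO_GREETING L"hi"
#else
typedef const char* demo_lpctstr;      // plays the role of LPCTSTR -> LPCSTR
#define show_text show_textA
#define DEMO_GREETING "hi"
#endif

// Checks that the generic name dispatched to the flavour that matches
// the generic string type.
inline void check_generic_call()
{
    demo_lpctstr text = DEMO_GREETING;
#ifdef UNICODE
    assert(show_text(text) == L"W:hi");
#else
    assert(show_text(text) == "A:hi");
#endif
}
```

Passing a narrow literal to show_text while UNICODE is defined fails to compile for the same reason as the C2664 error in the first question: the macro has already replaced the call with the wide-character version.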
Q: How do I write my program so that it builds for any of these character sets without modifying the code when the character set changes?
A: In a single-byte or multi-byte character set, string and character literals are not prefixed by anything ("string", 'c'). Unicode string and character literals, however, require the prefix L, such as L"string" and L'c'. To write literals that work in both cases, you can use the Microsoft-specific macros _T() or _TEXT(). The preprocessor removes these macros when _UNICODE is not defined, and replaces them with the prefix L when _UNICODE is defined.
_UNICODE not defined: _T("string") becomes "string" and _T('c') becomes 'c'
_UNICODE defined: _T("string") becomes L"string" and _T('c') becomes L'c'
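The effect of _T() can be reproduced with a small macro of your own. The sketch below is an imitation under stated assumptions: DEMO_T, demo_tchar, demo_length and check_generic_literals are hypothetical names standing in for _T, TCHAR and ordinary string code, so that it compiles without TCHAR.h.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _UNICODE
typedef wchar_t demo_tchar;     // plays the role of TCHAR
#define DEMO_T(x) L##x          // pastes the L prefix onto the literal
#else
typedef char demo_tchar;
#define DEMO_T(x) x             // leaves the literal unchanged
#endif

// Counts code units up to the terminator; the same source works for
// both char and wchar_t builds.
inline std::size_t demo_length(const demo_tchar* s)
{
    std::size_t n = 0;
    while (s[n] != DEMO_T('\0')) {
        ++n;
    }
    return n;
}

// Checks that literals wrapped in DEMO_T match the generic character type.
inline void check_generic_literals()
{
    assert(demo_length(DEMO_T("Test message")) == 12);
    assert(sizeof(DEMO_T('c')) == sizeof(demo_tchar));
}
```

In real code, with windows.h and tchar.h included, the call from the first question would be written as MessageBox(NULL, _T("Test message"), _T("Title"), MB_OK); and would then build under any of the three character set options.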