-
September 25th, 2010, 10:15 PM
#1
UNICODE question and help needed
Alrighty, let's just get started...
If I write a library that uses UNICODE, will someone who isn't using UNICODE in their software be able to use my library without making changes? And the reverse: what if my library is ASCII and their software uses UNICODE?
If someone could shed a little more light on UNICODE and these different character sets more generally (not just in relation to the question above), it would be much appreciated.
Thanks in advance, DangerD.
-
September 26th, 2010, 08:15 AM
#2
Re: UNICODE question and help needed
Unlikely.
Each character is assigned a unique integer value (its code point), and the table of those assignments is known as a 'character mapping.' ASCII fits in a single byte (7 bits, usually stored in 8), whereas a UNICODE character takes 1 to 4 bytes in UTF-8, 2 or 4 bytes in UTF-16, and always 4 bytes in UTF-32. So the question here is not whether you have to change the code, but rather how to convert between these two character sets (and interpret them correctly) while keeping all the data intact.
UTF-8 (a variable-length encoding, 1 to 4 bytes per character) is a better choice for multilingual support.
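To make "variable length" concrete, here is a minimal sketch of how one code point becomes 1 to 4 UTF-8 bytes. The function name encode_utf8 is made up for illustration, and the sketch only asserts that the input is a valid code point rather than reporting an error.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a single Unicode code point as UTF-8.
// Assumes cp is a valid scalar value (not a surrogate, not above U+10FFFF).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (identical to ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0x41) gives the single byte 'A', while encode_utf8(0x20AC) (the euro sign) gives three bytes.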
You can google UCS to get into more details.
-
September 26th, 2010, 09:53 AM
#3
Re: UNICODE question and help needed
I would just use wstring. Its characters are wchar_t, which is 2 bytes on Windows and 4 bytes on most Unix-like systems, and it can be converted to and from regular strings.
-
September 26th, 2010, 10:15 AM
#4
Re: UNICODE question and help needed
In general, yes, it is possible for UNICODE and ANSI to co-exist in a project where the application is of one type and the dll of another - it just requires a bit more care when exchanging strings between the application and the dll. When passing strings to the dll they must be converted first to the dll's string type, and the inverse is true for strings returned by the dll.
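As a rough sketch of that conversion step, suppose the application holds UTF-8 in std::string and the dll expects UTF-16 in std::u16string (both of those choices are assumptions for this example, as is the name utf8_to_utf16). On Windows you would normally call MultiByteToWideChar and WideCharToMultiByte instead of hand-rolling this. The sketch rejects malformed lead and continuation bytes but does not check for overlong encodings.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical boundary helper: convert the caller's UTF-8 text into the
// UTF-16 code units that the (assumed) library interface expects.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char byte = static_cast<unsigned char>(in[i + k]);
            if ((byte & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (byte & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Code points above U+FFFF need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 | (cp >> 10));
            out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
        }
    }
    return out;
}
```

The reverse direction, for strings the dll returns, follows the same pattern with the roles of the two types swapped.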
-
September 28th, 2010, 10:22 AM
#5
Re: UNICODE question and help needed
Are you providing the source code or a compiled version of the library?
If you are providing the source code, you can write:
Code:
#include <string>

// Define USING_UNICODE before including this header to get wide characters.
#ifdef USING_UNICODE
typedef wchar_t Character;
#else
typedef char Character;
#endif
typedef std::basic_string<Character> String;
ASCII is the same as the first 128 UNICODE code points: it only goes up to 0x007F.
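Because of that overlap, pure ASCII text can be widened one character at a time without any lookup table. A small sketch, with is_ascii and widen_ascii as made-up helper names; it is only valid for input that really is ASCII, which the assert checks.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when every byte is in the ASCII range 0x00..0x7F.
bool is_ascii(const std::string& s) {
    for (unsigned char byte : s) {
        if (byte > 0x7F) return false;
    }
    return true;
}

// Hypothetical helper: widen ASCII text to wchar_t character by character.
// This works only because ASCII values equal the first 128 code points;
// it is NOT a general narrow-to-wide conversion.
std::wstring widen_ascii(const std::string& s) {
    assert(is_ascii(s));
    return std::wstring(s.begin(), s.end());
}
```

Anything that fails is_ascii needs a real conversion that knows the source encoding.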
-
September 28th, 2010, 10:31 AM
#6
Re: UNICODE question and help needed
The only thing you need to be careful of is that you specify the encoding, not just the type, of any string being passed between your library and user code. It isn't enough to say "Unicode"; a Windows user may interpret this to mean UTF-16LE, while a Linux user may think you mean UTF-8. Those two can be distinguished by type, but it isn't so clear when trying to distinguish between UTF-16BE and UTF-16LE, for instance.
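A short sketch of why the byte order has to be stated: the same two bytes produce different UTF-16 code units depending on which order you assume. The enum ByteOrder and the function utf16_units_from_bytes are illustrative names, and std::string is used here only as a raw byte container.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class ByteOrder { Big, Little };

// Hypothetical helper: assemble raw bytes into UTF-16 code units using the
// byte order the caller claims the data is in. Assumes an even byte count.
std::u16string utf16_units_from_bytes(const std::string& bytes, ByteOrder order) {
    assert(bytes.size() % 2 == 0);
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned b0 = static_cast<unsigned char>(bytes[i]);
        const unsigned b1 = static_cast<unsigned char>(bytes[i + 1]);
        // Big-endian puts the high byte first, little-endian the low byte.
        const unsigned unit = (order == ByteOrder::Big) ? ((b0 << 8) | b1)
                                                        : ((b1 << 8) | b0);
        out += static_cast<char16_t>(unit);
    }
    return out;
}
```

The bytes 00 41 read as big-endian are U+0041 ('A'); read as little-endian they are U+4100, a completely different character, and nothing in the type tells you which was intended.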
-
September 28th, 2010, 11:15 AM
#7
Re: UNICODE question and help needed
You didn't say what platform or OS, so I'll just assume Windows.
If you're providing a static library, then you can do it like the Windows SDK. Export both a "Unicode" and an ANSI version of each function, and use a header file to map the function name appropriately. You provide this header file to the user.
An example:
Code:
WINBASEAPI DWORD WINAPI GetModuleFileNameA( HMODULE hModule, LPSTR lpFilename, DWORD nSize );
WINBASEAPI DWORD WINAPI GetModuleFileNameW( HMODULE hModule, LPWSTR lpFilename, DWORD nSize );
#ifdef UNICODE  // the Windows SDK headers test UNICODE; _UNICODE is the C runtime (tchar.h) switch
#define GetModuleFileName GetModuleFileNameW
#else
#define GetModuleFileName GetModuleFileNameA
#endif
But you really don't have to do this. You can export a single function, and leave it up to the user to pass the correct string format.
They can still compile their project with "Unicode" on or off.
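A minimal sketch of the single-function approach, assuming the library documents UTF-8 as its one and only string format. The name mylib_count_code_points_utf8 is invented for this example; the point is that the signature uses plain char and does not depend on any "Unicode" build switch.

```cpp
#include <cstddef>

// Hypothetical exported function; name and contract are illustrative only.
// Contract: text is a NUL-terminated UTF-8 string, no matter how the caller
// was compiled. Malformed input is not detected in this sketch.
extern "C" std::size_t mylib_count_code_points_utf8(const char* text) {
    std::size_t count = 0;
    for (; *text != '\0'; ++text) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a
        // new code point.
        if ((static_cast<unsigned char>(*text) & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

A caller built with wide strings converts to UTF-8 before the call; a caller that already holds UTF-8 or plain ASCII passes its string through unchanged.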
Last edited by Syslock; September 28th, 2010 at 11:49 AM.
-
September 28th, 2010, 02:39 PM
#8
Re: UNICODE question and help needed
The ifdef approach safeguards your code from breaking, not the actual data your app is supposed to take care of. My first contact with the UNICODE concept was through the Charles Petzold book. Frankly, I don't remember a thing about it other than the fact that this ifdef approach was somehow a one-size-fits-all solution to multilingual support. But now that I think about it, and have had a chance to work with data written in 4 different languages, I wish either that I had taken what he wrote more seriously, or that he had simply cut out all those pages and said "get your hands dirty and start digging".
Specifying the encoding scheme clearly where it's appropriate, and making the effort to ensure the 'smart users' know what to expect from your library, is much more important. Furthermore, if your data comes in over the network, there's endianness to worry about as well, and much more. At some point or other, you'd have to get down to at least the byte level.
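As one example of working at the byte level, incoming data sometimes starts with a byte order mark (BOM) that hints at the encoding and endianness. A rough sketch, with Bom and detect_bom as made-up names; it ignores UTF-32, whose little-endian BOM begins with the same two bytes as UTF-16LE, and a missing BOM tells you nothing.

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16BE, Utf16LE };

// Hypothetical helper: inspect the first bytes of a buffer for a BOM.
// std::string is used here purely as a raw byte container.
Bom detect_bom(const std::string& bytes) {
    auto at = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return Bom::Utf8;     // EF BB BF
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return Bom::Utf16BE;  // FE FF
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return Bom::Utf16LE;  // FF FE
    return Bom::None;
}
```

When there is no BOM, the encoding has to come from somewhere else, such as the protocol or the library's documentation.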
So my bottom line is this:
Get your hands dirty and start digging
Last edited by potatoCode; September 28th, 2010 at 02:49 PM.
Reason: fixed some typos