-
May 23rd, 2008, 02:29 AM
#1
ConvertStringToBSTR lost char
Dear friends:
I have some code:
Code:
char szTemp[10000];//store E6 9D 8E E5 85 8B E5 8B A4 00 00
memset(szTemp, 0, sizeof(szTemp));
char szTemp1[10000];
*pVal = _com_util::ConvertStringToBSTR(szTemp);
strcpy(szTemp1, _com_util::ConvertBSTRToString(*pVal));
//szTemp1 store E6 9D 8E E5 85 8B E5 8B 00 00 00
So the A4 byte is lost.
What is the problem?
I have tried several ways to convert the string, and all of them failed.
By the way, I need to pass szTemp to VB, which is where I found this problem. I wrote this code to test the conversion.
Thanks a lot.
-
May 23rd, 2008, 03:06 AM
#2
Re: ConvertStringToBSTR lost char
Please, post an example that can be compiled and tested.
Victor Nijegorodov
-
May 23rd, 2008, 03:35 AM
#3
Re: ConvertStringToBSTR lost char
Code:
#include <comdef.h>
#include <comutil.h>
#pragma comment(lib, "comsupp.lib")
void main()
{
char szTemp[10000];//store E6 9D 8E E5 85 8B E5 8B A4 00 00
_bstr_t bstr = "李克勤";
memset(szTemp, 0, sizeof(szTemp));
int iOutputLength = WideCharToMultiByte(CP_UTF8,0,bstr,-1,NULL,0,NULL,NULL);
WideCharToMultiByte(CP_UTF8,0,bstr,-1,szTemp,iOutputLength,NULL,NULL);
char szTemp1[10000];
_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
//szTemp1 store E6 9D 8E E5 85 8B E5 8B 00 00 00
}
Thank you.
-
May 23rd, 2008, 04:13 AM
#4
Re: ConvertStringToBSTR lost char
Well, your last code snippet has nothing to do with the first one.
And you are making a serious mistake: the third parameter of WideCharToMultiByte
must be an LPCWSTR, not a _bstr_t.
It should be:
bstr.operator const wchar_t*( )
Besides, it is still not clear how you "store E6 9D 8E E5 85 8B E5 8B A4 00 00" in the char szTemp[10000] variable after you call
Code:
memset(szTemp, 0, sizeof(szTemp));
Victor Nijegorodov
-
May 23rd, 2008, 06:37 AM
#5
Re: ConvertStringToBSTR lost char
Sorry for putting the comment in the wrong place.
Code:
#include <comdef.h> // MFC core and standard components
#include <comutil.h>
#pragma comment(lib, "comsupp.lib")
void main()
{
char szTemp[10000];
_bstr_t bstr = "李克勤";
memset(szTemp, 0, sizeof(szTemp));
int iOutputLength = WideCharToMultiByte(CP_UTF8,0,bstr,-1,NULL,0,NULL,NULL);
WideCharToMultiByte(CP_UTF8,0,bstr,-1,szTemp,iOutputLength,NULL,NULL);//store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory
char szTemp1[10000];
_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
//store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory
}
-
May 23rd, 2008, 07:58 AM
#6
Re: ConvertStringToBSTR lost char
See my comments below (the lines starting with //---, shown in red in the original post) ...
Originally Posted by sm_ch
Sorry for putting the comment in wrong place.
Code:
#include <comdef.h> // MFC core and standard components
#include <comutil.h>
#pragma comment(lib, "comsupp.lib")
void main()
{
char szTemp[10000];
_bstr_t bstr = "李克勤";
//----------- what do these magic symbols mean?
memset(szTemp, 0, sizeof(szTemp));
int iOutputLength = WideCharToMultiByte(CP_UTF8,0,bstr,-1,NULL,0,NULL,NULL);
WideCharToMultiByte(CP_UTF8,0,bstr,-1,szTemp,iOutputLength,NULL,NULL);//store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory
char szTemp1[10000];
_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
//store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory
//-------I don't see any difference! - :confused:
}
Victor Nijegorodov
-
May 23rd, 2008, 08:17 AM
#7
Re: ConvertStringToBSTR lost char
Code:
#include <comdef.h> // MFC core and standard components
#include <comutil.h>
#pragma comment(lib, "comsupp.lib")
void main()
{
char szTemp[10000];
_bstr_t bstr = "李克勤";//chinese characters.
memset(szTemp, 0, sizeof(szTemp));
int iOutputLength = WideCharToMultiByte(CP_UTF8,0,bstr,-1,NULL,0,NULL,NULL);
WideCharToMultiByte(CP_UTF8,0,bstr,-1,szTemp,iOutputLength,NULL,NULL);//store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory
char szTemp1[10000];
_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
//store E6 9D 8E E5 85 8B E5 8B 00 00 00,saw this with Memory
}
Please run this program in Visual C++ 6 and you will see the error.
-
May 23rd, 2008, 09:49 AM
#8
Re: ConvertStringToBSTR lost char
I cannot run your program because:
a) your "Chinese characters" are displayed in my VC++6 editor as "???"
b) my VC++6 editor is a non-Unicode editor (as is yours)
Victor Nijegorodov
-
May 23rd, 2008, 10:00 AM
#9
Re: ConvertStringToBSTR lost char
Thank you for your patience.
Code:
#include <comdef.h> // MFC core and standard components
#include <comutil.h>
#pragma comment(lib, "comsupp.lib")
void main()
{
char szTemp[10000];
memset(szTemp, 0, sizeof(szTemp));
strcpy(szTemp, "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4\x00\x00");//store E6 9D 8E E5 85 8B E5 8B A4 00 00
char szTemp1[10000];
_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
//store E6 9D 8E E5 85 8B E5 8B 00 00 00,saw this with Memory
}
-
May 23rd, 2008, 10:50 AM
#10
Re: ConvertStringToBSTR lost char
Well, the last code snippet ran fine for me, and szTemp1 contains exactly the same string as you passed in szTemp:
E6 9D 8E E5 85 8B E5 8B A4 00
Victor Nijegorodov
-
May 23rd, 2008, 11:19 AM
#11
Re: ConvertStringToBSTR lost char
So what is the problem?
I'll try to re-install my Visual C++ tomorrow.
---------------------------------------
God save me. I've tried VC 6.0 on another PC and got the same error.
Last edited by sm_ch; May 23rd, 2008 at 11:27 AM.
-
May 23rd, 2008, 04:29 PM
#12
Re: ConvertStringToBSTR lost char
I don't know what your "this" and "another" PC mean, nor which version of VC6 you used.
Mine is VC++ 6.0 Enterprise Edition with SP6. It runs under Win XP SP2.
Victor Nijegorodov
-
May 26th, 2008, 12:27 AM
#13
Re: ConvertStringToBSTR lost char
I've tried Win XP SP2 and Win2K Advanced Server, both with VC 6.0, and got the same error.
The only difference is that my OSes are Simplified Chinese editions.
-
May 26th, 2008, 04:08 AM
#14
Re: ConvertStringToBSTR lost char
In the C++ standard, there are 3 types of "character sets" to consider:
1) The Basic Source Character Set
Originally Posted by ISO/IEC 14882:2003(E)
Character Sets 2.2.1
The basic source character set consists of 96 characters: the space character, the control characters representing
horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
2) Physical Source File Characters
Originally Posted by ISO/IEC 14882:2003(E)
Phases of translation 2.1.1.1
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set... Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. ...
3) The Execution Character Set
Originally Posted by ISO/IEC 14882:2003(E)
Phases of translation 2.1.1.5
Each source character set member, escape sequence, or universal-character-name in character literals and string literals is converted to a member of the execution character set (2.13.2, 2.13.4).
Character Sets 2.2.3
The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character ... The execution character set and the execution wide-character set are supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.
So now we need to know the defined behavior of VC++ 6.0.
2) Members of source and execution character sets
Originally Posted by MSDN
The source character set is the set of legal characters that can appear in source files. For Microsoft C, the source character set is the standard ASCII character set.
In summary, VC++ 6.0 source code should stick to the ASCII character set. For the "basic execution wide-character set", the MS compiler has always used a UTF16(LE) encoding of Unicode characters. If you want to enter a literal Unicode character in the source, and it can't be done using only ASCII characters, then you have to enter it using the universal-character-name construct or hex escape sequences.
Things are better in VS 2008: http://msdn.microsoft.com/en-us/library/xwy0e8f2.aspx
So now we are ready to tackle the issues with the following:
>> _bstr_t bstr = "李克勤";
Before we can say anything about this, we have to know how the file was actually saved and that depends on the editor. Let's say that the file was saved as UTF8 (with no BOM, or the compiler won't touch it). In this case, the compiler will see the string literal as the string of bytes that make up the UTF8 encoding of that Unicode string. Viewed as Windows-1252 text, that is "æŽå…‹å‹¤"; as raw bytes it is "E6 9D 8E E5 85 8B E5 8B A4". _bstr_t does not expect a UTF8 encoded char string.
Under VS 2005/8, this generates a warning:
Code:
const char *p = "李克勤";
// warning C4566: character represented by universal-character-name
// '\u674E' cannot be represented in the current code page (1252)
Here the compiler is following 2.1.1.1, "Any source file character not in the basic source character set is replaced by the universal-character-name...". The UCN is a 16-bit value trying to fit into an 8-bit slot.
In VS 2005/8, you can now have Unicode characters in your source and see their glyphs - since source files can be saved/edited in Unicode. But you still have to keep in mind that all wide character constants and literals are encoded as UTF16-LE in the execution character set. Which means at run-time, L"李克勤" == "674E 514B 52E4".
So here's some code that will run on both 6.0 and 2005/8
Code:
#include <windows.h>
#include <comdef.h>
#include <comutil.h>
#include <iostream>
#include <iomanip>
using namespace std;
void print_hexchars(const wchar_t *sz)
{
for (; *sz; ++sz)
cout << hex << setw(2) << int(*sz) << ' ';
cout << endl << endl;
}//print_hexchars
int main()
{
const char *pUTF8 = "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4\x00";
_bstr_t bstrOfUTF8 = _com_util::ConvertStringToBSTR(pUTF8);
wchar_t wbuff[32];
int res = MultiByteToWideChar(CP_UTF8, 0, pUTF8, -1, wbuff, 32);
if (!res)
{
cerr << "MultiByteToWideChar failed, le = "
<< GetLastError() << endl;
return 1;
}//if
cout << "UTF8 -> UTF16(LE) via ConvertStringToBSTR" << endl;
print_hexchars(bstrOfUTF8);
cout << "UTF8 -> UTF16(LE) via MultiByteToWideChar" << endl;
print_hexchars(wbuff);
return 0;
}//main
Output:
Code:
UTF8 -> UTF16(LE) via ConvertStringToBSTR
e6 9d 17d e5 2026 2039 e5 2039 a4
UTF8 -> UTF16(LE) via MultiByteToWideChar
674e 514b 52e4
ConvertStringToBSTR() has no idea that it needs to use CP_UTF8 when calling MultiByteToWideChar (or perhaps mbstowcs).
gg
-
May 26th, 2008, 04:52 AM
#15
Re: ConvertStringToBSTR lost char
Dear Codeplug:
I ran your program and found that it returns:
UTF8 -> UTF16(LE) via ConvertStringToBSTR
93c9 5ea1 53a0 9355 //not the same as your description
UTF8 -> UTF16(LE) via MultiByteToWideChar
674e 514b 52e4
and in
Code:
#include <comdef.h> // MFC core and standard components
#include <comutil.h>
#pragma comment(lib, "comsupp.lib")
void main()
{
char szTemp[10000];
memset(szTemp, 0, sizeof(szTemp));
strcpy(szTemp, "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4\x00\x00");//store E6 9D 8E E5 85 8B E5 8B A4 00 00
char szTemp1[10000];
_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
//store E6 9D 8E E5 85 8B E5 8B 00 00 00,saw this with Memory
}
I did not use MultiByteToWideChar. I think that a round trip through ConvertStringToBSTR and then ConvertBSTRToString should always leave the string unchanged.
Thanks a lot.