-
April 27th, 2011, 05:05 PM
#1
How to send unicode through socket?
Here are my socket client and socket server implementations. The socket server tries to send Unicode data to the socket client, but I can't get it to work. Would any guru here point out what went wrong? Thanks a lot.
Code:
// socket server
int _tmain(int argc, _TCHAR* argv[])
{
WORD wVersionRequested;
WSAData data;
wVersionRequested = MAKEWORD(2,2);
if(WSAStartup(wVersionRequested, &data) != 0)
return 1;
SOCKET sock = socket(AF_INET,SOCK_STREAM, IPPROTO_TCP);
if(sock == INVALID_SOCKET)
return 1;
sockaddr_in sa;
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(1101);
if(bind(sock, (sockaddr*)&sa, sizeof(sa)) != 0)
return 1;
if(listen(sock, 10) != 0)
return 1;
if((sock = accept(sock, (sockaddr*)&sa, NULL)) == INVALID_SOCKET)
return 1;
char buffer[128];
strcpy(buffer, "我们");
if(send(sock, buffer, strlen(buffer)+1, 0) == SOCKET_ERROR)
return 1;
closesocket(sock);
return 0;
}
// socket client
int _tmain(int argc, _TCHAR* argv[])
{
WORD wVersionRequested = MAKEWORD(2,2);
WSADATA data;
if(WSAStartup(wVersionRequested, &data)!=0)
return 1;
SOCKET sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if(sock == INVALID_SOCKET)
return 1;
sockaddr_in sa;
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(1101);
if(connect(sock, (sockaddr*)&sa, sizeof(sa)) == SOCKET_ERROR)
return 1;
char buf[128];
if(recv(sock, buf, strlen(buf)+1, 0) == SOCKET_ERROR)
return 1;
closesocket(sock);
return 0;
}
-
April 27th, 2011, 06:42 PM
#2
Re: How to send unicode through socket?
strcpy and strlen are both ANSI-based functions and will not work correctly on wide (Unicode) strings.
Try re-coding with the wide-character equivalents (wcscpy and wcslen).
-
April 27th, 2011, 08:07 PM
#3
Re: How to send unicode through socket?
If you are going to send Unicode over a socket, then you might as well encode it as UTF8 and send it that way.
Avoid using extended characters in your source code: http://www.codeguru.com/forum/showpo...8&postcount=14
In Visual Studio, you could save the source file as "UTF8 with signature", and then 'buffer' would contain the UTF8 encoding of "我们". But read the link above to understand the risks of doing this.
gg
-
April 28th, 2011, 09:08 AM
#4
Re: How to send unicode through socket?
I modified my code as you guys suggested, but it still doesn't work. Could any guru here point out directly what went wrong in my code? Thanks a lot.
Code:
// socket client
int _tmain(int argc, _TCHAR* argv[])
{
WORD wVersionRequested = MAKEWORD(2,2);
WSADATA data;
if(WSAStartup(wVersionRequested, &data)!=0)
return 1;
SOCKET sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if(sock == INVALID_SOCKET)
return 1;
sockaddr_in sa;
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(1101);
if(connect(sock, (sockaddr*)&sa, sizeof(sa)) == SOCKET_ERROR)
return 1;
char buf[128];
if(recv(sock, buf, sizeof(buf), 0) == SOCKET_ERROR)
return 1;
wchar_t wBuf[128];
int len = mbstowcs(NULL,buf, 0);
mbstowcs(wBuf, buf, len+1);
closesocket(sock);
return 0;
}
// socket server
int _tmain(int argc, _TCHAR* argv[])
{
WORD wVersionRequested;
WSAData data;
wVersionRequested = MAKEWORD(2,2);
if(WSAStartup(wVersionRequested, &data) != 0)
return 1;
SOCKET sock = socket(AF_INET,SOCK_STREAM, IPPROTO_TCP);
if(sock == INVALID_SOCKET)
return 1;
sockaddr_in sa;
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(1101);
if(bind(sock, (sockaddr*)&sa, sizeof(sa)) != 0)
return 1;
if(listen(sock, 10) != 0)
return 1;
if((sock = accept(sock, (sockaddr*)&sa, NULL)) == INVALID_SOCKET)
return 1;
wchar_t buffer[128];
wcscpy(buffer, L"我们");
int len = wcslen(buffer);
if(send(sock, (char*)buffer, (len+1)*2, 0) == SOCKET_ERROR)
return 1;
closesocket(sock);
return 0;
}
-
April 28th, 2011, 09:45 AM
#5
Re: How to send unicode through socket?
Okay, the following code works! But I still have a question. When the socket server calls send and the socket client calls recv, the length I pass is sizeof(buffer), which is not the real size of the data in the buffer. I wonder why it still works? Thanks.
Code:
// socket client
int _tmain(int argc, _TCHAR* argv[])
{
WORD wVersionRequested = MAKEWORD(2,2);
WSADATA data;
if(WSAStartup(wVersionRequested, &data)!=0)
return 1;
SOCKET sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if(sock == INVALID_SOCKET)
return 1;
sockaddr_in sa;
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(1101);
if(connect(sock, (sockaddr*)&sa, sizeof(sa)) == SOCKET_ERROR)
return 1;
wchar_t wBuf[128];
if(recv(sock, (char*)wBuf, sizeof(wBuf), 0) == SOCKET_ERROR)
return 1;
closesocket(sock);
return 0;
}
// socket server
int _tmain(int argc, _TCHAR* argv[])
{
WORD wVersionRequested;
WSAData data;
wVersionRequested = MAKEWORD(2,2);
if(WSAStartup(wVersionRequested, &data) != 0)
return 1;
SOCKET sock = socket(AF_INET,SOCK_STREAM, IPPROTO_TCP);
if(sock == INVALID_SOCKET)
return 1;
sockaddr_in sa;
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(1101);
if(bind(sock, (sockaddr*)&sa, sizeof(sa)) != 0)
return 1;
if(listen(sock, 10) != 0)
return 1;
if((sock = accept(sock, (sockaddr*)&sa, NULL)) == INVALID_SOCKET)
return 1;
wchar_t buffer[128];
wcscpy(buffer, L"我们");
int len = sizeof(buffer);
if(send(sock, (char*)buffer, len, 0) == SOCKET_ERROR)
return 1;
closesocket(sock);
return 0;
}
-
April 28th, 2011, 10:08 AM
#6
Re: How to send unicode through socket?
send/recv work on bytes. You can send/recv whatever bytes you want. recv() takes the size of the given buffer and returns the number of bytes actually read.
In this case, you're sending bytes that represent a UTF-16LE encoded string - or a wchar_t string on Windows.
gg
-
April 28th, 2011, 10:23 AM
#7
Re: How to send unicode through socket?
Originally Posted by Codeplug
Thanks for your reply. Assume I call the function send like the following,
Code:
char buffer[128];
strcpy(buffer, "hello world");
int len = sizeof(buffer);
if(send(sock, (char*)buffer, len, 0) == SOCKET_ERROR)
return 1;
Obviously when I call send, I pass 128 as the size. But the real size of the data is only 12 bytes (11 characters plus the terminating null). So my question is: should I pass 128 or the real data size to send? Thanks.
-
April 28th, 2011, 11:18 AM
#8
Re: How to send unicode through socket?
Consult the manual if you're not sure how to use an API:
http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx
http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx
send - length is size of data
recv - length is size of buffer
gg
-
April 28th, 2011, 11:33 AM
#9
Re: How to send unicode through socket?
Originally Posted by Codeplug
Actually I already checked MSDN, but I was not sure whether I understood it correctly. According to MSDN, the size passed to both send and recv has exactly the same description, i.e., "The length, in bytes, of the data in buffer pointed to by the buf parameter" - more precisely, the size taken from sizeof(buffer). But when you said "send - length is size of data" and "recv - length is size of buffer", did you mean that send takes the real data size (for example, based on strlen) and recv takes the buffer size (based on sizeof(buffer))? It looks like they have different meanings. Would you please clarify a little bit? Thanks.
-
April 28th, 2011, 11:51 AM
#10
Re: How to send unicode through socket?
>> According to msdn, the size passed to both send and recv should have exactly the same meaning
send: The length, in bytes, of the data
recv: The length, in bytes, of the buffer
These do not have the same meaning.
Code:
char buffer[32] = "123";
// buffer size = sizeof(buffer) = 32 bytes
// data size   = strlen(buffer) + 1 = 4 bytes
gg
-
April 28th, 2011, 12:12 PM
#11
Re: How to send unicode through socket?
Originally Posted by Codeplug
I tried passing either the length of the data (4 in your example) or the length of the buffer (32 in your example) to send, and both appear to work properly. Any comments? Thanks.
-
April 28th, 2011, 12:20 PM
#12
Re: How to send unicode through socket?
>> But it looks like they both work properly.
Because the null-terminator of the string (the last byte of the data) is sent in both cases. The difference is the number of bytes being transmitted (4 or 32). You only need the 4 bytes - no need to send the remaining 28 bytes that aren't used.
gg
-
April 28th, 2011, 12:33 PM
#13
Re: How to send unicode through socket?
Originally Posted by Codeplug
I got it. Now I feel more comfortable. Thanks a lot.
-
April 28th, 2011, 12:53 PM
#14
Re: How to send unicode through socket?
Another thing to be aware of, when using TCP:
TCP is a stream-based protocol: it simply sends a stream of bytes and receives a stream of bytes, which can arrive in chunks as small as a single byte. It's up to you, as the application writer, to give meaning to the stream of bytes. If you want the stream of bytes to represent a null-terminated Unicode string, you need to design your code to realize this effect.
The important part is that calls to send and recv are not somehow magically correlated. A single call to send() 100 bytes will not automatically result in a single call to recv() that receives exactly 100 bytes. In the toy application you are using now, in which the client and server are probably running on the same machine, you probably get that result by chance (since the loopback mechanism skips the step of writing bits onto a real network wire). But once you run your code over the Internet, you will immediately see situations where one peer calls send() with 100 bytes, but the other peer only gets a few of those bytes in a single call to recv().
When you see this happen, you might think that data is being lost, but in fact the data is there. Some or all of it will be received the next time recv() is called.
So what does all this mean? It means that you, as the application writer, must impose an application-specific protocol to give meaning to the stream of bytes.
One commonly used protocol for strings is to prepend each string with an integer that holds the count of bytes being sent. Armed with that knowledge, the recipient knows exactly how many bytes to read before concluding that the end of the string has been reached.
Mike
-
April 28th, 2011, 01:02 PM
#15
Re: How to send unicode through socket?
Originally Posted by MikeAThon
This is so enlightening, and I really appreciate it.