How to send unicode through socket?

Thread: How to send unicode through socket?

  1. #1
    Join Date
    Jul 2005
    Posts
    894

    How to send unicode through socket?

    Here are my socket client and socket server implementations. The server tries to send Unicode data to the client, but I can't get it to work. Could any guru here point out what went wrong? Thanks a lot.
    Code:
    // socket server
    
    int _tmain(int argc, _TCHAR* argv[])
    {
    	WORD wVersionRequested;
    	WSAData data;
    
    	wVersionRequested = MAKEWORD(2,2);
    
    	if(WSAStartup(wVersionRequested, &data) != 0)
    		return 1;
    
    	SOCKET sock = socket(AF_INET,SOCK_STREAM, IPPROTO_TCP);
    
    	if(sock == INVALID_SOCKET)
    		return 1;
    
    	sockaddr_in sa;
    	sa.sin_family = AF_INET;
    	sa.sin_addr.s_addr = inet_addr("127.0.0.1");
    	sa.sin_port = htons(1101);
    
    	if(bind(sock, (sockaddr*)&sa, sizeof(sa)) != 0)
    		return 1;
    
    	if(listen(sock, 10) != 0)
    		return 1;
    
    	if((sock = accept(sock, (sockaddr*)&sa, NULL)) == INVALID_SOCKET)
    		return 1;
    
    	char buffer[128];
    	strcpy(buffer, "我们");
    	
    	if(send(sock, buffer, strlen(buffer)+1, 0) == SOCKET_ERROR)
    		return 1;
    
    	closesocket(sock);
    	return 0;
    }
    
    // socket client
    
    int _tmain(int argc, _TCHAR* argv[])
    {
    	WORD wVersionRequested = MAKEWORD(2,2);
    	WSADATA data;
    
    	if(WSAStartup(wVersionRequested, &data)!=0)
    		return 1;
    
    	SOCKET sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    
    	if(sock == INVALID_SOCKET)
    		return 1;
    
    	sockaddr_in sa;
    	sa.sin_family = AF_INET;
    	sa.sin_addr.s_addr = inet_addr("127.0.0.1");
    	sa.sin_port = htons(1101);
    
    	if(connect(sock, (sockaddr*)&sa, sizeof(sa)) == SOCKET_ERROR)
    		return 1;
    
    	char buf[128];
    
    	if(recv(sock, buf, strlen(buf)+1, 0) == SOCKET_ERROR)
    		return 1;
    
    	closesocket(sock);
    	return 0;
    }

  2. #2
    Join Date
    Nov 2002
    Location
    California
    Posts
    4,553

    Re: How to send unicode through socket?

    strcpy and strlen are both ANSI-based functions and are unlikely to work correctly on Unicode strings.

    Try re-coding with the wide-character equivalents (like wcscpy and wcslen).
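To see why the narrow functions misbehave on wide data, here is a minimal sketch. The helper names are made up for illustration. It compares what wcslen reports with what strlen reports when the same wide string is viewed as raw bytes. For ASCII-range characters, one byte of every wchar_t is zero, so strlen stops almost immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: length in wide characters, counted correctly.
std::size_t wide_length(const wchar_t* s) {
    return std::wcslen(s);
}

// Hypothetical helper: what strlen reports if the same memory is
// (incorrectly) treated as a narrow char string. It stops at the
// first zero byte, which sits inside the first wide character.
std::size_t narrow_view_length(const wchar_t* s) {
    return std::strlen(reinterpret_cast<const char*>(s));
}
```

For L"AB", wide_length gives 2, while narrow_view_length gives 1 on a little-endian machine and 0 on a big-endian one.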

  3. #3
    Join Date
    Nov 2003
    Posts
    1,786

    Re: How to send unicode through socket?

    If you are going to send Unicode over a socket, then you might as well encode it as UTF8 and send it that way.

    Avoid using extended characters in your source code: http://www.codeguru.com/forum/showpo...8&postcount=14

    In Visual Studio, you could save the source file as "UTF8 with signature", and then 'buffer' would contain the UTF8 encoding of "我们". But read the link above to understand the risks of doing this.
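As a rough illustration of the UTF-8 suggestion, the sketch below converts UTF-16 code units to UTF-8 bytes by hand, using \u escapes so the source file itself stays plain ASCII. The function name utf16_to_utf8 is an assumption for this example; on Windows the same conversion is normally done with WideCharToMultiByte and CP_UTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: convert UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consumed the low surrogate
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {                               // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                       // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                     // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                       // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Calling utf16_to_utf8(u"\u6211\u4EEC") yields the six bytes E6 88 91 E4 BB AC, which can be passed to send() as ordinary char data.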

    gg

  4. #4
    Join Date
    Jul 2005
    Posts
    894

    Re: How to send unicode through socket?

    I modified my code as you guys suggested but it still doesn't work. Could any guru here point out directly what went wrong in my code? Thanks a lot.
    Code:
    // socket client
    int _tmain(int argc, _TCHAR* argv[])
    {
    	WORD wVersionRequested = MAKEWORD(2,2);
    	WSADATA data;
    
    	if(WSAStartup(wVersionRequested, &data)!=0)
    		return 1;
    
    	SOCKET sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    
    	if(sock == INVALID_SOCKET)
    		return 1;
    
    	sockaddr_in sa;
    	sa.sin_family = AF_INET;
    	sa.sin_addr.s_addr = inet_addr("127.0.0.1");
    	sa.sin_port = htons(1101);
    
    	if(connect(sock, (sockaddr*)&sa, sizeof(sa)) == SOCKET_ERROR)
    		return 1;
    
    	char buf[128];
    
    	if(recv(sock, buf, sizeof(buf), 0) == SOCKET_ERROR)
    		return 1;
    
    	wchar_t wBuf[128];
    
    	int len = mbstowcs(NULL,buf, 0); 
    	mbstowcs(wBuf, buf, len+1);
    
    	closesocket(sock);
    	return 0;
    }
    
    // socket server
    
    int _tmain(int argc, _TCHAR* argv[])
    {
    	WORD wVersionRequested;
    	WSAData data;
    
    	wVersionRequested = MAKEWORD(2,2);
    
    	if(WSAStartup(wVersionRequested, &data) != 0)
    		return 1;
    
    	SOCKET sock = socket(AF_INET,SOCK_STREAM, IPPROTO_TCP);
    
    	if(sock == INVALID_SOCKET)
    		return 1;
    
    	sockaddr_in sa;
    	sa.sin_family = AF_INET;
    	sa.sin_addr.s_addr = inet_addr("127.0.0.1");
    	sa.sin_port = htons(1101);
    
    	if(bind(sock, (sockaddr*)&sa, sizeof(sa)) != 0)
    		return 1;
    
    	if(listen(sock, 10) != 0)
    		return 1;	
    
    	if((sock = accept(sock, (sockaddr*)&sa, NULL)) == INVALID_SOCKET)
    		return 1;
    
    	wchar_t buffer[128];
    	wcscpy(buffer, L"我们");
    	int len = wcslen(buffer);
    	
    	if(send(sock, (char*)buffer, (len+1)*2, 0) == SOCKET_ERROR)
    		return 1;
    
    	closesocket(sock);
    	return 0;
    }

  5. #5
    Join Date
    Jul 2005
    Posts
    894

    Re: How to send unicode through socket?

    Okay, the following code works! But I still have a question. When the socket server calls send and the socket client calls recv, the size passed in both cases is sizeof(buffer), which is not the real size of the data in the buffer. I wonder why this works. Thanks.
    Code:
    // socket client
    
    int _tmain(int argc, _TCHAR* argv[])
    {
    	WORD wVersionRequested = MAKEWORD(2,2);
    	WSADATA data;
    
    	if(WSAStartup(wVersionRequested, &data)!=0)
    		return 1;
    
    	SOCKET sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    
    	if(sock == INVALID_SOCKET)
    		return 1;
    
    	sockaddr_in sa;
    	sa.sin_family = AF_INET;
    	sa.sin_addr.s_addr = inet_addr("127.0.0.1");
    	sa.sin_port = htons(1101);
    
    	if(connect(sock, (sockaddr*)&sa, sizeof(sa)) == SOCKET_ERROR)
    		return 1;
    
    	wchar_t wBuf[128];
    
    	if(recv(sock, (char*)wBuf, sizeof(wBuf), 0) == SOCKET_ERROR)
    		return 1;
    
    	closesocket(sock);
    	return 0;
    }
    
    // socket server
    
    int _tmain(int argc, _TCHAR* argv[])
    {
    	WORD wVersionRequested;
    	WSAData data;
    
    	wVersionRequested = MAKEWORD(2,2);
    
    	if(WSAStartup(wVersionRequested, &data) != 0)
    		return 1;
    
    	SOCKET sock = socket(AF_INET,SOCK_STREAM, IPPROTO_TCP);
    
    	if(sock == INVALID_SOCKET)
    		return 1;
    
    	sockaddr_in sa;
    	sa.sin_family = AF_INET;
    	sa.sin_addr.s_addr = inet_addr("127.0.0.1");
    	sa.sin_port = htons(1101);
    
    	if(bind(sock, (sockaddr*)&sa, sizeof(sa)) != 0)
    		return 1;
    
    	if(listen(sock, 10) != 0)
    		return 1;	
    	
    	if((sock = accept(sock, (sockaddr*)&sa, NULL)) == INVALID_SOCKET)
    		return 1;
    
    	wchar_t buffer[128];
    	wcscpy(buffer, L"我们");
    	int len = sizeof(buffer);
    	
    	if(send(sock, (char*)buffer, len, 0) == SOCKET_ERROR)
    		return 1;
    
    	closesocket(sock);
    	return 0;
    }

  6. #6
    Join Date
    Nov 2003
    Posts
    1,786

    Re: How to send unicode through socket?

    send/recv work on bytes. You can send/recv whatever bytes you want. recv() takes the size of the given buffer and returns the number of bytes actually read.

    In this case, you're sending bytes that represent a UTF-16LE encoded string - or a wchar_t string in Windows.
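To make "bytes that represent a UTF-16LE encoded string" concrete, here is a small sketch that spells out the byte order explicitly instead of relying on a (char*) cast. The function name to_utf16le_bytes is hypothetical, and char16_t is used in place of wchar_t so the result is the same on every platform.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes,
// i.e. low byte first, then high byte, for each 16-bit unit.
std::vector<unsigned char> to_utf16le_bytes(const std::u16string& s) {
    std::vector<unsigned char> bytes;
    bytes.reserve(s.size() * 2);
    for (char16_t unit : s) {
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));  // low byte
        bytes.push_back(static_cast<unsigned char>(unit >> 8));    // high byte
    }
    return bytes;
}
```

For u"\u6211\u4EEC" this produces the four bytes 11 62 EC 4E. Casting a wchar_t buffer to char* on Windows puts the same bytes on the wire, because wchar_t there is 16 bits wide and the platform is little-endian.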

    gg

  7. #7
    Join Date
    Jul 2005
    Posts
    894

    Re: How to send unicode through socket?

    Quote Originally Posted by Codeplug
    send/recv work on bytes. You can send/recv whatever bytes you want. recv() takes the size of the given buffer and returns the number of bytes actually read.

    In this case, you're sending bytes that represent a UTF-16LE encoded string - or a wchar_t string in Windows.

    gg
    Thanks for your reply. Assume I call the function send like the following,
    Code:
    char buffer[128];
    strcpy(buffer, "hello world");
    int len = sizeof(buffer);
    if(send(sock, (char*)buffer, len, 0) == SOCKET_ERROR)
         return 1;
    Obviously, when I call send, I pass 128 as the size. But the real length of the string in the buffer is 11. So my question is: should I pass 128 or 11 to send? Thanks.

  8. #8
    Join Date
    Nov 2003
    Posts
    1,786

    Re: How to send unicode through socket?

    Consult the manual if you're not sure how to use an API:
    http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx
    http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx

    send - length is size of data
    recv - length is size of buffer

    gg

  9. #9
    Join Date
    Jul 2005
    Posts
    894

    Re: How to send unicode through socket?

    Quote Originally Posted by Codeplug
    Consult the manual if you're not sure how to use an API:
    http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx
    http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx

    send - length is size of data
    recv - length is size of buffer

    gg
    Actually, I already checked MSDN, but I was not sure I understood it correctly. According to MSDN, the size passed to both send and recv should have exactly the same meaning, i.e., "The length, in bytes, of the data in buffer pointed to by the buf parameter". More precisely, the size taken from sizeof(buffer). But when you said "send - length is size of data" and "recv - length is size of buffer", do you mean that send takes the real size (for example, based on strlen) and recv takes the buffer size (based on sizeof(buffer))? It looks like they have different meanings. Would you please clarify a little bit? Thanks.

  10. #10
    Join Date
    Nov 2003
    Posts
    1,786

    Re: How to send unicode through socket?

    >> According to msdn, the size passed to both send and recv should have exactly the same meaning
    send: The length, in bytes, of the data
    recv: The length, in bytes, of the buffer

    These do not have the same meaning.
    Code:
    char buffer[32] = "123";
    // buffer size: sizeof(buffer)     == 32
    // data size:   strlen(buffer) + 1 == 4 (three characters plus the null terminator)
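The same distinction can be written as a pair of small helpers. This is only a sketch, and data_size is a made-up name. The wide overload multiplies by sizeof(wchar_t), which is 2 on Windows and matches the (len+1)*2 used earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: bytes to pass to send() for a narrow string,
// null terminator included.
std::size_t data_size(const char* s) {
    return std::strlen(s) + 1;
}

// Same idea for a wide string:
// (characters + terminator) * bytes per wchar_t.
std::size_t data_size(const wchar_t* s) {
    return (std::wcslen(s) + 1) * sizeof(wchar_t);
}
```

With char buffer[32] = "123", data_size(buffer) is 4 while sizeof(buffer) stays 32. The first value belongs in send(), the second in recv().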

    gg

  11. #11
    Join Date
    Jul 2005
    Posts
    894

    Re: How to send unicode through socket?

    Quote Originally Posted by Codeplug
    >> According to msdn, the size passed to both send and recv should have exactly the same meaning
    send: The length, in bytes, of the data
    recv: The length, in bytes, of the buffer

    These do not have the same meaning.
    Code:
    char buffer[32] = "123";
    // buffer size: sizeof(buffer)     == 32
    // data size:   strlen(buffer) + 1 == 4 (three characters plus the null terminator)

    gg
    I tried passing both the length of the data (4 in your example) and the length of the buffer (32 in your example) to send, and both seem to work properly. Any comments? Thanks.

  12. #12
    Join Date
    Nov 2003
    Posts
    1,786

    Re: How to send unicode through socket?

    >> But it looks like they both work properly.
    Because the null-terminator of the string (the last byte of the data) is sent in both cases. The difference is the number of bytes being transmitted (4 or 32). You only need the 4 bytes - no need to send the remaining 28 bytes that aren't used.

    gg

  13. #13
    Join Date
    Jul 2005
    Posts
    894

    Re: How to send unicode through socket?

    Quote Originally Posted by Codeplug
    >> But it looks like they both work properly.
    Because the null-terminator of the string (the last byte of the data) is sent in both cases. The difference is the number of bytes being transmitted (4 or 32). You only need the 4 bytes - no need to send the remaining 28 bytes that aren't used.

    gg
    I got it. Now I feel more comfortable. Thanks a lot.

  14. #14
    Join Date
    Nov 2002
    Location
    California
    Posts
    4,553

    Re: How to send unicode through socket?

    Another thing to be aware of, when using TCP:

    TCP is a stream-based protocol, which means that it simply sends a stream of bytes and receives a stream of bytes, which can be received in chunks as short as a single byte. It's up to you, as the application-writer, to give meaning to the stream of bytes. If you want the stream of bytes to represent a NULL-terminated Unicode text string, you need to code the architecture to realize this effect.

    The important part is that calls to send and receive are not somehow magically correlated. A single call to send() with 100 bytes will not automatically result in a single call to recv() that receives exactly 100 bytes. In the toy application that you are using now, in which the client and server are probably running on the same machine, you probably get that result by chance (since the loopback mechanism skips the step of writing bits onto the real-world network wire). But once you run your code over the Internet, you will soon see situations where one peer calls send() with 100 bytes, but the other peer only gets a few of those bytes in a single call to recv().

    When you see this happen, you might think that data is being lost, but in fact the data is there. Some or all of it will be received the next time that recv() is called.
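One common way to cope with partial reads is to loop until the expected number of bytes has arrived. The sketch below is an illustration under assumptions: ReadFn and read_exact are made-up names, and the reader is passed in as a callable with recv()-like return values so the loop can be tried without a real socket.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Stand-in for recv(): returns the number of bytes read (> 0), 0 when the
// peer has closed the connection, or a negative value on error.
using ReadFn = std::function<int(char* buf, int len)>;

// Keep reading until exactly `want` bytes have arrived.
// Returns true on success, false if the stream ended or failed first.
bool read_exact(const ReadFn& read_some, char* out, std::size_t want) {
    std::size_t got = 0;
    while (got < want) {
        int n = read_some(out + got, static_cast<int>(want - got));
        if (n <= 0) return false;              // closed or error
        got += static_cast<std::size_t>(n);    // may be less than requested
    }
    return true;
}
```

With Winsock, read_some could be a lambda such as [sock](char* b, int n) { return recv(sock, b, n, 0); }, since recv returns 0 on a graceful close and SOCKET_ERROR (a negative value) on failure.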

    So, what does all this mean? It means that you, as the application writer, must impose an application-specific protocol to give meaning to the stream of bytes.

    One commonly used protocol for strings is to prefix each string with an integer that gives the count of characters being sent. Armed with that knowledge, the recipient will know exactly how many characters to process before concluding that the end of the string has been reached.
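A length-prefixed scheme along these lines might look like the sketch below. Several details are assumptions made for the example rather than anything stated in the thread: the prefix is a 4-byte big-endian integer, it counts bytes rather than characters, and the names encode_frame and try_decode_frame are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Frame = 4-byte big-endian payload length, followed by the payload bytes.
std::string encode_frame(const std::string& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string frame;
    frame += static_cast<char>((n >> 24) & 0xFF);
    frame += static_cast<char>((n >> 16) & 0xFF);
    frame += static_cast<char>((n >> 8) & 0xFF);
    frame += static_cast<char>(n & 0xFF);
    frame += payload;
    return frame;
}

// Try to pull one complete frame off the front of `stream`, which holds
// the bytes accumulated from successive reads. On success the frame is
// removed from `stream`, its payload is stored in `payload`, and true is
// returned. If the frame is still incomplete, nothing changes and false
// is returned.
bool try_decode_frame(std::string& stream, std::string& payload) {
    if (stream.size() < 4) return false;           // length prefix incomplete
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(stream[i]);
    if (stream.size() - 4 < n) return false;       // payload incomplete
    payload = stream.substr(4, n);
    stream.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

The receiver appends whatever each recv() call returns to the accumulated buffer and calls try_decode_frame in a loop. Counting bytes instead of characters keeps the framing independent of the text encoding carried in the payload.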

    Mike

  15. #15
    Join Date
    Jul 2005
    Posts
    894

    Re: How to send unicode through socket?

    Quote Originally Posted by MikeAThon
    Another thing to be aware of, when using TCP:

    TCP is a stream-based protocol, which means that it simply sends a stream of bytes and receives a stream of bytes, which can be received in chunks as short as a single byte. It's up to you, as the application-writer, to give meaning to the stream of bytes. If you want the stream of bytes to represent a NULL-terminated Unicode text string, you need to code the architecture to realize this effect.

    The important part is that calls to send and receive are not somehow magically correlated. A single call to send() with 100 bytes will not automatically result in a single call to recv() that receives exactly 100 bytes. In the toy application that you are using now, in which the client and server are probably running on the same machine, you probably get that result by chance (since the loopback mechanism skips the step of writing bits onto the real-world network wire). But once you run your code over the Internet, you will soon see situations where one peer calls send() with 100 bytes, but the other peer only gets a few of those bytes in a single call to recv().

    When you see this happen, you might think that data is being lost, but in fact the data is there. Some or all of it will be received the next time that recv() is called.

    So, what does all this mean? It means that you, as the application writer, must impose an application-specific protocol to give meaning to the stream of bytes.

    One commonly used protocol for strings is to prefix each string with an integer that gives the count of characters being sent. Armed with that knowledge, the recipient will know exactly how many characters to process before concluding that the end of the string has been reached.

    Mike
    This is so enlightening, and I really appreciate it.
