  1. #1
    Join Date
    Jun 2009
    Posts
    1

    Working with accented characters in C++ (Linux)

    Hi,

    I am working with strings in a program that I am developing in C++ using Geany (Linux/Ubuntu). The main issue is that I have to work with texts written in Spanish, so they contain characters such as "á", "é" or "ñ".

    One of the things I need most is to find the ASCII code of each character in a string. I have found a way to do this:

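    //
    #include <cstring>   // strcpy
    #include <string>
    using namespace std;

    int main() {
        char caracteres[128];
        string segmento;
        int asciicode;

        segmento="á, é, ó are characters that can be found in this sentence";
        strcpy(caracteres, segmento.c_str());   // copy the string into a char array

        asciicode=int(caracteres[1]);           // value of the element at index 1
    }
    //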

    That is, I copy the string into a char array and then read the ASCII codes. The problem is that this does not seem to work correctly with accented characters. For instance, the character "í" seems to be split into two "chars" with two ASCII codes, -61 and -83, when it should be just one code: 237. I think this is because I get the strings by reading UTF-8 files, but that is something I cannot change.

    Could someone please help me find a way to get the right ASCII code for the characters of a string, even the accented letters?

    Many thanks!!!

  2. #2
    Join Date
    Feb 2009
    Location
    India
    Posts
    444

    Re: Working with accented characters in C++ (Linux)

    You could try _mbscpy instead of strcpy.
    «_Superman_»
    I love work. It gives me something to do between weekends.

    Microsoft MVP (Visual C++)

  3. #3
    Join Date
    Nov 2003
    Posts
    1,902

    Re: Working with accented characters in C++ (Linux)

    Why do you want the "ASCII code"? What do you want to do with this information?

    gg

  4. #4
    Lindley
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: Working with accented characters in C++ (Linux)

    It would make more sense to talk about the Unicode code point. That's a far less restrictive system, and it corresponds to ASCII between 0 and 127.

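    The accented characters may be represented by an "Extended ASCII" code page (128-255), but that range does *not* always correspond directly to the Unicode code point. It does for ISO-8859-1 (Latin-1), where "í" is 237, but not for other code pages. In any case, your file is UTF-8, which encodes every code point above 127 as two or more bytes, so one accented letter occupies more than one char. This is likely your problem.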

    Either you need to figure out the mapping from those code points to Extended ASCII, or you need to disregard ASCII and just stick to code points all around. I suggest the latter. Of course, as has been asked, what you need it for is important.
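
    As a rough illustration of sticking to code points: the values -61 and -83 are the bytes 0xC3 and 0xAD seen through a signed char, and together they are the UTF-8 encoding of U+00ED ("í", 237). Below is a minimal sketch, assuming the input is UTF-8 and the compiler supports C++11. The function name `utf8_to_code_points` is made up for this example, and the sketch does not reject overlong forms or surrogates, so a vetted library such as ICU would be a safer choice for untrusted input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode code points.
// Malformed or truncated sequences become U+FFFD (replacement character).
// Assumption: this sketch does not reject overlong forms or surrogates.
std::vector<char32_t> utf8_to_code_points(const std::string& text) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < text.size()) {
        // Go through unsigned char so bytes above 127 are not negative.
        unsigned char lead = static_cast<unsigned char>(text[i]);
        char32_t cp = 0xFFFD;
        int extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        assert(extra >= 0 && extra <= 3);
        ++i;

        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            if (i >= text.size()) { ok = false; break; }       // truncated input
            unsigned char c = static_cast<unsigned char>(text[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }     // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);                       // append 6 payload bits
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

    With this, `utf8_to_code_points("\xC3\xAD")` gives a single element equal to 237, while a lone `"\xC3"` gives U+FFFD. Plain ASCII text comes back unchanged, one code point per byte.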
