  1. #1
    Join Date
    Mar 2003
    Posts
    53

    Foreign Characters

    Hi all.

    I'm having some problems writing foreign characters to xml files.

    Here is a snippet of my test code:

    CFile myFile;
    CFileException fileerror;  // receives error details if Open fails
    myFile.Open("test.xml", CFile::modeCreate | CFile::modeWrite, &fileerror);
    myFile.Write("é", strlen("é"));
    myFile.Close();

    As you can see, this simply creates an xml file and attempts to write some foreign characters to it.

    However, when I open this in an XML editor (I use XML SPY), the characters show up as garbage. If I copy this string to the clipboard, paste it into the file, save, and reopen it, the text shows fine. So I know it's possible to have these characters displaying correctly, but not the way I'm writing them from my application.

    Any help is appreciated on what I would do to fix this.

    Jim

    P.S: These characters show okay in notepad.

  2. #2
    Join Date
    Apr 2005
    Location
    Norway
    Posts
    3,934

    Re: Foreign Characters

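    XML documents do not support all characters directly; that is, some characters need to be encoded.

    Instead of writing 'é' try '&eacute;' or '&#233;' (the numeric form is the safer choice, since XML itself does not predefine &eacute;).

    Click here for more character encodings/entities.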

    - petter

  3. #3
    Join Date
    Jun 2002
    Location
    Letchworth, UK
    Posts
    1,020

    Re: Foreign Characters

    Your snippet doesn't really write any XML out: all it does is write a foreign character. Have you had a look at it in Notepad? If it is not what you expect, you could try
    Code:
    const char* str = "\xE9";
    myFile.Write(str, strlen(str));
    If that doesn't work, switch on the Unicode flag and try again.
    Succinct is verbose for terse

  4. #4
    Join Date
    Mar 2003
    Posts
    53

    Re: Foreign Characters

    Thanks for the responses.

    I know XML does actually support these characters, because I can paste the text into an XML file manually and they display fine, even if I save and reopen the document.

    Secondly, it DOES display correctly in notepad, but I don't know why this is. This still doesn't fix the problem.

    For some reason there is a difference between creating an XML document manually and pasting in these characters, and creating them using code. The former is fine, but the latter causes problems.

    It doesn't matter that it's not proper XML as this is only for test purposes. It makes no difference if I make it correct XML.

    I cannot pump in character codes like &eacute; or \xE9, because I have to read in from a foreign XML file in the first place which is full of these characters, and then pump a new one out, keeping all the formatting correct.

    Lastly:

    "If that doesn't work, switch on the Unicode flag and try again."

    What does this mean, and how do I do it?

    Thanks again.

  5. #5
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: Foreign Characters

    It's a problem with codepages. You're writing into the file in the codepage Windows Western Europe (1252). An XML viewer will treat a file as UTF-8 by default when it has no encoding declaration and no byte order mark. Windows 1252 and UTF-8 encode characters outside ASCII in different ways, so you see garbage. So there are two solutions. Either write the file in Unicode (UTF-8 or UTF-16), or emit a real XML header which tells the XML viewer which codepage the file is in. For é and the other accented Latin letters, Windows 1252 matches ISO-8859-1, which is the name that is standardised; the two differ only in the range 0x80 to 0x9F, where 1252 holds characters such as € and the curly quotes, so declare windows-1252 instead if you need those. You can do this by using the following header:
    Code:
    <?xml version="1.0" encoding="ISO-8859-1"?>
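To make the mismatch concrete: in Windows 1252, é is the single byte 0xE9, while in UTF-8 it is the two bytes 0xC3 0xA9. To a UTF-8 parser a lone 0xE9 looks like the start of a three-byte sequence with nothing valid after it, which is why the editor shows garbage. Here is a rough sketch of a structural check. The name IsValidUtf8 is made up for illustration, and the check is deliberately simplified: it ignores overlong encodings and surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from any library): returns true if `s` has the
// byte structure of UTF-8. Simplified: only lead/continuation byte patterns
// are checked, not overlong forms or surrogate code points.
bool IsValidUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                    // continuation bytes expected
        if (c < 0x80)                extra = 0;   // 0xxxxxxx: ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;   // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;   // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;   // 11110xxx
        else return false;                        // stray continuation or bad lead byte

        // Not enough bytes left for the sequence this lead byte announces.
        if (extra > s.size() - i - 1) return false;

        // Every continuation byte must look like 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

Running the bytes you are about to write through a check like this is a quick way to tell whether the file will survive being read as UTF-8.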
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  6. #6
    Join Date
    Mar 2003
    Posts
    53

    Re: Foreign Characters

    Thanks for this. If I set the file as ISO-8859-1, it works. However, is there a way I can write these foreign characters whilst keeping the file UTF-8?

    If so, how do I do this? If I add the UTF-8 header, the text shows as garbage again, obviously.

    How do I write the file as UTF-8, but still keep the foreign characters? Or are these characters simply not allowed?

    I ask this, because I am reading in a file which has lots of foreign characters, yet is labelled as UTF-8 in the header, and I need to replicate it exactly.

  7. #7
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: Foreign Characters

    I guess you are programming under Windows, so you'll be able to use its conversion functions. Check out this FAQ entry then and scroll down to the "Correct Way". Ignore the stuff above it because it won't work with UTF-8. So if you want to write a UTF-8 file, you'll have to go through the following steps:
    - Open the file in binary mode
    - Convert the string you would like to write into UTF-16, which is what Windows calls Unicode (using MultiByteToWideChar with CP_ACP or, better, the real codepage, i.e. 1252)
    - Convert the resulting UTF-16 string into UTF-8 (using WideCharToMultiByte with CP_UTF8)
    - Write this to the file
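The steps above could look roughly like this. It is only a sketch: the function name Cp1252ToUtf8 is invented for illustration, error handling is minimal, and the #else branch is an assumption added so the snippet also compiles outside Windows (it treats the input as ISO-8859-1, which is enough for é but not for the 0x80 to 0x9F range of 1252).

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts Windows-1252 text to UTF-8.
// Returns an empty string if the conversion fails.
std::string Cp1252ToUtf8(const std::string& in)
{
    if (in.empty()) return std::string();
#ifdef _WIN32
    const int inLen = static_cast<int>(in.size());

    // Step 1: codepage 1252 -> UTF-16. The first call only asks for the length.
    const int wlen = MultiByteToWideChar(1252, 0, in.data(), inLen, NULL, 0);
    if (wlen <= 0) return std::string();
    std::wstring wide(wlen, L'\0');
    MultiByteToWideChar(1252, 0, in.data(), inLen, &wide[0], wlen);

    // Step 2: UTF-16 -> UTF-8. The last two arguments must be NULL for CP_UTF8.
    const int ulen = WideCharToMultiByte(CP_UTF8, 0, wide.data(), wlen,
                                         NULL, 0, NULL, NULL);
    if (ulen <= 0) return std::string();
    std::string utf8(ulen, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), wlen,
                        &utf8[0], ulen, NULL, NULL);
    return utf8;
#else
    // Assumed fallback for non-Windows builds: input is read as ISO-8859-1,
    // where each byte value equals its Unicode code point.
    std::string utf8;
    for (unsigned char c : in) {
        if (c < 0x80) {
            utf8 += static_cast<char>(c);
        } else {
            utf8 += static_cast<char>(0xC0 | (c >> 6));
            utf8 += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8;
#endif
}
```

With the CFile from the first post, the result would then be written as raw bytes, something like myFile.Write(utf8.data(), (UINT)utf8.size()), and the XML header should declare encoding="UTF-8" or leave the encoding out.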

    Alternatively you can check out Marius' article which explains how UTF8 works and then use the functions in his example.
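For reference, the encoding itself is simple enough to do by hand. The following is not the code from that article, just a minimal sketch of how a Unicode code point maps to UTF-8 bytes; the name AppendUtf8 is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of code point `cp` to `out`.
// Returns false (and appends nothing) if `cp` is a surrogate or above U+10FFFF.
bool AppendUtf8(unsigned long cp, std::string& out)
{
    if (cp < 0x80) {                               // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                       // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                     // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogates are not characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {                   // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        return false;                              // beyond the Unicode range
    }
    return true;
}
```

For é (U+00E9) this appends the two bytes 0xC3 0xA9, which is what a UTF-8 aware editor expects to find in the file.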

  8. #8
    Join Date
    Mar 2003
    Posts
    53

    Re: Foreign Characters

    Aha, great. That's all worked. Thanks a lot!
