InputStreamReader, Charsets, and a surprise character

  1. #1
    Join Date
    Feb 2009
    Posts
    16

    InputStreamReader, Charsets, and a surprise character

    I'm trying to write a parser that reads some XML, alters only certain parts of it, and prints the result to another file. I know about XML parsers, but I need something very specific. I'm using an InputStreamReader for input and a PrintStream wrapped around a FileOutputStream for output.

    Demo/Test Sample
    Code:
    <xml>
    ±
    </xml>
    Running it with no charset specified (the default, Cp1252) produces:

    Code:
    <xml>
    Â±
    </xml>
    Why is another character added?
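    The extra character is the classic sign of UTF-8 bytes being decoded as Cp1252 somewhere along the way. Assuming the ± sign (U+00B1) was stored as UTF-8, it occupies two bytes, 0xC2 0xB1, while in Cp1252 (canonical Java name "windows-1252") it is the single byte 0xB1. Decoding the two UTF-8 bytes as Cp1252 gives two characters, Â (0xC2) and ± (0xB1). A minimal sketch reproducing this; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // "windows-1252" is the canonical java.nio name for the Cp1252 encoding
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encodes text as UTF-8, then (wrongly) decodes those bytes as windows-1252
    static String misreadUtf8AsCp1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    public static void main(String[] args) {
        String plusMinus = "\u00B1"; // the plus-minus sign, U+00B1

        // UTF-8 needs two bytes for U+00B1 (0xC2 0xB1); windows-1252 needs one (0xB1)
        System.out.println(plusMinus.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(plusMinus.getBytes(CP1252).length);

        // 0xC2 decodes to U+00C2 (A with circumflex) and 0xB1 to U+00B1,
        // so one character turns into two
        String misread = misreadUtf8AsCp1252(plusMinus);
        System.out.println(misread.equals("\u00C2\u00B1"));
    }
}
```

    Plain ASCII text such as "<xml>" is encoded identically in both charsets, which is why only the non-ASCII character changes.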

    Here is a simplified version of my code. It opens a file, reads it line by line, and writes each line to another file.
    Code:
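    import java.io.*;

    public class CopyXml { // class name is illustrative
        public static void main(String[] args) throws IOException {
            File file = new File("myfile.xml");
            // Decode the input bytes as Cp1252
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "Cp1252"));
            FileOutputStream out = new FileOutputStream("results.xml");
            // Encode the output as Cp1252, flushing on every println
            PrintStream p = new PrintStream(out, true, "Cp1252");
            String textLine; // no need for new String(); readLine() supplies each value

            while ((textLine = reader.readLine()) != null) {
                p.println(textLine); // readLine() drops the line terminator; println() adds one
            }

            reader.close();
            p.close();
        }
    }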
    Thanks for your time.

  2. #2
    Join Date
    Feb 2008
    Posts
    966

    Re: InputStreamReader, Charsets, and a surprise character

    It all depends on how the XML file is actually encoded at the source. The InputStreamReader constructor takes an encoding name, and the encodings Java supports for it include:


    • 8859_1 (ISO-8859-1/Latin-1)
    • 8859_2 (ISO-8859-2/Latin-2)
    • 8859_3 (ISO-8859-3/Latin-3)
    • 8859_4 (ISO-8859-4/Latin-4)
    • 8859_5 (ISO-8859-5/Cyrillic)
    • 8859_6 (ISO-8859-6/Arabic)
    • 8859_7 (ISO-8859-7/Greek)
    • 8859_8 (ISO-8859-8/Hebrew)
    • 8859_9 (ISO-8859-9/Latin-5)
    • ASCII (7-bit ASCII)
    • UTF8 (UCS Transformation Format-8)
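    These are the historical java.io encoding names. The java.nio.charset.Charset class (Java 1.4 and later) accepts them as aliases and reports a canonical name for each, which is handy when comparing against documentation. A small sketch, assuming a standard JDK; the helper canonical() is hypothetical, and Cp1252 is included because it is the encoding used in the question:

```java
import java.nio.charset.Charset;

public class EncodingNames {
    // Returns the canonical java.nio name for a legacy java.io encoding name
    static String canonical(String legacyName) {
        return Charset.forName(legacyName).name();
    }

    public static void main(String[] args) {
        // Legacy names from the list, plus the Cp1252 name from the question
        String[] names = {"8859_1", "ASCII", "UTF8", "Cp1252"};
        for (String n : names) {
            System.out.println(n + " -> " + canonical(n));
        }
    }
}
```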

    In one project I used the following:
    Code:
    InputStream io = new BufferedInputStream(new FileInputStream(l_file));
    InputStreamReader ir = new InputStreamReader(io, format);

    Here "format" was a String holding either "8859_1" or "UTF8", depending on how the incoming file was encoded.
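    Following the same idea, a sketch that decodes and re-encodes with one explicitly named charset, so that the bytes of a character such as ± come out exactly as they went in. It assumes Java 7 or later (java.nio.file and try-with-resources, neither of which existed when this thread was written); EncodingCopy and copy() are hypothetical names, and the format parameter plays the role of the variable described above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodingCopy {
    // Hypothetical helper: copies a text file line by line, decoding the input and
    // encoding the output with the same charset name (e.g. "8859_1" or "UTF8")
    static void copy(Path in, Path out, String format) throws IOException {
        Charset cs = Charset.forName(format);
        try (BufferedReader reader = Files.newBufferedReader(in, cs);
             BufferedWriter writer = Files.newBufferedWriter(out, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Any line-level edits to the XML would be applied here
                writer.write(line);
                writer.newLine(); // writes the platform line separator
            }
        }
    }
}
```

    One design note: Files.newBufferedReader decodes strictly, so reading a Cp1252 file that contains ± as "UTF8" typically fails with a MalformedInputException instead of quietly substituting replacement characters, as an InputStreamReader built from a charset name does. That makes a wrong guess about the file's encoding easier to spot.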
