  1. #1
    Join Date
    Aug 2011
    Location
    Bengaluru
    Posts
    3

    .txt file and Japanese text

    Hi,

    In one of my VC++ applications I am using the FILE::Write() function to write data to a notepad file. It works fine, but now my requirement is to write Japanese text data to a notepad file. What do I have to do for this? Is there any option to do so? Please help me.

    pal

  2. #2
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,430

    Re: .txt file and Japanese text

    I guess you mean .txt files (not a notepad).
    You should create a UNICODE file (just use a UNICODE build) and preferably with a BOM.
    Victor Nijegorodov

  3. #3
    Join Date
    Nov 2000
    Location
    Voronezh, Russia
    Posts
    6,633

    Re: .txt file and Japanese text

    now my requirement is to write Japanese text data
    Any text is just a set of bytes. You compose those properly in memory, then you write the bytes to file. So, what is your real problem?
    Best regards,
    Igor

  4. #4
    Join Date
    Aug 2011
    Location
    Bengaluru
    Posts
    3

    Re: .txt file and Japanese text

    Yes, VictorN, you are right: it is a .txt file, and I want to write JP text to it.
    What is your solution in that case?
    Also, sometimes I want to write data to a .csv file. What about that one?

  5. #5
    Join Date
    Aug 2011
    Location
    Bengaluru
    Posts
    3

    Re: .txt file and Japanese text

    Thanks, Igor Vartanov.
    Can you please tell me how to compose bytes so that I can put JP text in .txt and .csv files?
    My project's character set setting is Unicode.

  6. #6
    Join Date
    Aug 2008
    Posts
    902

    Re: .txt file and Japanese text

    I would suggest using WideCharToMultiByte to convert your wstring or wchar* data to a UTF-8 encoded string, then you can write the data to file like normal, preferably without a BOM.
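
    A minimal sketch of that approach, written as portable C++ so the byte composition is visible. On Windows, WideCharToMultiByte called with CP_UTF8 performs this UTF-16 to UTF-8 conversion for you; the hand-written encoder below is a stand-in for it, not that API. The names utf16_to_utf8 and write_bytes are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Encode UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are not validated; they are encoded as-is.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {                       // 1 byte (ASCII)
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes (most Japanese text)
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Write the bytes unchanged. Binary mode avoids newline translation.
bool write_bytes(const char* path, const std::string& bytes)
{
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(bytes.data(), 1, bytes.size(), f) == bytes.size();
    return std::fclose(f) == 0 && ok;
}
```

    The same bytes work for a .csv file, since only the file name differs: for example, write_bytes("out.csv", utf16_to_utf8(text)).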

  7. #7
    Join Date
    Nov 2000
    Location
    Voronezh, Russia
    Posts
    6,633

    Re: .txt file and Japanese text

    Can you please tell me how to compose bytes so that I can put JP text in .txt file and .csv files.
    You know, this sounds a bit weird to me. You get some bytes from a database, a text file or user input. You should already know what the bytes are and what encoding they are in (code page 932, for example, or some flavor of Unicode, as Chris suggested). Then you put the bytes into the txt or csv file. I don't see any problem here, do you?

    Specifically, about putting the text into a txt file: if your bytes are in CP932, you need to do nothing but write the bytes directly to the file. It's important to understand that in this case, to see Japanese characters in Notepad, you have to set the system locale for non-Unicode programs to Japanese.

    In case of some unicode I'd recommend not to avoid BOM, as Victor already said.
    Last edited by Igor Vartanov; August 9th, 2011 at 12:06 AM.
    Best regards,
    Igor

  8. #8
    Join Date
    May 2009
    Location
    Bengaluru, India
    Posts
    460

    Re: .txt file and Japanese text

    what is BOM?

  9. #9
    Join Date
    Aug 2008
    Posts
    902

    Re: .txt file and Japanese text

    Quote Originally Posted by Igor Vartanov View Post
    I'd recommend not to avoid BOM, as Victor already said.
    There is no point of a BOM if you are using UTF-8, since BOM is meant to indicate the byte order (endianness) and there simply is no such thing in UTF-8, so having one is meaningless. I think that dealing with UTF-8 is easier and more readily supported by programs and other operating systems.

  10. #10
    Join Date
    Nov 2000
    Location
    Voronezh, Russia
    Posts
    6,633

    Re: .txt file and Japanese text

    Quote Originally Posted by Chris_F View Post
    There is no point of a BOM if you are using UTF-8, since BOM is meant to indicate the byte order (endianness) and there simply is no such thing in UTF-8, so having one is meaningless.
    Really? Other people (see the BOM article that Victor recommended) wouldn't agree with this statement of yours.
    Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.
    . . .
    UTF-8

    The UTF-8 representation of the BOM is the byte sequence 0xEF,0xBB,0xBF.
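
    As an illustration of the BOM used as an encoding signature, here is a small sketch that inspects the first bytes of a buffer. The enum Bom and the function detect_bom are hypothetical names, not part of any library. The sketch covers only the three signatures relevant to this thread, so a UTF-32LE file (which starts with FF FE 00 00) would be reported as UTF-16LE.

```cpp
#include <cassert>
#include <cstddef>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Report which byte order mark, if any, starts the buffer.
Bom detect_bom(const unsigned char* p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return Bom::Utf8;      // EF BB BF
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return Bom::Utf16LE;   // FF FE
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return Bom::Utf16BE;   // FE FF
    return Bom::None;          // no signature: the encoding must be guessed
}
```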
    Last edited by Igor Vartanov; August 9th, 2011 at 02:20 AM.
    Best regards,
    Igor

  11. #11
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: .txt file and Japanese text

    Quote Originally Posted by Chris_F View Post
    There is no point of a BOM if you are using UTF-8, since BOM is meant to indicate the byte order (endianness) and there simply is no such thing in UTF-8, so having one is meaningless. [...]
    Besides what Igor already posted: There definitely is a point in using a BOM for UTF-8. It makes it possible to reliably distinguish UTF-8 from MBCS encoding, which is something at least I really appreciate.
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  12. #12
    Join Date
    Aug 2008
    Posts
    902

    Re: .txt file and Japanese text

    It may very well have its uses for some, but I personally think it is ugly and unnecessary. UTF-8 is supposed to be backward compatible with ASCII, and if you treat a UTF-8 document containing only ASCII characters as if it were ASCII encoded (as you ought to be able to do), then you will end up with 3 garbage characters at the beginning.

    From Wikipedia:
    While the Unicode Standard does allow a BOM in UTF-8,[2] it does not require or recommend it.[3] Byte order has no meaning in UTF-8[4] so a BOM serves only to identify a text stream or file as UTF-8.
    Sure, I'm somewhat naive, but it's 2011 and I think everyone should just be using BOM-less UTF-8 for everything and pretend that other character encodings never even existed. Luckily I use Linux and I'm pretty much able to do just that.

  13. #13
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: .txt file and Japanese text

    Quote Originally Posted by Chris_F View Post
    UTF-8 is supposed to be backward compatible with ASCII and if you treat a UTF-8 document containing only ASCII characters as if it were ASCII encoded (as you ought to be able to do) then you will end up with 3 garbage characters at the beginning.
    Well, strictly speaking, a UTF-8 file (without BOM) that only contains ASCII characters is no UTF-8 file, it's an ASCII file. And of course this has the advantage of being compatible with plainly everything (at least as far as the character set is concerned). If I wanted to get the best of both worlds without requiring the user to always make an explicit choice of character encoding, I'd scan the data before saving it to determine whether it actually does contain non-ASCII characters and insert a BOM only if it does. IMO a justifiable effort given the convenience it yields, unless the file is really huge.
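
    The scan-then-decide idea can be sketched as below, assuming the data is already UTF-8 encoded and held in memory. The function names has_non_ascii and with_conditional_bom are hypothetical. The check for an existing BOM is an addition of this sketch so that saving twice does not stack two BOMs.

```cpp
#include <cassert>
#include <string>

// True if any byte is outside the 7-bit ASCII range.
bool has_non_ascii(const std::string& utf8)
{
    for (unsigned char c : utf8) {
        if (c >= 0x80) return true;
    }
    return false;
}

// Prepend the UTF-8 BOM (EF BB BF) only when the text needs it.
std::string with_conditional_bom(const std::string& utf8)
{
    static const char kUtf8Bom[] = "\xEF\xBB\xBF";
    if (utf8.compare(0, 3, kUtf8Bom) == 0) return utf8;  // already has a BOM
    if (!has_non_ascii(utf8)) return utf8;               // pure ASCII: leave as-is
    return std::string(kUtf8Bom) + utf8;
}
```

    The scan is a single linear pass over the buffer, which is why the cost only matters for very large files.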

    Most (Windows) programs I encounter nowadays simply default to MBCS encoding in the absence of a BOM, no matter what (given that they support more than one encoding at all). If I give them a BOM-less UTF-8 file with extended characters, I get something ugly as well, though admittedly perhaps not within the first three characters (which, OTOH, are pretty easy to locate).

    However, .NET programs are an exception to the "most (Windows) programs" rule above: Their stream reader and stream writer constructors prefer to default to BOM-less UTF-8 when writing or when reading a BOM-less file. So I need to make a little extra effort to "get the best of both worlds" in my .NET programs, but I think it's worth it, until the world has become 99% unicodified... (Not all of us share the bliss of writing exclusively or at least mostly on *nix...)

  14. #14
    Join Date
    Apr 2009
    Posts
    598

    Re: .txt file and Japanese text

    I, too, was wondering whether I should write a BOM header or not.
    Eventually, I decided to follow what Notepad (under Windows 7 home edition 32-bit) is doing.

    And what is Notepad doing when it saves some text containing exotic characters?
    It writes the BOM characters: FF FE.
    And Notepad doesn't use UTF-8, but Unicode having a fixed length of 2 bytes per character.

    In my software, I write them with good old C functions (fopen(), fputc(), fclose()) initially designed for ASCII text. I add the BOM, and I call fputc() twice for each character. This is a somewhat primitive way of doing things, but it works for me.
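
    A sketch of that fputc() approach, assuming the text is held as UTF-16 code units and the target is little-endian UTF-16 (the byte order the FF FE BOM announces). The function name write_utf16le_with_bom is hypothetical. Two caveats are worth stating: characters outside the Basic Multilingual Plane take two code units (4 bytes), so the length is not strictly fixed, and the file is opened in binary mode because text mode on Windows would expand a 0x0A byte and corrupt the 2-byte units.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Write the FF FE BOM, then each UTF-16 code unit as two bytes, low byte first.
bool write_utf16le_with_bom(const char* path, const std::u16string& text)
{
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "wb");  // binary mode: no newline translation
    if (!f) return false;

    bool ok = std::fputc(0xFF, f) != EOF && std::fputc(0xFE, f) != EOF;
    for (char16_t unit : text) {
        if (!ok) break;
        ok = std::fputc(unit & 0xFF, f) != EOF           // low byte
          && std::fputc((unit >> 8) & 0xFF, f) != EOF;   // high byte
    }
    return std::fclose(f) == 0 && ok;
}
```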

  15. #15
    Join Date
    Nov 2000
    Location
    Voronezh, Russia
    Posts
    6,633

    Re: .txt file and Japanese text

    Quote Originally Posted by olivthill2 View Post
    And what is Notepad doing when it saves some text containing exotic characters?
    It writes the BOM characters: FF FE.
    And Notepad doesn't use UTF-8, but Unicode having a fixed length of 2 bytes per character.
    Surprise!
    Attached Images
    Best regards,
    Igor
