
Thread: Unicode text file

  1. #1
    Join Date
    Apr 2007
    Location
    South Africa
    Posts
    86

    Unicode text file

    I want to write a Unicode text file to disk. The code I have is as follows:

    Code:
    #include <iostream>
    #include <fstream>
    #include <windows.h>

    int WINAPI wWinMain(HINSTANCE hInstance, HINSTANCE, wchar_t* cmdParam, int cmdShow)
    {
        std::wofstream file;
        file.open(L"data.txt", std::ios::out);
        file << L"data";
        file.close();
        return 0;
    }

    My problem is that the above looks like it creates an ANSI text file. Isn't the output supposed to be Unicode? I don't understand this since I'm using wide functions.

  2. #2
    Join Date
    Oct 2002
    Location
    Singapore
    Posts
    3,128

    Re: Unicode text file

    Have you used a hex editor to open the file? If the file is UTF-16, you should see two bytes per character, such as 0x64 0x00 for 'd' (little-endian byte order), etc.
    quoted from C++ Coding Standards:

    KISS (Keep It Simple Software):
    Correct is better than fast. Simple is better than complex. Clear is better than cute. Safe is better than insecure.

    Avoid magic number:
    Programming isn't magic, so don't incant it.

  3. #3
    Join Date
    Apr 1999
    Posts
    27,449

    Re: Unicode text file

    Quote Originally Posted by links
    My problem is that the above looks like it creates an ANSI text file. Isn't the output supposed to be Unicode? I don't understand this since I'm using wide functions.
    As kirants pointed out, make sure you use a hex editor to inspect these files, not a text editor.

    The reason is that a text editor can do all sorts of tricks to show text in a user-friendly manner (remove tabs, whitespace, interpret Unicode in some way, etc.). This is not what you want to see -- you want to see the actual bytes that make up the file, and only a hex/binary editor is guaranteed to show this to you.
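
    If no hex editor is at hand, the same check can be done with a few lines of C++. This is only a minimal sketch; the helper names read_bytes and hex_dump are made up for illustration and are not part of any library.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file as raw bytes; binary mode prevents newline translation.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}

// Format bytes as space-separated two-digit hex values.
std::string hex_dump(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        const unsigned value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof(buf), "%02x", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

    Calling hex_dump(read_bytes("data.txt")) gives "64 61 74 61" for a single-byte (ANSI) file containing "data", whereas a UTF-16 little-endian file with a BOM, which is what Notepad calls "Unicode", gives "ff fe 64 00 61 00 74 00 61 00".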

    Regards,

    Paul McKenzie

  4. #4
    Join Date
    Apr 2007
    Location
    South Africa
    Posts
    86

    Re: Unicode text file

    OK, I've opened the text file in a hex editor and it is as you say: d is represented by 64. As I see it, this confirms my suspicion that this is an ASCII/ANSI text file. If I type "data" into Notepad and save it as a Unicode text file, the hex editor shows different hex values.

    So can wofstream create a Unicode text file?

    And as the following is probably related I'll ask here. If I change "data" to "√" (extended ASCII 251) the text file gets created but contains nothing.
    When I turn off Unicode compilation and revert to ofstream I get the following compiler warning:

    warning C4566: character represented by universal-character-name '\u221A' cannot be represented in the current code page (1252)

    The output file then contains the following character: "?"

    All of this is very confusing to me, and I'd appreciate it if you guys could shed some light on this.

  5. #5
    Join Date
    May 2002
    Posts
    1,435

    Re: Unicode text file

    I don't know of any standard file classes that work well with UNICODE. wofstream uses wchar_t as its element type but actually converts to char before writing. If there is a way to change this behavior I don't know what it is.
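
    The conversion is easy to observe by counting the bytes that end up on disk. The following is a rough sketch, not a definitive test; the function name bytes_written_by_wofstream and the scratch file name are invented for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Write `text` through a default std::wofstream, then return how many bytes
// actually reached the file. The file name is a hypothetical scratch path.
std::size_t bytes_written_by_wofstream(const std::wstring& text)
{
    const char* path = "wofstream_demo.tmp";
    {
        std::wofstream out(path);   // text mode, locale copied from the global one
        out << text;
    }                               // destructor flushes and closes the file

    std::ifstream in(path, std::ios::in | std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    in.close();
    std::remove(path);              // clean up the scratch file
    return bytes.size();
}
```

    With the default "C" locale, bytes_written_by_wofstream(L"data") returns 4, one byte per character, even though each wchar_t occupies at least two bytes in memory.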

    When working with UNICODE files I always create a special class derived from the class I want to use - in your case this would be wofstream - and use unformatted binary write functions. However, the unformatted write() function of wofstream still requires a wchar_t array so I would prefer to use ofstream instead when working with binary.

    You are also forgetting the UNICODE byte order marker (BOM) that must be included at the beginning of a UNICODE text file.
    Code:
    #include <cwchar>     // wcslen
    #include <fstream>
    #include <windows.h>

    class WOFSTREAM : public std::ofstream
    {
    	public:
    
    		void WriteBOM()
    		{
    			const static wchar_t BOM = 0xfeff;
    			write((const char *)&BOM, sizeof(BOM));
    		}
    
    		WOFSTREAM& operator <<(const wchar_t* text)
    		{
    			const char *pData = (const char *)text;
    			const unsigned int length = wcslen(text) * sizeof(text[0]);
    			write(pData, length);
                            return *this;
    		} 
    		
    };
    
    int WINAPI wWinMain(HINSTANCE hInstance, HINSTANCE, wchar_t* cmdParam, int cmdShow)
    {
        WOFSTREAM file;
        // Binary mode keeps the CRT from rewriting 0x0A bytes inside the UTF-16 data.
        file.open("data.txt", std::ios::out | std::ios::binary);
        file.WriteBOM();
        file << L"data";
        file.close();
        return 0;
    }
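
    The class above writes wchar_t in its in-memory layout, so it produces UTF-16 little-endian only where wchar_t is 2 bytes and the machine is little-endian, as on Windows. A more portable sketch, assuming a C++11 compiler, encodes the bytes explicitly. The names to_utf16le_with_bom and write_utf16le_file are hypothetical helpers, not existing library functions.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Encode UTF-16 code units as little-endian bytes, preceded by the BOM
// (U+FEFF, which serialises as FF FE in little-endian order).
std::string to_utf16le_with_bom(const std::u16string& text)
{
    std::string bytes;
    bytes.reserve(2 + text.size() * 2);
    bytes += '\xFF';
    bytes += '\xFE';
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const unsigned unit = text[i];
        bytes += static_cast<char>(unit & 0xFF);         // low byte first
        bytes += static_cast<char>((unit >> 8) & 0xFF);  // then high byte
    }
    return bytes;
}

// Write the encoded bytes in binary mode so nothing translates them.
bool write_utf16le_file(const char* path, const std::u16string& text)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    const std::string bytes = to_utf16le_with_bom(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}
```

    For example, write_utf16le_file("data.txt", u"data") should produce the same byte sequence regardless of the platform's wchar_t size.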

  6. #6
    Join Date
    Jul 2008
    Location
    dalian, China
    Posts
    36

    Re: Unicode text file

    Code:
    std::wcout << L"\u221a" << std::endl;
    This code can show the "√" in the console prompt!
    Cigagou,Cogitou!

  7. #7
    Join Date
    Apr 2007
    Location
    South Africa
    Posts
    86

    Re: Unicode text file

    Thanks 0xC0000005, your post explains the problem. I will definitely be using your code. One would think that the wide functions and classes would be a bit "smarter" when using wide characters.

  8. #8
    Join Date
    Nov 2003
    Posts
    1,902

    Re: Unicode text file

    >> warning C4566: ...
    Here's a post that explains this, and other things you should be aware of: http://www.codeguru.com/forum/showpo...8&postcount=14

    You can prevent wchar_t <-> char conversions by creating your own codecvt facet:
    Code:
    #include <iostream>
    #include <iomanip>
    #include <fstream>
    #include <locale>
    #include <string>
    #include <cstring>   // memcpy
    
    typedef std::codecvt<wchar_t , char , mbstate_t> null_wcodecvt_base;
    
    class null_wcodecvt : public null_wcodecvt_base
    {
    public:
        explicit null_wcodecvt(size_t refs = 0) : null_wcodecvt_base(refs) {}
    
    protected:
        virtual result do_out(mbstate_t&,
                              const wchar_t* from,
                              const wchar_t* from_end,
                              const wchar_t*& from_next,
                              char* to,
                              char* to_end,
                              char*& to_next) const
        {
            size_t len = (from_end - from) * sizeof(wchar_t);
            memcpy(to, from, len);
            from_next = from_end;
            to_next = to + len;
            return ok;
        }//do_out
    
        virtual result do_in(mbstate_t&,
                             const char* from,
                             const char* from_end,
                             const char*& from_next,
                             wchar_t* to,
                             wchar_t* to_end,
                             wchar_t*& to_next) const
        {
            size_t len = (from_end - from);
            memcpy(to, from, len);
            from_next = from_end;
            to_next = to + (len / sizeof(wchar_t));
            return ok;
        }//do_in
    
        virtual result do_unshift(mbstate_t&, char* to, char*,
                                  char*& to_next) const
        {
            to_next = to;
            return noconv;
        }//do_unshift
    
        virtual int do_length(mbstate_t&, const char* from,
                              const char* end, size_t max) const
        {
            return (int)((max < (size_t)(end - from)) ? max : (end - from));
        }//do_length
    
        virtual bool do_always_noconv() const throw()
        {
            return true;
        }//do_always_noconv
    
        virtual int do_encoding() const throw()
        {
            return sizeof(wchar_t);
        }//do_encoding
    
        virtual int do_max_length() const throw()
        {
            return sizeof(wchar_t);
        }//do_max_length
    };//null_wcodecvt
    
    //-----------------------------------------------------------------------------
    
    std::wostream& wendl(std::wostream& out)
    {
        out.put(L'\r');
        out.put(L'\n');
        out.flush();
        return out;
    }//wendl
    
    //-----------------------------------------------------------------------------
    
    const wchar_t UTF_BOM = 0xfeff;
    
    const wchar_t CHECK_SYM = L'\u221a';
    
    int main()
    {
        std::wfstream file;
    
        null_wcodecvt wcodec(1);
        std::locale wloc(std::locale::classic(), &wcodec);
        file.imbue(wloc);
    
        file.open("data.txt", std::ios::out | std::ios::binary);
        if (!file)
        {
            std::cerr << "Failed to open data.txt for writing" << std::endl;
            return 1;
        }//if
    
        file << UTF_BOM << L"data = " << 42 << CHECK_SYM << wendl;
        file.close();
    
        return 0;
    }//main
    Anything that uses the MS CRT will have to open the stream in binary, otherwise CRT functions like fputwc() will convert the wchar_t to char as well. This has the additional side effect of turning off the auto-magic conversion of '\n' -> "\r\n". The "wendl" manipulator helps with this.

    Keep in mind that on *nix with GCC, this *should* create a UTF32 Unicode file. Haven't tested on *nix however. It does work with MSVC 2005 and up, and with mingw+STLport.
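
    Because the facet copies wchar_t memory unchanged, the resulting encoding follows the platform's wchar_t width and byte order. The sketch below reports which one to expect; raw_wchar_encoding is a made-up helper name, and it assumes wchar_t holds UTF-16 or UTF-32 code units.

```cpp
#include <cstring>
#include <string>

// Name the encoding that a raw memcpy of wchar_t data would produce on this
// platform, assuming wchar_t holds UTF-16 or UTF-32 code units.
std::string raw_wchar_encoding()
{
    const wchar_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);   // inspect the lowest-addressed byte
    const char* order = (first == 1) ? "LE" : "BE";

    if (sizeof(wchar_t) == 2) return std::string("UTF-16") + order;
    if (sizeof(wchar_t) == 4) return std::string("UTF-32") + order;
    return "unknown";
}
```

    A reader of the file needs to know this result, since the same program writes different bytes on platforms with different wchar_t sizes.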

    gg

  9. #9
    Join Date
    Jan 2015
    Posts
    1

    Re: Unicode text file

    With VC2012 the code crashes.
    I get the message "Runtime Error! Program: ... R6025 - pure virtual function call"

    The reason is that the stream's destructor accesses the facet again, which has already been destroyed.
    You can fix the code by moving the creation of the facet before the creation of the stream:
    ...
    null_wcodecvt wcodec(1);
    std::locale wloc(std::locale::classic(), &wcodec);
    std::wfstream file;
    file.imbue(wloc);
    ...
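
    The reordering works because local objects are destroyed in reverse order of declaration. A small sketch shows the rule; the Tracer class and destruction_order function are invented purely for illustration and stand in for the facet and the stream.

```cpp
#include <string>
#include <vector>

// Appends its name to a shared log when destroyed.
class Tracer
{
public:
    Tracer(const std::string& name, std::vector<std::string>& log)
        : name_(name), log_(log) {}
    ~Tracer() { log_.push_back(name_); }

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Returns the order in which two locals are destroyed at scope exit.
std::vector<std::string> destruction_order()
{
    std::vector<std::string> log;
    {
        Tracer facet("facet", log);     // declared first, destroyed last
        Tracer stream("stream", log);   // declared last, destroyed first
    }
    return log;
}
```

    Here destruction_order() yields "stream" followed by "facet", so a stream declared after the facet is gone before the facet it refers to is destroyed.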

    Thanks to Codeplug for his fine solution.
    Cheers
