  1. #1
    Join Date
    Sep 2010
    Posts
    18

    [C++] East European Fonts - Windows console

    Hello,

    I'm trying to write a program that reads and writes East European characters. Even a basic program like this:
    Code:
    #include <iostream>
    #include <string>
    using namespace std;
    int main(){
    
    cout << "ĄĄĄĄĄŻŻŻŻ"<<endl;
    return 0;
    }
    prints something like "ñññññ╜╜╜╜" to the console (when the source file is saved as OEM 852) or "ÑÑÑÑÑ»»»»" (when the source file is saved as Windows-1250).

    My console says:
    Code:
    C:\>chcp
    Active code page: 437
    I've tried locales (setlocale(LC_ALL, "Polish") doesn't work) and OemToChar/CharToOem (no effect either).

    I believe there is a very simple solution to this, but I've never worked with code pages before, so this is all new to me.

    I'd be very grateful for any help.

  2. #2
    Join Date
    Nov 2003
    Posts
    1,902

    Re: [C++] East European Fonts - Windows console

    Most of what you need to know can be found here: http://cboard.cprogramming.com/c-pro...E-console.html

    You should also avoid using any "special" characters directly in your source code.
    http://www.codeguru.com/forum/showpo...8&postcount=14

    gg

  3. #3
    Join Date
    Mar 2009
    Location
    Riga, Latvia
    Posts
    128

    Re: [C++] East European Fonts - Windows console

    If you are using the Windows console, use WinAPI only.

    See this thread: http://www.codeguru.com/forum/showthread.php?t=472959

  4. #4
    Join Date
    Nov 2003
    Posts
    1,902

    Re: [C++] East European Fonts - Windows console

    >> Puts to console something like: "ñññññ╜╜╜╜" (when the source file is coded as OEM 852) and like "ÑÑÑÑÑ»»»»" (when the source file is coded as Windows-1250).

    Ą = U+0104 (LATIN CAPITAL LETTER A WITH OGONEK)
    In codepage 852, Ą = 0xA4
    In codepage 1250, Ą = 0xA5

    Ż = U+017B (LATIN CAPITAL LETTER Z WITH DOT ABOVE)
    In codepage 852, Ż = 0xBD
    In codepage 1250, Ż = 0xAF

    The CRT was trying to output the characters using codepage 437.
    In codepage 437:
    0xA4 = ñ
    0xA5 = Ñ
    0xBD = ╜
    0xAF = »
    Which is why you saw those characters when you tried saving your source in CP852 and CP1250.

    To look up code page glyphs and values:
    http://msdn.microsoft.com/en-us/goglobal/cc305156.aspx
    http://msdn.microsoft.com/en-us/goglobal/cc305161.aspx
    http://msdn.microsoft.com/en-us/goglobal/cc305143.aspx

    Here's some code that covers VC++ and MinGW on new and old CRTs:
    Code:
    #if (defined(_MSC_VER) && (_MSC_VER >= 1400)) || \
        (defined(__MINGW32__) && (__MSVCRT_VERSION__ >= 0x0800))
    // VS 2005 or higher, or MinGW with 2005 CRT or higher
    #define HAVE_NEW_MS_CRT
    #endif
    
    #if (defined(_MSC_VER) && (_MSC_VER < 1400)) || \
        (defined(__MINGW32__) && (__MSVCRT_VERSION__ < 0x0800))
    // VS 6.0, VS 2003, or MinGW with pre-2005 CRT    
    #define HAVE_OLD_MS_CRT
    #endif
    
    #include <windows.h>
    #include <stdio.h>
    #include <locale.h>
    
    #ifdef HAVE_NEW_MS_CRT
    #include <fcntl.h>
    #include <io.h>
    #endif
    
    #include <iostream>
    using namespace std;
    
    namespace CP1250
    {
        const char A_OGONEK    = char(0xA5);
        const char Z_DOT_ABOVE = char(0xAF);
    }
    
    namespace CP852
    {
        const char A_OGONEK    = char(0xA4);
        const char Z_DOT_ABOVE = char(0xBD);
    }
    
    const wchar_t A_OGONEK    = 0x0104;
    const wchar_t Z_DOT_ABOVE = 0x017B;
    
    int main()
    {
    #if !defined(HAVE_NEW_MS_CRT) && !defined(HAVE_OLD_MS_CRT)
        cout << "Untested CRT" << endl;
    #endif
        UINT con_cp = GetConsoleCP();
        UINT con_out_cp = GetConsoleOutputCP();
    
        // use CRT, direct CP1250
        SetConsoleOutputCP(1250);
        printf("CP1250(CRT): %c%c  ", CP1250::A_OGONEK, CP1250::Z_DOT_ABOVE);
        cout << "cout: " << CP1250::A_OGONEK << CP1250::Z_DOT_ABOVE << endl;
    
        // use CRT, direct CP852
        SetConsoleOutputCP(852);
        printf("CP852(CRT): %c%c  ", CP852::A_OGONEK, CP852::Z_DOT_ABOVE);
        cout << "cout: " << CP852::A_OGONEK << CP852::Z_DOT_ABOVE << endl;
    
    #if defined HAVE_OLD_MS_CRT
        // VS 6.0, VS 2003, or MinGW with pre-2005 CRT
        // Wide output with this CRT requires the locale CP and console CP to match
        setlocale(LC_ALL, ".1250");
        SetConsoleOutputCP(1250);
        SetConsoleCP(1250);
        wprintf(L"Unicode->Locale/ConsoleCP(1250): %c%c  ", 
                A_OGONEK, Z_DOT_ABOVE);
        
        // wide ostreams in VS 6.0 do not support single wchar_t output, it has to
        // be wchar_t string
        const wchar_t wstr[] = {A_OGONEK, Z_DOT_ABOVE, 0};
        wcout << L"wcout: " << wstr << endl;
    
        setlocale(LC_ALL, ".852");
        SetConsoleOutputCP(852);
        SetConsoleCP(852);
        wprintf(L"Unicode->Locale/ConsoleCP(852): %c%c  ", 
                A_OGONEK, Z_DOT_ABOVE);
        wcout << L"wcout: " << wstr << endl;
    
    #elif defined HAVE_NEW_MS_CRT
        // CRT from VS 2005 or higher performs proper conversion from the locale CP
        // to the console output CP, but you still have to make sure the character
        // can be represented in both CP's
    
        // use CRT, Unicode -> Polish CP -> Console CP
        setlocale(LC_ALL, "Polish");
        wprintf(L"Unicode->PolishCP->ConsoleCP(%u): %c%c  ", 
                con_cp, A_OGONEK, Z_DOT_ABOVE);
        wcout << L"wcout: " << A_OGONEK << Z_DOT_ABOVE << endl;
    
        // use CRT, Unicode -> Polish CP -> Console CP(852)
        if (con_cp != 852)
        {
            SetConsoleCP(852);
            wprintf(L"Unicode->PolishCP->ConsoleCP(852): %c%c  ", 
                    A_OGONEK, Z_DOT_ABOVE);
            wcout << L"wcout: " << A_OGONEK << Z_DOT_ABOVE << endl;
        }//if
    
        // use CRT, direct Unicode, no conversions
        _setmode(_fileno(stdout), _O_U16TEXT);
        wprintf(L"Unicode(CRT): %c%c  ", A_OGONEK, Z_DOT_ABOVE);
        wcout << L"wcout: " << A_OGONEK << Z_DOT_ABOVE << endl;
    #endif
    
        // use Win32 API, direct Unicode, no conversions
        wchar_t buff[128];
        int len = wsprintfW(buff, L"Unicode(Win32API): %c%c\n", 
                            A_OGONEK, Z_DOT_ABOVE);
        DWORD written;
        WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), 
                      buff, len, &written, 0);
    
        // restore original console CP's
        SetConsoleCP(con_cp);
        SetConsoleOutputCP(con_out_cp);
        return 0;
    }//main
    You will need to ensure that your console (cmd.exe) is using a font with Unicode support - like Lucida Console.

    gg

  5. #5
    Join Date
    Sep 2010
    Posts
    18

    Re: [C++] East European Fonts - Windows console

    Big thanks for your help, it's the solution I was looking for.

    In the meantime I found the SetConsoleOutputCP() function and had some fun with it, and... there is one thing I don't understand:
    Code:
    #include <iostream>
    #include <string>
    #include <windows.h>
    using namespace std;
    
    
    int main(){
    SetConsoleOutputCP(65001);
    cout << "zażółć gęślą jaźń"<<endl;
    cout <<"End"<<endl;//it's skipped somehow
    
    int i;
    cin>>i;cin>>i;
    return 0;
    }
    This program prints the Polish characters correctly, but after "zażółć gęślą jaźń" it suddenly stops producing any output. It's as if the word "End" were redirected somewhere: instead of write first line, write second line, read value, read value, it does write first line, read value, read value.

    Could you explain where the error is?

  6. #6
    Join Date
    Nov 2003
    Posts
    1,902

    Re: [C++] East European Fonts - Windows console

    CP_UTF8 is not a real codepage. It's only meant for WideCharToMultiByte/MultiByteToWideChar.

    The MS CRT doesn't support UTF8 as a locale codepage either - so UTF8 output to "stdout" just doesn't work.

    Additional info on the subject: http://www.codeguru.com/forum/showpo...98&postcount=2

    >> it suddenly stops writing out any output.
    cout probably has bad/fail bit set.

    gg
    Last edited by Codeplug; November 5th, 2010 at 01:39 PM.

  7. #7
    Join Date
    Sep 2010
    Posts
    18

    Re: [C++] East European Fonts - Windows console

    Thanks! When I cleared the bit, the output came out correctly.

    My program will be used by many people, and I can't be sure whether they'll be running XP, Vista or Win7, so I'd like to make it as universal as possible.

    My first idea is to add a "myWrite" function to my "logHandler" class, which would look like:
    Code:
    void myWrite(string s){
    cout <<s<<endl;
    cout.clear();
    }
    But that's not a good solution, is it?

    Alternatively, I can use the solution you suggested (the Win32 API), but converting std::string (which I use throughout my program) to WCHAR * is more work for me.

    Another option is to do a char-by-char lookup on my output string and swap the character codes depending on the selected code page. That's the most work, I guess.

    Which one should I choose?

  8. #8
    Join Date
    Nov 2003
    Posts
    1,902

    Re: [C++] East European Fonts - Windows console

    Avoid using any codepage conversions at all and use Unicode directly.

    The easiest way to do that is to use the MS CRT from VS2005 or later, with "_setmode(_fileno(stdout), _O_U16TEXT)", std::wcout, std::wstring, etc...

    If you absolutely must have extended characters in your source code, then only use them in wide-string literals and save your source as UTF8-with-BOM or UTF16-with-BOM. Just keep in mind that a "dumb" editor will corrupt your source if it doesn't understand the file encoding. You'll also need a compiler that supports source code in that encoding (VS2005 or later for MS compilers).

    gg

  9. #9
    Join Date
    Sep 2010
    Posts
    18

    Re: [C++] East European Fonts - Windows console

    >> If you absolutely must have extended characters in your source code
    No. There is no need for any Unicode characters in my source code, but there must be some in system messages. An overview of what I need to do:
    - Compile source code of my own scripting language into compiled source files.
    - Load these files into the program and (when needed) display them on the screen.
    Along the way some system messages will appear (invalid command handlers and such), which can be loaded from an external configuration file, so I don't have to hardcode them in my program source.

    I'll try what you suggested. I hope it won't be a lot of work.

  10. #10
    Join Date
    Nov 2003
    Posts
    1,902

  11. #11
    Join Date
    Sep 2010
    Posts
    18

    Re: [C++] East European Fonts - Windows console

    Big, big thanks. If you happen to visit Poland, I owe you a pint.
