  1. #1
    Join Date
    Jun 2004
    Posts
    170

    Question Unicode and html pages

    Hello.

    I have to read and parse a web page that is in UTF-8 (or at least I think it is). The page contains some Arabic text.

    I use this class http://www.codeproject.com/internet/amhttputils.asp to read the web page. It works well.

    But in the watch window I see only strange symbols instead of Arabic words.

    I tried to use MultiByteToWideChar(CP_UTF8..., but after that all the Arabic words become ????.

    What am I doing wrong? Any ideas?

    P.S. I compile with _UNICODE.
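
    A minimal sketch of the UTF-8 to wide-string conversion mentioned above, assuming the HTTP class can hand back the response body as raw bytes in a std::string. Utf8ToWide is a hypothetical helper name, not part of any library. The Windows branch uses the usual two-call MultiByteToWideChar pattern; the other branch is a simplified portable decoder included only so the sketch builds elsewhere. One possible cause of the ???? result, not confirmed in this thread, is that the bytes were already widened or run through an ANSI code page before the conversion.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts raw UTF-8 bytes to a wide string.
// The input must be the untouched bytes of the HTTP response, not a
// string that was already widened or converted with an ANSI code page.
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();
#ifdef _WIN32
    // First call: ask how many wchar_t units the result needs.
    int needed = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                     static_cast<int>(utf8.size()), NULL, 0);
    if (needed <= 0)
        throw std::runtime_error("MultiByteToWideChar failed");
    std::wstring wide(static_cast<std::size_t>(needed), L'\0');
    // Second call: convert into the buffer sized by the first call.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &wide[0], needed);
    return wide;
#else
    // Simplified decoder: checks sequence structure only and does not
    // reject overlong encodings or encoded surrogates.
    std::wstring wide;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (utf8.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: emit a surrogate pair.
            cp -= 0x10000;
            wide += static_cast<wchar_t>(0xD800 + (cp >> 10));
            wide += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            wide += static_cast<wchar_t>(cp);
        }
    }
    return wide;
#endif
}
```

    Usage would be along the lines of std::wstring text = Utf8ToWide(body); where body is a hypothetical std::string holding the raw response bytes.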

  2. #2
    Join Date
    Jun 2002
    Location
    Sweden
    Posts
    467

    Re: Unicode and html pages

    The <head> tag should have the encoding type in it.
    Use a browser to fetch the page and check which charset it uses.
    "The making of software, like the making of sausages, should never be watched."

    http://blog.gauffin.org - .NET Coding/Architecture
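
    The check suggested above can also be done in code. The following is a rough sketch, assuming the page source is available as a std::string of raw bytes. FindDeclaredCharset is a hypothetical helper, and it scans the whole text for the first "charset=" rather than parsing the <head> properly. The same function can be applied to the value of an HTTP Content-Type header, which may carry the charset when the markup does not.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the lower-cased value that follows the
// first "charset=" in the text, or an empty string if there is none.
// This is a crude text scan, not an HTML parser.
std::string FindDeclaredCharset(const std::string& text)
{
    // Work on a lower-cased copy so the match is case-insensitive.
    std::string lower(text);
    for (std::size_t i = 0; i < lower.size(); ++i)
        lower[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(lower[i])));

    const std::string key = "charset=";
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();

    // Skip an optional opening quote, as in <meta charset="utf-8">.
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\''))
        ++pos;

    // Read the characters that can appear in a charset label.
    std::size_t end = pos;
    while (end < lower.size()) {
        unsigned char c = static_cast<unsigned char>(lower[end]);
        if (!std::isalnum(c) && c != '-' && c != '_' && c != '.' && c != ':')
            break;
        ++end;
    }
    return lower.substr(pos, end - pos);
}
```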

  3. #3
    Join Date
    Jun 2004
    Posts
    170

    Re: Unicode and html pages

    There is no charset in <head>. But IE detects the page as UTF-8, and if I save the page to disk, IE saves it as UTF-8.
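
    When no charset is declared, the bytes themselves can be inspected. This is only a sketch of one plausible heuristic, not a description of what IE actually does: look for a UTF-8 byte order mark and check that the buffer is structurally valid UTF-8. HasUtf8Bom and IsWellFormedUtf8 are hypothetical helper names, and the input is assumed to be the raw response bytes.

```cpp
#include <cstddef>
#include <string>

// True if the buffer starts with the UTF-8 byte order mark EF BB BF.
bool HasUtf8Bom(const std::string& bytes)
{
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// True if every lead byte is followed by the right number of
// continuation bytes. Structural check only: overlong three- and
// four-byte forms and encoded surrogates are not rejected.
// Pure ASCII input also passes, since ASCII is a subset of UTF-8.
bool IsWellFormedUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (lead < 0x80)                       extra = 0;
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
        else return false;  // stray continuation byte or invalid lead

        if (bytes.size() - i <= extra)
            return false;   // sequence is cut off at the end of the buffer

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;
        }
        i += extra + 1;
    }
    return true;
}
```

    Text in a single-byte Arabic code page such as windows-1256 will usually fail this check, because its high bytes rarely form valid UTF-8 sequences.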

  4. #4
    Join Date
    Jun 2002
    Location
    Sweden
    Posts
    467

    Re: Unicode and html pages

    What is the "Watch Window"?

  5. #5
    Join Date
    Jun 2004
    Posts
    170

    Re: Unicode and html pages

    In VS 6.0 it's the window where you can see the values of variables.

  6. #6
    Join Date
    Jun 2002
    Location
    Sweden
    Posts
    467

    Re: Unicode and html pages

    OK. That window is probably ANSI only.
    Please check the web page in a real browser or WordPad.

  7. #7
    Join Date
    Jun 2004
    Posts
    170

    Re: Unicode and html pages

    I think you don't understand what I'm asking.

    If I view this page in IE, it looks fine and all the Arabic words are displayed correctly.

    But when I try to read this page in my program, I get a garbled representation of the Arabic words.

    I know for certain that the VS watch window can show Arabic words if I read the page correctly or convert it the right way.

    So how do I read or convert it correctly?
