How to read wide characters one by one from a *.txt file in C++

**saqibmaqbool** · September 25th, 2012, 05:28 PM

I am writing software to translate Urdu sentences into Hindi and vice versa, using Visual C++ 2010. I have written an Urdu sentence in a text file. Now I want to read that file one character at a time so that I can convert each character into its Hindi equivalent. When I use the get() function to read a single character from the input file and write it to the output file, I get unrecognizable, ugly-looking characters in the output file. Kindly help me with proper code. My code is as follows:

Code:

#include<iostream>
#include<fstream>
#include<cwchar>
#include<cstdlib>
using namespace std;
void main()
{
    wchar_t arry[50];
    wifstream inputfile("input.dat",ios::in);
    wofstream outputfile("output.dat");

    if(!inputfile)
    {
        cerr<<"File not open"<<endl;
        exit(1);
    }
    int i=0;
    while ( ! inputfile.eof() )         // I am using this loop just to check
                                         // that the Urdu text copies from one
                                         // file to the other. Used this way,
                                         // get() works well, but when I try to
                                         // pick out a single character from
                                         // the file, it does not work.
    {
        arry[i] = inputfile.get();
        outputfile<<arry[i];
        i++;
    }
    inputfile.close();
    outputfile.close();
    cout<<"Hello world"<<endl;

}

**Paul McKenzie** · September 25th, 2012, 07:32 PM

If your file has more than 50 characters, your program corrupts memory.

Code:

wchar_t arry[50];
//...
int i=0;
while ( ! inputfile.eof() )         
{
   arry[i] = inputfile.get();

Once i reaches 50, all bets are off as to what will happen.

Regards,

Paul McKenzie

**saqibmaqbool** · September 25th, 2012, 08:09 PM

I have got your point, but even when I put a condition on "i" like this:

Code:

while(i<=30)
{
arry[i] = inputfile.get();
outputfile<<arry[i];
i++;
}

the characters are still not displayed in the output file (the blank file stays blank). Please reply.

**Paul McKenzie** · September 25th, 2012, 08:16 PM

Originally Posted by saqibmaqbool

the characters are still not displayed in the output file (the blank file stays blank). Please reply.

Forget about the file for a moment. What is the value of arry[i]? Is it the correct value? If so, then the next questions should concern the file; otherwise, please confirm what the value of arry[i] is.

Regards,

Paul McKenzie

**Paul McKenzie** · September 25th, 2012, 08:24 PM

Also, the correct while() loop would be this:

Code:

while ( ! inputfile.eof() && i < 50 )

Your other while() loop would fail miserably if the file has fewer than 30 characters.

Regards,

Paul McKenzie

**saqibmaqbool** · September 25th, 2012, 08:31 PM

No, the values of arry[i] are not correct. They should be Unicode (UTF-8) values, but they are some rough digits.

**saqibmaqbool** · September 25th, 2012, 08:44 PM

[Attached image: output&input.JPG]

I have attached the input and output files, so kindly have a look at them. It is strange that when I do not write a newline character or "endl" after outputfile<<arry[i]; the input and output files are almost the same,
but when I write outputfile<<arry[i]<<endl; to get each Urdu character on a separate line, it gives some ugly-looking symbols in the file. I have also corrected the while condition, but the problem is still there.

**Paul McKenzie** · September 25th, 2012, 08:51 PM

Originally Posted by saqibmaqbool

I have attached the input and output files

Where are the files? All I see are images.

Regards,

Paul McKenzie

**Paul McKenzie** · September 25th, 2012, 08:55 PM

Originally Posted by saqibmaqbool

No, the values of arry[i] are not correct.

So you're saying the problem is with the get() function??

They should be Unicode (UTF-8) values, but they are some rough digits.

What are "rough digits"?

This shouldn't be hard -- open your input file in a hex editor (not a text editor, a hex editor). Look at the values there. Now debug your program, and look at what get() returns. Does it match the hex values you see in the file? It doesn't matter what language you're using, a file consists merely of bytes.

Regards,

Paul McKenzie

**saqibmaqbool** · September 25th, 2012, 08:59 PM

Desktop.rar

These are the files.

Thanks for your patience, and sorry for sending the files late; my internet connection is having problems.

**saqibmaqbool** · September 25th, 2012, 09:04 PM

Kindly check the archive named Desktop.rar, which appears just above the image, and kindly let me know a solution.

**Paul McKenzie** · September 25th, 2012, 09:19 PM

First, why do you need to read character by character? That is very slow. Read in an entire line and process the characters as you see fit.

The following code works correctly:

Code:

#include<iostream>
#include<fstream>
#include<cwchar>
#include<cstdlib>
#include <string>

using namespace std;
int main()
{
    wifstream inputfile("input.dat",ios::in);
    wofstream outputfile("output.dat");
    wstring winput;            
    if(!inputfile)
    {
        cerr<<"File not open"<<endl;
        exit(1);
    }

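    // Loop on getline() itself so that a failed read ends the loop
    while ( getline(inputfile, winput) )
    {
        outputfile << winput << L'\n';   // getline() drops the newline; write it back
        // you can do anything you want now with winput[0], winput[1], etc.
        //...
    }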
    inputfile.close();
    outputfile.close();
}
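
The listing above opens both streams with the default locale. Assuming input.dat is saved as UTF-8, a wide stream typically needs a UTF-8 conversion facet before it combines the bytes of each character into one wchar_t. A minimal sketch under that assumption follows; it uses std::codecvt_utf8 from <codecvt> (available in Visual C++ 2010, deprecated since C++17), and the helper name read_utf8_file is hypothetical:

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: reads a whole text file into a wide string.
// Assumes the file is UTF-8; a leading byte order mark is skipped.
std::wstring read_utf8_file(const std::string& path)
{
    std::wifstream in;
    // Install the facet before opening so every read is decoded.
    // The locale takes ownership of the facet and deletes it.
    in.imbue(std::locale(in.getloc(),
             new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));
    in.open(path.c_str());

    std::wstring text;
    wchar_t ch;
    while (in.get(ch)) {        // false at end of file or on a read error
        text.push_back(ch);
    }
    return text;                // empty if the file could not be opened
}
```

With this in place, each element of the returned string should hold one Urdu letter, since Urdu and Devanagari both fit in a single 16-bit wchar_t on Windows. Writing the result back as UTF-8 would need the same kind of facet on the output stream.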

Regards,

Paul McKenzie

**D_Drmmr** · September 26th, 2012, 04:30 AM

Originally Posted by saqibmaqbool

It is strange that when I do not write a newline character or "endl" after outputfile<<arry[i]; the input and output files are almost the same,
but when I write outputfile<<arry[i]<<endl; to get each Urdu character on a separate line, it gives some ugly-looking symbols in the file. I have also corrected the while condition, but the problem is still there.

The array is just a sequence of values; these are not Unicode characters. When a single Unicode character is represented by more than one consecutive wchar_t and you split it up into individual values, then of course you get rubbish. It seems that your problem is that you don't understand how Unicode works; it's not about files or streams.
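
To make the point concrete, here is a rough sketch of how several stored values combine into one character, assuming the file holds UTF-8 bytes. The helper name decode_utf8 is hypothetical and needs a C++11 compiler for char32_t. It is deliberately simple: invalid bytes become U+FFFD, and overlong encodings are not rejected:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turns UTF-8 bytes into Unicode code points.
std::vector<char32_t> decode_utf8(const std::string& bytes)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        // The lead byte says how many bytes belong to this character.
        if (lead < 0x80)              { len = 1; cp = lead; }
        else if ((lead >> 5) == 0x06) { len = 2; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0x0E) { len = 3; cp = lead & 0x0F; }
        else if ((lead >> 3) == 0x1E) { len = 4; cp = lead & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; }   // invalid lead byte

        if (i + len > bytes.size()) {                     // truncated sequence
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);                  // append 6 payload bits
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, the two bytes D8 A7 decode to the single code point U+0627 (Arabic letter alef). Writing those two bytes on separate lines splits the character, which matches the rubbish described in the quoted post.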

**saqibmaqbool** · September 26th, 2012, 09:10 AM

Thanks a lot, sir. I am working on it...
