How to read a wide character one by one from a *.txt file in C++
I am working on translation software that converts an Urdu sentence into Hindi and vice versa. I am using Visual C++ 2010 with the C++ language. I have written an Urdu sentence in a text file, and now I want to read that file one character at a time so that I can convert each character into its equivalent Hindi character. When I use the get() function to read a single character from the input file and write it to the output file, I get some unknown, ugly-looking characters in the output file. Kindly help me with proper code. My code is as follows:
Code:
#include <iostream>
#include <fstream>
#include <cwchar>
#include <cstdlib>
#include <string>
using namespace std;
int main()
{
    wstring arry; // grows as needed; a fixed wchar_t[50] overflows on longer files
    wifstream inputfile("input.dat", ios::in);
    wofstream outputfile("output.dat");
    if (!inputfile)
    {
        cerr << "File not open" << endl;
        exit(1);
    }
    // I am using this loop just to copy the written Urdu text from one
    // file to another. Used this way, get() works well, but when I try
    // to pick out only one character from the file, it does not work.
    // The loop tests the value get() returns instead of eof(), so the
    // WEOF that marks the end of the file is never stored or written.
    int i = 0;
    wint_t ch;
    while ((ch = inputfile.get()) != WEOF)
    {
        arry.push_back(static_cast<wchar_t>(ch));
        outputfile << arry[i];
        i++;
    }
    inputfile.close();
    outputfile.close();
    cout << "Hello world" << endl;
    return 0;
}
Re: How to read a wide character one by one from a *.txt file in C++
Originally Posted by saqibmaqbool
But the characters are still not being displayed in the output file (the blank file remains blank). Reply please.
Forget about the file for a moment. What is the value of arry[i]? Is it the correct value? If so, the next questions should concern the file; otherwise, please confirm what the value of arry[i] is.
Re: How to read a wide character one by one from a *.txt file in C++
I have attached the input and output files, so kindly have a look at them. It is strange: when I do not write a newline character or endl after outputfile<<arry[i];, the input and output files are almost the same,
but when I write outputfile<<arry[i]<<endl; to put each Urdu character on its own line, I get some ugly-looking symbols in the file. I have also corrected the while condition, but the problem is still there.
Re: How to read a wide character one by one from a *.txt file in C++
Originally Posted by saqibmaqbool
No, the values of arry[i] are not correct.
So you're saying the problem is with the get() function??
These values should be Unicode (UTF-8) values, but they are just some rough digits.
What are "rough digits"?
This shouldn't be hard -- open your input file in a hex editor (not a text editor, a hex editor). Look at the values there. Now debug your program and look at what get() returns. Does it match the hex values you see in the file? It doesn't matter what language you're using; a file consists merely of bytes.
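If no hex editor is at hand, a short program can play the same role. This is a minimal sketch with made-up helper names `read_bytes` and `to_hex`: it opens the file with a narrow `std::ifstream` in binary mode, so no locale or newline conversion touches the data, and formats each byte the way a hex editor shows it. If `input.dat` is saved as UTF-8, the Urdu letter ب (U+0628) should appear as the two bytes `D8 A8`.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read every byte of a file exactly as stored on disk.
// Binary mode and a narrow stream mean no newline or locale conversion.
std::vector<unsigned char> read_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::vector<unsigned char> bytes;
    char c;
    while (in.get(c))  // stops at end of file (or at once if the file is missing)
        bytes.push_back(static_cast<unsigned char>(c));
    return bytes;
}

// Hypothetical helper: format bytes the way a hex editor shows them, e.g. "D8 A8".
std::string to_hex(const std::vector<unsigned char>& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes)
    {
        if (!out.empty())
            out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

For example, `std::cout << to_hex(read_bytes("input.dat")) << '\n';` in `main` prints the file's bytes, which you can then compare with the numbers `get()` returns in the debugger.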
Re: How to read a wide character one by one from a *.txt file in C++
First, why do you need to read character by character? That is very slow. Read in an entire line and process the characters as you see fit.
The following code works correctly:
Code:
#include <iostream>
#include <fstream>
#include <cwchar>
#include <cstdlib>
#include <string>
using namespace std;
int main()
{
    wifstream inputfile("input.dat", ios::in);
    wofstream outputfile("output.dat");
    wstring winput;
    if (!inputfile)
    {
        cerr << "File not open" << endl;
        exit(1);
    }
    // Loop on getline() itself: it fails once nothing is left to read,
    // whereas eof() is only set after a read has already hit the end.
    while (getline(inputfile, winput))
    {
        outputfile << winput << L'\n'; // getline() strips the newline, so put it back
        // you can do anything you want now with winput[0], winput[1], etc.
        //...
    }
    inputfile.close();
    outputfile.close();
    return 0;
}
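As an illustration of what "anything you want" with the characters could look like for this project, here is a hypothetical per-character lookup. It assumes the line has already been decoded into Unicode code points held as `char32_t` (a `wifstream` using the default locale does not decode UTF-8 for you), and the table covers only a handful of consonants chosen for the example; the names `urdu_to_hindi_table` and `urdu_to_hindi` are made up. A real converter needs more than a one-to-one table, for instance because Urdu text usually leaves short vowels unwritten.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, deliberately incomplete table: a few Urdu letters and the
// Devanagari letters commonly used for the same consonant.
const std::map<char32_t, char32_t>& urdu_to_hindi_table()
{
    static const std::map<char32_t, char32_t> table = {
        {U'\u0628', U'\u092C'}, // BEH   -> BA
        {U'\u067E', U'\u092A'}, // PEH   -> PA
        {U'\u062A', U'\u0924'}, // TEH   -> TA
        {U'\u062C', U'\u091C'}, // JEEM  -> JA
        {U'\u0631', U'\u0930'}, // REH   -> RA
        {U'\u0633', U'\u0938'}, // SEEN  -> SA
        {U'\u06A9', U'\u0915'}, // KEHEH -> KA
        {U'\u0644', U'\u0932'}, // LAM   -> LA
        {U'\u0645', U'\u092E'}, // MEEM  -> MA
        {U'\u0646', U'\u0928'}, // NOON  -> NA
    };
    return table;
}

// Convert one code point; anything not in the table (spaces, digits,
// letters this sketch does not cover) is copied through unchanged.
char32_t urdu_to_hindi(char32_t c)
{
    const std::map<char32_t, char32_t>& table = urdu_to_hindi_table();
    std::map<char32_t, char32_t>::const_iterator it = table.find(c);
    return it != table.end() ? it->second : c;
}

// Convert a whole decoded line, one character at a time.
std::u32string urdu_to_hindi(const std::u32string& line)
{
    std::u32string out;
    out.reserve(line.size());
    for (char32_t c : line)
        out += urdu_to_hindi(c);
    return out;
}
```

With this table, `urdu_to_hindi(std::u32string(U"\u0646\u0645"))` (NOON, MEEM) yields `U"\u0928\u092E"` (NA, MA); characters outside the table, such as spaces, pass through unchanged.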
Re: How to read a wide character one by one from a *.txt file in C++
Originally Posted by saqibmaqbool
It is strange: when I do not write a newline character or endl after outputfile<<arry[i];, the input and output files are almost the same,
but when I write outputfile<<arry[i]<<endl; to put each Urdu character on its own line, I get some ugly-looking symbols in the file. I have also corrected the while condition, but the problem is still there.
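The array is just a sequence of values; these are not Unicode characters. When a single Unicode character is represented by more than one consecutive wchar_t value and you split those values up, then of course you get rubbish. In Visual C++, a wifstream using the default "C" locale widens each byte of the file into its own wchar_t, so an Urdu letter that takes two bytes in UTF-8 arrives as two separate values, and writing endl between them breaks the encoding. It seems that your problem is that you don't understand how Unicode works; it's not about files or streams.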
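A minimal sketch of handling whole characters rather than separate values, assuming `input.dat` is saved as UTF-8 (Notepad's "Unicode" option writes UTF-16LE instead, which would need a different decoder). Rather than relying on the stream's locale, it works on raw bytes: `decode_utf8` turns each one- to four-byte UTF-8 sequence into a single `char32_t` code point, `encode_utf8` goes back, and `one_char_per_line` shows that a newline can then be placed between characters without cutting one in half. All three names are made up for this example; malformed input becomes U+FFFD.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: UTF-8 bytes -> one char32_t code point per character.
// Invalid, truncated, overlong or surrogate sequences become U+FFFD.
std::u32string decode_utf8(const std::string& bytes)
{
    const char32_t replacement = 0xFFFD;
    std::u32string out;
    std::string::size_type i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::string::size_type extra;  // continuation bytes expected after lead
        char32_t cp;
        if (lead < 0x80)                       { cp = lead;        extra = 0; }
        else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; extra = 1; }
        else if (lead >= 0xE0 && lead <= 0xEF) { cp = lead & 0x0F; extra = 2; }
        else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; extra = 3; }
        else { out += replacement; ++i; continue; }  // not a valid lead byte

        // Append the payload of each continuation byte (bit pattern 10xxxxxx).
        std::string::size_type n = 1;
        while (n <= extra && i + n < bytes.size() &&
               (static_cast<unsigned char>(bytes[i + n]) & 0xC0) == 0x80)
        {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + n]) & 0x3F);
            ++n;
        }
        if (n <= extra)                       // sequence cut short
            cp = replacement;
        else if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
                 (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = replacement;                 // overlong, surrogate or too large
        out += cp;
        i += n;                               // skip every byte consumed
    }
    return out;
}

// Hypothetical helper: code points -> UTF-8 bytes (expects valid code points).
std::string encode_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
    {
        if (cp < 0x80)
            out += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Put every character on its own line. The newline goes between complete
// characters, so no multi-byte sequence is ever split.
std::string one_char_per_line(const std::string& utf8_text)
{
    std::string out;
    for (char32_t cp : decode_utf8(utf8_text))
    {
        out += encode_utf8(std::u32string(1, cp));
        out += '\n';
    }
    return out;
}
```

In a real program you would read the file's bytes with a narrow `std::ifstream` opened with `std::ios::binary`, decode them, convert each `char32_t`, and write the re-encoded result through a binary `std::ofstream`. A file saved as UTF-8 by Notepad may begin with a byte order mark (`EF BB BF`), which decodes to U+FEFF and can simply be skipped.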
Cheers, D Drmmr