-
September 11th, 2009, 02:08 AM
#1
processing ^M characters
Hi All,
I have a text file created in Linux using fwrite().
I want to process this file with a C++ program to look for certain word patterns.
However, when reading it with std::getline(), I noticed a ^M character at the end of each line.
I read that some operating systems insert ^M characters into files.
When I try to match a word, say "word", that is actually followed by a ^M character, the comparison in the C++ program fails, as expected:
Code:
istream& in = GetStream(); // Get the input stream
string line;
std::getline(in, line); // Read a line, which is actually "word^M"
StrTokenizer tokens = tokenize(line);
StrTokenizer::iterator it = tokens.begin();
if (*it == "word") // As expected, this comparison fails
    cout << "found the pattern" << endl;
The comparisons if(*it == "word^M"), if(*it == "word\n") and if(*it == "word\r\n") also fail.
What condition should I use in this if statement to match "word^M"?
Thanks.
-
September 11th, 2009, 02:36 AM
#2
Re: processing ^M characters
In my opinion, there is a clean solution, and an easy solution.
The clean one would be to use locales so that the line is read correctly. Good luck with that, though: locales are not terribly portable.
The easy solution starts from realizing that the character you are reading is not literally "^M" but the carriage return control character \r (see http://en.wikipedia.org/wiki/ASCII#A...rol_characters), which is displayed on screen as ^M.
I noticed you tried "word^M", "word\n" and "word\r\n", but not "word\r". I'm pretty sure that one would work.
I think the file contains "word\r\n". Remember, however, that getline reads up to the line's final \n and discards it, leaving you with only "word\r".
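If you would rather not put \r into every comparison, you could strip it right after reading. Here is a minimal sketch of that idea; getline_no_cr is a name I made up for illustration, not a standard function, and it assumes at most one trailing \r per line.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the standard library or the original
// program): reads one line with std::getline, which already discards the
// '\n', and then drops a single trailing '\r' if one is present.
std::istream& getline_no_cr(std::istream& in, std::string& line)
{
    if (std::getline(in, line) && !line.empty()
        && line[line.size() - 1] == '\r')
    {
        line.erase(line.size() - 1);
    }
    return in;
}
```

Calling getline_no_cr(in, line) in place of std::getline(in, line) should let the original comparison against "word" succeed, whether the line ended in \r\n or just \n.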
-
September 11th, 2009, 08:31 AM
#3
Re: processing ^M characters
Another option is to run the input file through dos2unix, which converts \r\n line endings to \n, before you try to use it.
-
September 11th, 2009, 12:08 PM
#4
Re: processing ^M characters
But if the file was created under Linux and the program reading it also runs under Linux, there should be no problem, as the line endings should be consistent. My first thought was that the file was created on a different system.
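One way to check where the \r comes from is to count how many lines in the file actually end with it. fwrite() writes its buffer byte for byte, so any \r in the file would have had to be in the data passed to it. A rough diagnostic sketch follows; count_cr_lines is a made-up name, and it assumes the stream is not doing any newline translation of its own.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical diagnostic (name invented for this example): counts the
// lines whose last character, once std::getline has removed the '\n',
// is a carriage return.
std::size_t count_cr_lines(std::istream& in)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line[line.size() - 1] == '\r')
            ++count;
    }
    return count;
}
```

If this returns 0 for the file, the ^M is being introduced somewhere else; if it matches the number of lines, the data was written with \r\n endings in the first place.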