  1. #1
    Join Date
    Jun 2009
    Posts
    118

    Processing ^M characters

    Hi All,

    I have a text file created in Linux using fwrite().

    I want to process this file with a C++ program that looks for certain word patterns.

    However, when reading it with std::getline(), I noticed a ^M character at the end of each line.

    I have read that some operating systems insert ^M characters into files.

    When I try to match a word, say "word", that is actually followed by a ^M character, the comparison in the C++ program fails, as expected:

    Code:
    istream& in = GetStream(); //Get the input stream
    string line;
    
    std::getline(in, line); //Read a line, which is actually "word^M"
    StrTokenizer tokens = tokenize(line);
    StrTokenizer::iterator it = tokens.begin();
    
    if (*it == "word") // As expected, this comparison fails
        cout << "found the pattern" << endl;

    The variants if(*it == "word^M"), if(*it == "word\n") and if(*it == "word\r\n") also fail.

    What condition should I use in this if statement to match "word^M" successfully?

    Thanks.

  2. #2
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: processing ^M characters

    Quote: Originally posted by aryan1 (post #1 above)

    In my opinion, there is a clean solution and an easy solution.

    The clean one would be to use locales so that the string is read correctly. Good luck with that, though: locales are not terribly portable.

    The easy one consists of realizing that the character you are reading is not literally "^M", but the http://en.wikipedia.org/wiki/ASCII#A...rol_characters carriage return, \r (ASCII 13), which is displayed on screen as ^M.

    I noticed you tried several variants of "word\r\n", but not "word\r". I'm pretty sure that one will work.

    I think the file contains "word\r\n". Remember that getline reads up to the \n delimiter and then discards it, leaving you with only "word\r".

  3. #3
    Lindley
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: processing ^M characters

    The other option is to run the input file through dos2unix before you try to use it.
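
    If running an external tool is not convenient, a similar conversion could be done inside the program. The sketch below is a rough in-program counterpart, not a reimplementation of dos2unix; crlf_to_lf is a hypothetical name, and it assumes the text has already been read into a std::string.

```cpp
#include <string>

// Hypothetical helper (not from the thread): return a copy of `text`
// with every "\r\n" pair replaced by "\n".
// A lone '\r' that is not followed by '\n' is left untouched.
inline std::string crlf_to_lf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue;  // skip the '\r'; the '\n' is copied on the next iteration
        }
        out += text[i];
    }
    return out;
}
```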

  4. #4
    Join Date
    May 2001
    Location
    Germany
    Posts
    1,158

    Re: processing ^M characters

    But if the file was created under Linux and the program reading it also runs under Linux, there should be no problem, since the line endings should be consistent. My first thought was that the file was created on a different system.
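
    One way to settle where the \r comes from is to look at what getline actually returns. On Linux, fwrite() writes the bytes it is given without translation, so a \r in the file would have to come from the data passed to it. The sketch below makes control characters visible before printing a line; show_control_chars is a hypothetical name chosen for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical diagnostic helper (not from the thread): return a copy of `s`
// in which control characters are replaced by visible escape sequences.
inline std::string show_control_chars(const std::string& s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '\r') {
            out += "\\r";
        } else if (c == '\n') {
            out += "\\n";
        } else if (c == '\t') {
            out += "\\t";
        } else if (c < 0x20 || c == 0x7f) {
            char buf[5];  // "\xHH" plus the terminating null
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += s[i];
        }
    }
    return out;
}
```

    Printing show_control_chars(line) for a line read from the file would show a trailing \r explicitly if the file really has CRLF line endings.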
