c++ - parsing windows textfile: how to strip extra characters?


BabaG
May 19th, 2008, 03:31 PM
i've run into a problem reading a windows-generated textfile
onto my linux (mandriva 2007.1) system using c++. it took me
a long time and lots of help from the good folks here, but
i've finally figured out that the issue i've run into seems
to be one of extra, hidden characters in the original text file.

i started out by processing one variable read from the textfile
and had a lot of problems. i finally got around them by using
substr to parse only the first three characters of the line
read into my variable. that made things work.

my thinking is, however, that the likelihood is that every line
in the text file probably has this same issue. that would argue
in favor of addressing the issue, not at the individual variable
level, but at the file level. in other words, when the text file
is first parsed into my script. either that or by somehow
processing the textfile before it is read.

so, there i have two ideas to pursue: preprocessing the text
file, or processing it as it is read.

i'm using vector to read the text file. how would i strip extra
characters at that stage?

alternately, how would i strip the extra characters before the
text file comes into the script?

the program is below.

thanks,
BabaG

#include <fstream>
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <assert.h>

using namespace std;

int main()
{
    int count = 0;

    ifstream infile("file_to_be_parsed.txt");

    if (!infile)
    {
        cerr << "Could not open file." << endl;

        return 1;
    }

    vector<string> ScriptVariables;
    string line;

    while (getline(infile, line))
    {
        ScriptVariables.push_back(line);
    }

    infile.close();

    // lots of variables assigned from text file
    // this is the one that's been a problem in another thread

    string capformat = ScriptVariables[8];

    // perform operations

    int cr2W = 4368;
    int cr2H = 2912;

    int nefW = 3872;
    int nefH = 2592;

    double CtrX = 0;
    double CtrY = 0;

    // keep only the first three characters to drop the hidden trailing ones
    string capformatTrimmed = capformat.substr(0,3);

    if (capformatTrimmed == "cr2")
    {
        // assign to the outer CtrX/CtrY; redeclaring them here with
        // "double" would create new locals and leave the outer ones at 0
        CtrX = cr2W/2.0;
        CtrY = cr2H/2.0;
    }
    else if (capformatTrimmed == "nef")
    {
        CtrX = nefW/2.0;
        CtrY = nefH/2.0;
    }
    else
    {
        cout << "something is wrong with cr2/nef line." << endl;
    }

    cout << CtrX << endl;
    cout << CtrY << endl;

    return 0;
}

Lindley
May 19th, 2008, 04:13 PM
Windows uses \r\n for line endings; Unix uses just \n. This is a well-known problem, and the reason why I try never to open files in text mode on Windows: I'd prefer that it just write what I tell it to, and not try to insert extra \r characters all over the place.

If you only have one word per line, the simplest thing would be to use operator>> rather than getline. Otherwise, you'll have to do the whitespace stripping yourself.
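
For the one-word-per-line case, a minimal sketch of the operator>> approach might look like the following. The names readTokens and readTokensFromString are made up for illustration and are not from the original program. It relies on '\r' counting as whitespace in the default "C" locale, so the carriage return is skipped along with the newline.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-separated tokens with operator>>.
// In the default "C" locale '\r' is whitespace, so the carriage return
// of a CRLF line ending is never stored in a token.
// (readTokens is a hypothetical helper name.)
std::vector<std::string> readTokens(std::istream& in)
{
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token)
    {
        tokens.push_back(token);
    }
    return tokens;
}

// Convenience wrapper for trying the function on an in-memory string
// instead of a file (also a hypothetical name).
std::vector<std::string> readTokensFromString(const std::string& text)
{
    std::istringstream in(text);
    return readTokens(in);
}
```

One thing to watch with this approach: blank lines are skipped and a line containing two words becomes two entries, so an index such as ScriptVariables[8] only lines up with the file's line numbers if every line holds exactly one word.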

Duoas
May 19th, 2008, 04:53 PM
In my experience, cross-platform programs that cannot handle the difference between Win and Unix line endings (at bare minimum) always break when least desired.

It gets me at home often enough that I wrote myself a little utility years ago that I'm still using that does the same as dos2unix, so my dumb windows programs can handle unix text files.

You might just want to get lines with this little example
http://www.codeguru.com/forum/showpost.php?p=1720870&postcount=11

It handles LF (unix), CRLF (windows), and CR (mac). Enjoy!
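
For readers who cannot follow the link, here is a rough sketch of a getline replacement that accepts all three endings. This is not the code from the linked post, just one possible way to do it under my own assumptions; getlineAnyEnding and splitLinesAnyEnding are invented names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one line, treating "\n", "\r\n" and a lone "\r" as line endings.
// The terminator is consumed but not stored in `line`.
// (getlineAnyEnding is a hypothetical name, not a standard function.)
std::istream& getlineAnyEnding(std::istream& in, std::string& line)
{
    line.clear();

    // The sentry prepares the stream; `true` means "do not skip whitespace".
    std::istream::sentry guard(in, true);
    if (!guard)
    {
        return in;
    }

    std::streambuf* buf = in.rdbuf();
    for (;;)
    {
        int c = buf->sbumpc();
        if (c == std::streambuf::traits_type::eof())
        {
            // End of input: fail only if no characters were read at all.
            in.setstate(line.empty() ? (std::ios::eofbit | std::ios::failbit)
                                     : std::ios::eofbit);
            return in;
        }
        if (c == '\n')
        {
            return in;                 // Unix ending
        }
        if (c == '\r')
        {
            if (buf->sgetc() == '\n')  // Windows ending: swallow the '\n' too
            {
                buf->sbumpc();
            }
            return in;                 // otherwise a lone '\r' (old Mac)
        }
        line += static_cast<char>(c);
    }
}

// Split an in-memory string into lines using the function above.
std::vector<std::string> splitLinesAnyEnding(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getlineAnyEnding(in, line))
    {
        lines.push_back(line);
    }
    return lines;
}
```

In the original program this would be a drop-in replacement for the getline call in the while loop, with the ifstream passed as the first argument.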

kempofighter
May 19th, 2008, 05:20 PM
I find it hard to believe that you are trying to convince us that your switch example is not spaghetti code! :eek:

kempofighter
May 19th, 2008, 05:49 PM
When you store the text file on the linux computer there should be a built in shell command to convert it for you.

Are you doing this as an exercise or do you need to actually worry about making the program portable enough to read both types of files? On windows, the getline operation gives you a null terminated string without the \r\n so I have no clue what getline would do on a linux system if it encountered a \r\n. You'd think it would just see the \r as a substring and just read it since it doesn't have any special meaning to a linux computer. If it were me, I'd step into the code with a debugger and see what string is in memory after a getline operation. If the "\r" is there you should be able to use find and erase to get rid of it after each getline. I don't know linux so if I really wanted to write a program to do this, I'd just try it and see what happens in the debugger first. If it were me, I would convert the file from dos to unix using a shell command before running my C++ program.
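
The find and erase idea mentioned above can be sketched as follows. On Linux, getline splits only on '\n', so the '\r' from a Windows file does stay at the end of each string. The helper names stripCarriageReturns, readLines and readLinesFromString are invented for this example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Remove every '\r' from the string using find and erase.
// (stripCarriageReturns is a hypothetical helper name.)
void stripCarriageReturns(std::string& line)
{
    std::string::size_type pos;
    while ((pos = line.find('\r')) != std::string::npos)
    {
        line.erase(pos, 1);
    }
}

// Read all lines from a stream, cleaning each one right after getline.
std::vector<std::string> readLines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
    {
        stripCarriageReturns(line);
        lines.push_back(line);
    }
    return lines;
}

// Convenience wrapper for trying this on an in-memory string.
std::vector<std::string> readLinesFromString(const std::string& text)
{
    std::istringstream in(text);
    return readLines(in);
}
```

Unlike reading with operator>>, this keeps blank lines and multi-word lines intact, so the line indices in the vector still match the file.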

Duoas
May 19th, 2008, 05:56 PM
No you twit, that's elegance.

"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."

If you think it is spaghetti code then you've never actually seen spaghetti code, nor do you have an appreciation for its actual drawbacks and lack of structure.

I never said you had to like it, and I've made no attempt to force anyone to use it. Examples of useful goto are so few and far-between it just so happened that I had it handy.

If you don't like it, don't read my posts. But keep your religious bigotry out of other people's threads. Either offer a better solution, or shut up.

[edit] It occurs to me that you weren't being a jerk, but just joshing me. If you were, I'm sorry. I guess I'm a bit defensive... :-S