Re: how does getline() know what line it's getting???
Quote:
Originally Posted by
D_Drmmr
There is no need to parse the files twice if you just want to check that they have the same number of lines. laserlight's suggestion can be easily altered to do that.
Code:
bool res1, res2;
while((res1 = getline(read_file1, new_line1)) &&
      (res2 = getline(read_file2, new_line2)))
{
    // ...
}
if (res1 || res2)
{
    // number of lines does not match
}
This won't work due to the short-circuit evaluation of the && operator. Suppose we have files with equal numbers of lines. While lines remain in both files, res1 and res2 will both be true. But when we finally run out, the first getline call will return false (setting res1 to false) and the while condition will thus be false. Therefore the second part of the while condition will not be evaluated and res2 will remain true.
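To see the effect in isolation, here is a minimal sketch that runs the short-circuiting loop over two in-memory streams and reports the final flags. The helper name read_with_short_circuit and the use of std::istringstream are assumptions for illustration, not code from the thread. The static_cast<bool> is there because, from C++11 onward, a stream's conversion to bool is explicit and a plain assignment to a bool will not compile.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper (not from the thread): runs the short-circuiting loop
// and returns the final values of res1 and res2 as a pair.
std::pair<bool, bool> read_with_short_circuit(std::istream& in1, std::istream& in2)
{
    std::string line1, line2;
    bool res1 = false, res2 = false;
    while ((res1 = static_cast<bool>(std::getline(in1, line1))) &&
           (res2 = static_cast<bool>(std::getline(in2, line2))))
    {
        // a pair of lines was read; processing would go here
    }
    // When in1 runs out first (or at the same time), the second getline
    // is skipped, so res2 still holds its value from the previous pass.
    return {res1, res2};
}
```

With two streams of two lines each, the returned pair is (false, true), even though both inputs are exhausted.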
The way to do it is to make sure both getline calls are made each time round the loop, using a "loop and a half":
Code:
bool res1, res2;
do {
    res1 = getline(read_file1, new_line1);
    res2 = getline(read_file2, new_line2);
    if ( !res1 || !res2 ) {
        // One of the files has run out. Do whatever needs to be done then break out of the loop
        // Code ...
        break;
    }
    // Both files were read, so do the processing...
} while (res1 && res2);
if (res1 || res2)
{
    // number of lines does not match, as the while loop quit
    // while one of res1, res2 was true, and the other was false
}
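For anyone who wants to try this without real files, the same "loop and a half" can be sketched as a small function over any two input streams. The name same_line_count is hypothetical, and static_cast<bool> is used on the assumption of a C++11 or later compiler, where the stream's conversion to bool is explicit.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true when both streams hold the same
// number of lines. Both getline calls run on every pass.
bool same_line_count(std::istream& in1, std::istream& in2)
{
    std::string line1, line2;
    bool res1 = false, res2 = false;
    for (;;) {
        res1 = static_cast<bool>(std::getline(in1, line1));
        res2 = static_cast<bool>(std::getline(in2, line2));
        if (!res1 || !res2) {
            break; // at least one stream has run out
        }
        // both lines were read; processing would go here
    }
    // If either flag is still true, one stream had a line the other lacked.
    return !(res1 || res2);
}
```

Calling it with two std::istringstream objects (or two std::ifstream objects) is enough to check the equal and unequal cases.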
Re: how does getline() know what line it's getting???
Quote:
Originally Posted by
D_Drmmr
When you start comparing boolean values with false, it's either time to get some sleep or you've missed the point. ;)
Both of these conditions seem to be true on a regular basis.
Quote:
Originally Posted by
D_Drmmr
The loop runs as long as both bools are true. That means that after the loop, at least one of the bools is false. If both are false, the two files have the same number of lines.
The problem is that I am getting values that are not the same, even when the files have the same number of lines. I think the next post explains this.
Quote:
Originally Posted by
Peter_B
This won't work due to the short-circuit evaluation of the && operator. Suppose we have files with equal numbers of lines. While lines remain in both files, res1 and res2 will both be true. But when we finally run out, the first getline call will return false (setting res1 to false) and the while condition will thus be false. Therefore the second part of the while condition will not be evaluated and res2 will remain true.
Thanks, I guess that is why I was getting bool values of 0 and 1 even when the files have the same number of rows and the data processes.
Using the do while you posted,
Code:
bool have_line1; bool have_line2;
do{
    have_line1 = getline(read_file1, new_line1);
    have_line2 = getline(read_file2, new_line2);
    cout << "have_line1= " << have_line1 << endl;
    cout << "have_line2= " << have_line2 << endl;
    // check to see if both getline calls got a line, exit if not
    if(!have_line1 || !have_line2) {
        // error, neither of these should be 0 in the loop unless one file is shorter
        exit (-1);
    }
    // process input
} while (have_line1 && have_line2);
I am still getting my error here. The printout of the two bools is,
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 1
have_line2= 1
have_line1= 0
have_line2= 0
This is what you would expect: both values should be 1 until the files run out, and then both should be 0. The issue is that this is printed from inside the loop, and I don't see how I could still be in the loop when both values are 0. Does this do while structure always run through the code one extra time? Is it right that the evaluation at the end, while (have_line1 && have_line2);, means that you will always run through one last time with both bools = 0?
If I switch to,
Code:
// check to see if both getline calls got a line, exit if not
if(have_line1 != have_line2) {
    // error, neither of these should be 0 in the loop unless one file is shorter
}
Then it behaves more like I expect. Did I do something wrong here?
I added some code so that it won't try to write output if both bools = 0, but I'm not sure I'm on the right track.
It looks like I don't need to check the value of the bools after the loop, since I think a mismatch in bool values will trigger the exception before the loop ends. It never hurts to have an extra trap or so, even if you think the condition can never happen.
LMHmedchem
Re: how does getline() know what line it's getting???
Quote:
Originally Posted by
Peter_B
This won't work due to the short-circuit evaluation of the && operator.
You're right. Thanks for spotting my error.
I'd rate, but I have to spread some reputation first.
Re: how does getline() know what line it's getting???
Quote:
Originally Posted by
LMHmedchem
Using the do while you posted,
I am still getting my error here. The printout of the two bools is,
...[DELETED]...
This is what you would expect: both values should be 1 until the files run out, and then both should be 0. The issue is that this is printed from inside the loop, and I don't see how I could still be in the loop when both values are 0. Does this do while structure always run through the code one extra time? Is it right that the evaluation at the end, while (have_line1 && have_line2);, means that you will always run through one last time with both bools = 0?
The while condition is not continually evaluated at every point through the loop. It is only evaluated when execution reaches the while statement at the end of each time around the loop. So in your code these lines:
Code:
cout << "have_line1= " << have_line1 << endl;
cout << "have_line2= " << have_line2 << endl;
will still run even when have_line1 or have_line2 have just been set to false. You should consider these cout lines to be part of the 'process input' region of the do-while loop.
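A minimal sketch of that point, using a single stream and a counter in place of the cout lines. The function name count_body_runs is hypothetical; the counter simply records how many times the code at the top of the do-while body executes.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts how many times the statements after getline
// run inside a do-while whose condition is only checked at the bottom.
int count_body_runs(std::istream& in)
{
    std::string line;
    bool have_line = false;
    int runs = 0;
    do {
        have_line = static_cast<bool>(std::getline(in, line));
        ++runs; // stands in for the cout lines: runs even when have_line is false
    } while (have_line); // only evaluated here, once per pass
    return runs;
}
```

For an input with two lines this returns 3: two passes that read a line, plus the final pass in which getline fails but the rest of the body still runs before the condition is checked.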
Also, this bit is not what I said:
Code:
if(!have_line1 || !have_line2) {
    // error, neither of these should be 0 in the loop unless one file is shorter
    exit (-1);
}
It is not an error for one (or both) of have_line1 or have_line2 to be false - it just means that one (or both) of the files has been fully read in. If both are false then both files have been exhausted at the same time, so they are the same length. You should be using 'break' to quit the loop here (as in my example), not 'exit' to stop the entire program.
Quote:
Originally Posted by
LMHmedchem
If I switch to,
Code:
// check to see if both getline calls got a line, exit if not
if(have_line1 != have_line2) {
    // error, neither of these should be 0 in the loop unless one file is shorter
}
Then it behaves more like I expect. Did I do something wrong here?
If the files are the same length this condition (have_line1 != have_line2) will never be true. To give the loop a chance to finish, this condition should be checked after the loop has finished, not inside the loop. Given that have_line1 and have_line2 are booleans, there are four possible combinations of values when the loop has finished. They are:
- both are true - actually not possible as the loop would still be running
- both are false - so the files were the same length
- have_line1 is true, have_line2 is false
file1 still had a line but file2 didn't, so they are unequal with file1 being longer
- have_line1 is false, have_line2 is true
same as previous but file2 is longer than file1
These possibilities are covered by this check after the loop (originally posted by D_Drmmr but with changes to variable names)
Code:
if (have_line1 || have_line2)
{
    // number of lines does not match
}
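If you also want to know which file is longer, the three reachable combinations can be mapped to a result after the loop. This is only a sketch: the enum LineCompare and the function compare_line_counts are hypothetical names, and the streams stand in for read_file1 and read_file2.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type for the three combinations that can occur
// once the loop has finished.
enum class LineCompare { Same, FirstLonger, SecondLonger };

LineCompare compare_line_counts(std::istream& in1, std::istream& in2)
{
    std::string line1, line2;
    bool have_line1 = false, have_line2 = false;
    for (;;) {
        have_line1 = static_cast<bool>(std::getline(in1, line1));
        have_line2 = static_cast<bool>(std::getline(in2, line2));
        if (!have_line1 || !have_line2) {
            break; // not an error: at least one input is exhausted
        }
        // process the pair of lines here
    }
    // Both true cannot happen here, so only three cases remain.
    if (have_line1) return LineCompare::FirstLonger;
    if (have_line2) return LineCompare::SecondLonger;
    return LineCompare::Same;
}
```

The check is made once, after the loop, which matches the advice above about not testing for a mismatch inside the loop.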
Quote:
Originally Posted by
LMHmedchem
It never hurts to have an extra trap or so, even if you think the condition can never happen.
You shouldn't be adding code to check conditions unless you know how those conditions could exist. Doing so indicates that you haven't studied the possible paths that execution could take through your code. And if execution could never pass through that code there is no way to test it.
Quote:
Originally Posted by
D_Drmmr
You're right. Thanks for spotting my error.
I'd rate, but I have to spread some reputation first.
That's fine - it's the thought that counts :D
Re: how does getline() know what line it's getting???
By the way, there is another neat way to get around the short-circuiting. That is to use the (seldom used) comma operator. This allows you to put several expressions where only one is usually allowed. So in this case:
Code:
bool res1, res2;
while (
    (res1 = getline(read_file1, new_line1)), // This line executes first
    (res2 = getline(read_file2, new_line2)), // Then this
    (res1 && res2) // Finally, this is evaluated as the while condition
    )
{
    // ...
}
if (res1 || res2)
{
    // number of lines does not match
}
It is not as readable though.
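A self-contained sketch of the comma-operator version, for anyone who wants to try it on in-memory data. The function name count_line_pairs and the mismatch output parameter are assumptions made for the example; static_cast<bool> is used on the assumption of a C++11 or later compiler.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns how many line pairs were read and reports
// through 'mismatch' whether the line counts differ.
int count_line_pairs(std::istream& in1, std::istream& in2, bool& mismatch)
{
    std::string line1, line2;
    bool res1 = false, res2 = false;
    int pairs = 0;
    while ((res1 = static_cast<bool>(std::getline(in1, line1))), // evaluated first
           (res2 = static_cast<bool>(std::getline(in2, line2))), // then this
           (res1 && res2))                                       // loop condition
    {
        ++pairs; // both lines were read
    }
    mismatch = res1 || res2; // one stream still had a line
    return pairs;
}
```

The built-in comma operator evaluates its left operand fully before its right operand, so both getline calls always run, in the order written.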
Re: how does getline() know what line it's getting???
Wouldn't it be simpler to just use operator & instead?
Re: how does getline() know what line it's getting???
Quote:
Originally Posted by
laserlight
Wouldn't it be simpler to just use operator & instead?
Actually, that would work just fine here. I honestly didn't even consider that as it is a bitwise operator rather than a logical operator. So - nice idea :)
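As a rough sketch of what that could look like (the function name same_line_count_bitand is hypothetical, and the explicit conversion to bool assumes C++11 or later):

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: uses bitwise & so that both getline calls are
// always evaluated, with no short-circuiting.
bool same_line_count_bitand(std::istream& in1, std::istream& in2)
{
    std::string line1, line2;
    bool res1 = false, res2 = false;
    while ((res1 = static_cast<bool>(std::getline(in1, line1))) &
           (res2 = static_cast<bool>(std::getline(in2, line2))))
    {
        // both lines were read; processing would go here
    }
    return !(res1 || res2); // both false means equal line counts
}
```

Both operands are already bool here and touch different objects, so the result does not depend on which side is evaluated first. Some compilers may warn about a bitwise operator applied to boolean operands.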
I'll just add a couple of caveats on the use of & to avoid short-circuiting though, as it differs in a couple of important ways from the && operator - though these differences do not matter in the current case.
(@laserlight - you obviously know all this, it is intended for people who don't :))
They are:
- && evaluates to true when the operands are both any non-zero value. So (1 && 2) evaluates to true. However & does a bit-by-bit comparison so only evaluates to true when the operands are both 1 in some bit position. This means (1 & 2) evaluates to false (or, strictly speaking, to 0).
To make & work in the same way as the logical operator we need to cast the operands to bool first. This will convert any non-zero value to true. So ( (bool)1 & (bool)2 ) evaluates to true (or, again strictly speaking, 1). This cast is even needed when the operands are of type BOOL (used as return values in a lot of Windows API functions). And if you forget the cast the compiler will not help you - it will compile just fine, but not work correctly.
- & does not define the order of evaluation of its operands. The compiler is free to decide which to evaluate first in order to best optimize the code. When the evaluation does not have side-effects, or when the side-effects are independent (as in the current case) this does not matter.
The approaches I describe ("loop and a half" and comma operator) both have a well-defined evaluation order, and work for any types where non-zero means true without having to remember to cast to bool. So I think they have a wider applicability.
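The first caveat can be checked with a few one-line functions. The function names are made up for the illustration; the values 1 and 2 are the ones used above.

```cpp
// Logical AND: true when both operands are non-zero.
bool logical_and(int a, int b)
{
    return a && b;
}

// Bitwise AND on the raw ints: true only if some bit is set in both.
bool bitwise_and_raw(int a, int b)
{
    return (a & b) != 0;
}

// Bitwise AND after converting each operand to bool first,
// which makes it agree with the logical version.
bool bitwise_and_as_bool(int a, int b)
{
    return static_cast<bool>(a) & static_cast<bool>(b);
}
```

With the operands 1 and 2, logical_and and bitwise_and_as_bool give true, while bitwise_and_raw gives false because 1 and 2 share no set bit.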
Re: how does getline() know what line it's getting???
Quote:
Originally Posted by
Peter_B
& does not define the order of evaluation of its operands. The compiler is free to decide which to evaluate first in order to best optimize the code. When the evaluation does not have side-effects, or when the side-effects are independent (as in the current case) this does not matter.
Actually, it's even worse, because & does not define a sequence point (or in C++11 lingo, the evaluations of its operands are unsequenced, not just indeterminately sequenced), making a potential read and write access to the same scalar object undefined behavior, which is worse than just an unspecified ordering of evaluations (e.g. the expression (( cout << "1" ) & ( cout << "2" )) can print "12" or "21", but ( c++ & c++ ) can give anything ...).
Re: how does getline() know what line it's getting???
Good point, superbonzo. Some compilers can help point this use out though. g++ has a 'sequence-point' warning (-Wsequence-point on the command line) which doesn't always identify problems but does in this case.
For an expression like the ( c++ & c++ ) example above, this gives the warning:
Code:
warning: operation on 'c' may be undefined