-
December 11th, 2009, 11:42 AM
#16
Re: best way to deal with big array or vector?
All streams are closed when their object is destroyed. Typically this happens when they go out of scope.
-
December 11th, 2009, 11:51 AM
#17
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
Afterward, do I still have to close the file, or not?
It will be closed when the stream object is destroyed, which happens when it goes out of scope. Of course, if the end of the scope is not near, it is good to close it explicitly anyway.
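A minimal sketch of that behaviour, assuming a writable working directory. The function name write_then_read and the file name passed to it are made up for illustration:

```cpp
#include <fstream>
#include <string>

// Writes one line to `path` (a hypothetical file name supplied by the caller)
// and then reads it back through a second stream.
std::string write_then_read(const std::string& path)
{
    {
        std::ofstream out(path.c_str());
        out << "hello\n";
    }   // out is destroyed here: its buffer is flushed and the file is closed

    std::ifstream in(path.c_str());
    std::string line;
    std::getline(in, line);
    return line;   // in is closed when it goes out of scope on return
}
```

The inner braces exist only to end the scope of the output stream early. Calling out.close() at that point would have the same effect when introducing an extra scope is inconvenient.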
-
December 11th, 2009, 12:13 PM
#18
Re: best way to deal with big array or vector?
 Originally Posted by Lindley
All streams are closed when their object is destroyed. Typically this happens when they go out of scope.
 Originally Posted by laserlight
It will be closed when the stream object is destroyed, which happens when it goes out of scope. Of course, if the end of the scope is not near, it is good to close it explicitly anyway.
So it is fine if I use flush, but it is good (and concise) if I use close, right? Is flush more efficient than close?
-
December 11th, 2009, 12:19 PM
#19
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
So it is fine if I use flush, but it is good (and concise) if I use close, right? Is flush more efficient than close?
It is up to you. I used flush() in my example because that is what std::endl does in addition to writing a newline, and my intention was to demonstrate that there is no need to flush the stream on each iteration. You just need to flush once at the end, and you may not even need to do that explicitly.
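To make the difference visible, here is a rough sketch that counts flushes. CountingBuf and the two helper functions are hypothetical names invented for this illustration; the sketch relies on std::ostream::flush() calling sync() on the underlying buffer:

```cpp
#include <ostream>
#include <sstream>

// A string buffer that counts how often it is asked to synchronise (flush).
class CountingBuf : public std::stringbuf
{
public:
    CountingBuf() : flushes(0) {}
    int flushes;

protected:
    virtual int sync()
    {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes n lines terminated by std::endl and returns the number of flushes.
int flushes_with_endl(int n)
{
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < n; ++i)
        out << i << std::endl;   // newline plus a flush on every iteration
    return buf.flushes;
}

// Writes n lines terminated by '\n', flushes once, and returns the flush count.
int flushes_with_newline(int n)
{
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < n; ++i)
        out << i << '\n';        // newline only
    out.flush();                 // a single explicit flush at the end
    return buf.flushes;
}
```

With a real file stream each of those flushes can turn into a write to the operating system, which is why flushing on every iteration tends to be slow for large outputs.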
-
December 11th, 2009, 12:59 PM
#20
Re: best way to deal with big array or vector?
 Originally Posted by laserlight
It is up to you. I used flush() in my example because that is what std::endl does in addition to writing a newline, and my intention was to demonstrate that there is no need to flush the stream on each iteration. You just need to flush once at the end, and you may not even need to do that explicitly.
I did not know about flush() before your example, and I did not know that I have to (or should?) flush a stream at all. But I do remember that I got no error or warning when I forgot to close a file.
-
December 11th, 2009, 01:01 PM
#21
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
In the real input file, there are other things after the third column. Not sure about your suggestion, but I will try. Thanks.
In that case, you would need to do the following (it will work if there are other things after the third column, or if there are only 3 columns):
Code:
while ( fin >> tempLine >> x1 >> x2 )
{
    getline(fin,tempLine);
    for (int i=0;i<=(x2-x1);i++)
    {
        ++mapTest[x1+i];
    }
}
-
December 11th, 2009, 01:07 PM
#22
Re: best way to deal with big array or vector?
 Originally Posted by Philip Nicoletti
In that case, you would need to do the following
Good point. It may be more self-explanatory to use the ignore() member function, though.
-
December 11th, 2009, 01:22 PM
#23
Re: best way to deal with big array or vector?
 Originally Posted by Philip Nicoletti
In that case, you would need to do the following (it will work if there are other things after the third column, or if there are only 3 columns):
Code:
while ( fin >> tempLine >> x1 >> x2 )
{
    getline(fin,tempLine);
    for (int i=0;i<=(x2-x1);i++)
    {
        ++mapTest[x1+i];
    }
}
You got it just right, Philip. I was wondering why
Code:
while ( fin >> tempLine >> x1 >> x2 ) {
    string temp;
    int x1, x2;
    if ( fin >> temp >> x1 >> x2 ) {
        for ( int i = x1; i <= x2; ++i ) {
            ++mapTest[i];
        }
    }
}
neglects the first input line, and you have shed some light on that for me. I am testing both versions now and will report back with the results.
-
December 11th, 2009, 01:23 PM
#24
Re: best way to deal with big array or vector?
 Originally Posted by laserlight
It may be more self-explanatory to use the ignore() member function, though.
Would you mind giving me more explanation? How do I use ignore()?
-
December 11th, 2009, 01:46 PM
#25
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
Testing them now, and I will report back the results.
OK, here are the results with a 1.06 GB input file on an 8-core cluster node:
- Original code: 36m2.169s
- Improved v.1 code: 12m40.918s
- Improved v.2 (without XMax): 11m39.172s
- Final code v.3 (without string stream): 11m35.974s
So there is not much difference between the last three versions (but they are about three times as fast as the original one - a great improvement). What I am aiming for now is to take advantage of the multiple cores (right now the code runs on only one core), but that does not seem to be easy.
Thanks for all of your help.
-
December 11th, 2009, 01:49 PM
#26
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
Would you mind giving me more explanation? How do I use ignore()?
Suppose that, according to the input format, there is a tab character between the first field and the second field. You could then dispense with the temporary string variable that was used to ignore input:
Code:
int x1, x2;
while (fin.ignore(1000, '\t') && (fin >> x1 >> x2))
{
    fin.ignore(1000, '\n');
    for (int i = x1; i <= x2; ++i)
    {
        ++mapTest[i];
    }
}
where 1000 is arbitrarily chosen. You could have used std::numeric_limits<std::streamsize>::max() instead.
-
December 11th, 2009, 01:49 PM
#27
Re: best way to deal with big array or vector?
flush() is something you usually don't need to call yourself. It usually gets handled automatically. But it's available because occasionally you do need to call it explicitly.
Multiple cores are not going to help much for file parsing. Everything bottlenecks through the disk controller anyway. Typically, the biggest multi-core gain comes when you're doing heavy mathematical computations in main memory.
-
December 11th, 2009, 03:56 PM
#28
Re: best way to deal with big array or vector?
 Originally Posted by laserlight
Suppose that, according to the input format, there is a tab character between the first field and the second field. You could then dispense with the temporary string variable that was used to ignore input:
Code:
int x1, x2;
while (fin.ignore(1000, '\t') && (fin >> x1 >> x2))
{
    fin.ignore(1000, '\n');
    for (int i = x1; i <= x2; ++i)
    {
        ++mapTest[i];
    }
}
where 1000 is arbitrarily chosen. You could have used std::numeric_limits<std::streamsize>::max() instead.
Got it. Thanks laserlight.
-
December 11th, 2009, 03:57 PM
#29
Re: best way to deal with big array or vector?
 Originally Posted by Lindley
Multiple cores are not going to help much for file parsing. Everything bottlenecks through the disk controller anyway. Typically, the biggest multi-core gain comes when you're doing heavy mathematical computations in main memory.
Are you saying that splitting the input file into 8 chunks and processing those 8 chunks in parallel will not help at all?
-
December 11th, 2009, 05:17 PM
#30
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
OK, here are the results with a 1.06 GB input file on an 8-core cluster node:
- Original code: 36m2.169s
- Improved v.1 code: 12m40.918s
- Improved v.2 (without XMax): 11m39.172s
- Final code v.3 (without string stream): 11m35.974s
So there is not much difference between the last three versions (but they are about three times as fast as the original one - a great improvement).
My guess is that code like that should execute at the speed of file I/O.
I know I can copy a 1GB file in about 30 seconds, so 11 minutes sounds like WAY too much.
Could you comment out everything in your code except for I/O and see how long that takes?
Do you mind posting your code and a sample data file?
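Something along these lines would do as an I/O-only baseline. It is only a sketch: it needs a C++11 compiler for <chrono>, and the names ReadTiming and time_read_only are made up here:

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Result of a read-only pass over a file.
struct ReadTiming
{
    std::size_t lines;   // number of lines read
    double seconds;      // wall-clock time spent reading
};

// Reads the file line by line without parsing or counting anything else,
// so the measured time is dominated by file I/O.
ReadTiming time_read_only(const std::string& path)
{
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();

    std::ifstream fin(path.c_str());
    std::string line;
    std::size_t lines = 0;
    while (std::getline(fin, line))
    {
        ++lines;
    }

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    ReadTiming result = { lines, elapsed.count() };
    return result;
}
```

If this pass alone takes close to 11 minutes, the disk is the limit; if it finishes much faster, the time is going into the parsing and the map updates.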
Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com