Re: best way to deal with big array or vector?
Quote:
Originally Posted by
VladimirF
So are you OK with these results? (It's still taking over 3 minutes!)
Is that a kind of processing you do often? I assume that if it was a one time deal, you'd be OK with original 11 minutes.
Text file are usually very zippable, could you try to zip your 1GB file and see how big it is? Any way you could share it?
I also assume that your lates code is in post # 31 above.
No, I do not have to do that processing often. I was actually pleased because I have learned a lot.
The text is around 200MB when in tar.bz2. How could I share it?
And yes, my latest code is in post 31.
Anyway, I need to move on. It seems to me that map is two component container. How could I get a three component container similar like that, with the first component is the first column of input file, second is second, and third is third? The reason I ask because now I want to write output like this
Code:
a1 12 1
a1 13 1
a1 14 1
a1 15 1
a1 16 1
a1 17 1
a1 20 1
a1 21 2
a1 22 2
a1 23 2
a1 24 2
a1 25 2
Re: best way to deal with big array or vector?
Quote:
Originally Posted by
dukevn
How could I get a three component container similar like that, with the first component is the first column of input file, second is second, and third is third?
When you have data that is related to each other in some way, you shouldn't think of creating separate arrays for each piece of data.
For example, a student record in a school may contain a name, address, courses taken, and a whole lot of other things related to the student. What if there are 20 separate items related to one student? Would you create 20 arrays? You'll go crazy trying to maintain 20 separate arrays -- for example, if you want to sort on the name, you have to sort all the other arrays in parallel to keep the data together.
What you should do is create a struct that contains the individual items. Then create an array of those entries.
Code:
struct Whatever
{
std::string c1;
std::string c2;
std::string c3;
};
Assuming that c1, c2, and c3 are your items, you create a vector, deque, linked list, etc. of these. All the data for a single line is together, and you just create an array of them.
Regards,
Paul McKenzie
Re: best way to deal with big array or vector?
Quote:
Originally Posted by
Paul McKenzie
What you should do is create a struct that contains the individual items. Then create an array of those entries.
Thansk Paul for your suggestion. Here is the code I came up with
Code:
#include <map>
#include <vector>
#include <stdio.h>
#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <time.h>
using namespace std;
struct KeyStruct {
string Name;
int x;
KeyStruct(const string name,const int xval) :
Name(name),x(xval) {}
};
struct KeyCmp {
bool operator()( const KeyStruct & k1, const KeyStruct & k2 ) {
if ( k1.Name < k2.Name ) {
return true;
} else if ( k1.Name == k2.Name && k1.x < k2.x ) {
return true;
} else {
return false;
}
}
};
typedef map<KeyStruct, int, KeyCmp> mapStruct;
int main( int argc, char *argv[] ) {
if ( argc == 1 ) {
fprintf(stderr, "Usage: xyTest <input> <output>\n");
return 1;
}
mapStruct mapTest;
ifstream fin(argv[1]);
string tempName;
int x1, x2;
while ( fin >> tempName >> x1 >> x2 ) {
string temp;
getline(fin,temp);
for ( int i = x1; i <= x2; ++i ) {
KeyStruct mapTemp(tempName,i);
++mapTest[mapTemp];
}
}
fin.close();
ofstream fout(argv[2]);
for ( mapStruct::iterator i = mapTest.begin(), end = mapTest.end(); i != end; ++i ) {
fout << (i->first).Name << '\t' << (i->first).x << '\t' << i->second << '\n';
}
fout.close();
return 0;
}
It works fine, but any other comment/suggestion to shorten/improve the code is welcome.
Now I am working on how to write the summary, not all the points like in the beginning. Starting from 0, the summary should report different intervals with different counts. For example, with input file like this
Code:
a1 12 17
a1 20 25
a1 21 26
a2 13 18
a2 35 40
a1 16 21
a3 51 56
a3 65 70
the output should look like
Code:
a1 0 11 0
a1 12 15 1
a1 16 17 2
a1 18 19 1
a1 20 20 2
a1 21 21 3
a1 22 25 2
a1 26 26 1
a2 0 12 0
a2 13 18 1
a2 19 34 0
a2 35 40 1
...
Any ideas?
Thanks,
D.
Re: best way to deal with big array or vector?
Quote:
Originally Posted by
dukevn
Are you saying that splitting input file to 8 chunks, processing those 8 chunks in parallel will not help at all?
I can't be sure, but I wouldn't expect it to help very much. Why don't you try and find out?
Re: best way to deal with big array or vector?
Quote:
Originally Posted by
Lindley
I can't be sure, but I wouldn't expect it to help very much. Why don't you try and find out?
I will, since it is in my learning plan, but not just now because I have a feeling that it will involve more reading and time.
Re: best way to deal with big array or vector?
I suspect that accessing the same map from multiple threads will not improve the performance.
Re: best way to deal with big array or vector?
If you are able to change the creation of the input file ....
use unformatted I/O rather than text I/O. That should give
a significant performance improvment.
Re: best way to deal with big array or vector?
Quote:
Originally Posted by
Philip Nicoletti
If you are able to change the creation of the input file ....
use unformatted I/O rather than text I/O. That should give
a significant performance improvment.
Unfortunately no, the input's format is fixed.
Re: best way to deal with big array or vector?
Quote:
Originally Posted by
dukevn
Now I am working on how to write the summary, not all the points like in the beginning. Starting from 0, the summary should report different intervals with different counts. For example, with input file like this
Code:
a1 12 17
a1 20 25
a1 21 26
a2 13 18
a2 35 40
a1 16 21
a3 51 56
a3 65 70
the output should look like
Code:
a1 0 11 0
a1 12 15 1
a1 16 17 2
a1 18 19 1
a1 20 20 2
a1 21 21 3
a1 22 25 2
a1 26 26 1
a2 0 12 0
a2 13 18 1
a2 19 34 0
a2 35 40 1
...
In order to archive something like above, I have to mix up a lot of if and else if when reading each of the map array. Here is the final code
Code:
#include <map>
#include <vector>
#include <stdio.h>
#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
struct Point {
string Name;
int xPos;
Point(const string A,const int B) :
Name(A),xPos(B) {}
};
struct PointCmp {
bool operator()( const Point & b1, const Point & b2 ) {
if ( b1.Name < b2.Name || ( b1.Name == b2.Name && b1.xPos < b2.xPos ) ) {
return true;
} else {
return false;
}
}
};
typedef map<Point, int, PointCmp> mapPoint;
int main (int argc, char * const argv[]) {
if ( argc == 1 ) {
fprintf(stderr, "Usage: xyTest <input> <output>\n");
return 1;
}
mapPoint mPoint;
ifstream fin(argv[1]);
string tempName;
int x1, x2;
while ( fin >> tempName >> x1 >> x2 ) {
string temp;
getline(fin,temp);
for ( int i = x1; i <= x2; ++i ) {
Point A(tempName,i);
++mPoint[A];
}
}
fin.close();
int tempCount;
int tempPos;
ofstream fout(argv[2]);
for ( mapPoint::iterator i = mPoint.begin(), end = mPoint.end(); i != end; ++i ) {
if ( i == mPoint.begin() ) {
tempCount = i->second;
tempName = (i->first).Name;
tempPos = (i->first).xPos;
fout << tempName << '\t' << 0 << '\t' << tempPos - 1 << '\t' << 0 << '\n';
fout << tempName << '\t' << tempPos << '\t';
} else if ( i->second == tempCount && tempName == (i->first).Name && (i->first).xPos == tempPos + 1 ) {
++tempPos;
if ( i == --mPoint.end() ) {
fout << tempPos << '\t' << tempCount << '\n';
}
} else if ( i->second == tempCount && (tempName == (i->first).Name) ) {
fout << tempPos << '\t' << tempCount << '\n';
fout << tempName << '\t' << tempPos + 1 << '\t' << (i->first).xPos - 1 << '\t' << 0 << '\n';
tempPos = (i->first).xPos;
fout << tempName << '\t' << tempPos << '\t';
} else if ( tempName == (i->first).Name && ((i->first).xPos == tempPos + 1) ) {
fout << tempPos << '\t' << tempCount << '\n';
++tempPos;
fout << tempName << '\t' << (i->first).xPos << '\t';
tempCount = i->second;
} else if ( tempName == (i->first).Name ) {
fout << tempPos << '\t' << tempCount << '\n';
fout << tempName << '\t' << tempPos + 1 << '\t' << (i->first).xPos - 1 << '\t' << 0 << '\n';
fout << tempName << (i->first).xPos << '\t';
} else {
fout << tempPos << '\t' << tempCount << '\n';
tempCount = i->second;
tempName = (i->first).Name;
tempPos = (i->first).xPos;
fout << tempName << '\t' << 0 << '\t' << tempPos - 1 << '\t' << 0 << '\n';
fout << tempName << '\t' << tempPos << '\t';
}
}
fout.close();
return 0;
}
It looks like a real mess inside the for loop, but I know nothing other than if/else if to do that. Any comment to shorten it or to make it look nicer are welcome.