|
-
December 12th, 2009, 08:02 PM
#46
Re: best way to deal with big array or vector?
 Originally Posted by VladimirF
So are you OK with these results? (It's still taking over 3 minutes!)
Is that a kind of processing you do often? I assume that if it was a one time deal, you'd be OK with original 11 minutes.
Text file are usually very zippable, could you try to zip your 1GB file and see how big it is? Any way you could share it?
I also assume that your lates code is in post # 31 above.
No, I do not have to do that processing often. I was actually pleased because I have learned a lot.
The text is around 200MB when in tar.bz2. How could I share it?
And yes, my latest code is in post 31.
Anyway, I need to move on. It seems to me that map is two component container. How could I get a three component container similar like that, with the first component is the first column of input file, second is second, and third is third? The reason I ask because now I want to write output like this
Code:
a1 12 1
a1 13 1
a1 14 1
a1 15 1
a1 16 1
a1 17 1
a1 20 1
a1 21 2
a1 22 2
a1 23 2
a1 24 2
a1 25 2
-
December 12th, 2009, 11:20 PM
#47
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
How could I get a three component container similar like that, with the first component is the first column of input file, second is second, and third is third?
When you have data that is related to each other in some way, you shouldn't think of creating separate arrays for each piece of data.
For example, a student record in a school may contain a name, address, courses taken, and a whole lot of other things related to the student. What if there are 20 separate items related to one student? Would you create 20 arrays? You'll go crazy trying to maintain 20 separate arrays -- for example, if you want to sort on the name, you have to sort all the other arrays in parallel to keep the data together.
What you should do is create a struct that contains the individual items. Then create an array of those entries.
Code:
struct Whatever
{
std::string c1;
std::string c2;
std::string c3;
};
Assuming that c1, c2, and c3 are your items, you create a vector, deque, linked list, etc. of these. All the data for a single line is together, and you just create an array of them.
Regards,
Paul McKenzie
-
December 13th, 2009, 12:30 PM
#48
Re: best way to deal with big array or vector?
 Originally Posted by Paul McKenzie
What you should do is create a struct that contains the individual items. Then create an array of those entries.
Thansk Paul for your suggestion. Here is the code I came up with
Code:
#include <map>
#include <vector>
#include <stdio.h>
#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <time.h>
using namespace std;
struct KeyStruct {
string Name;
int x;
KeyStruct(const string name,const int xval) :
Name(name),x(xval) {}
};
struct KeyCmp {
bool operator()( const KeyStruct & k1, const KeyStruct & k2 ) {
if ( k1.Name < k2.Name ) {
return true;
} else if ( k1.Name == k2.Name && k1.x < k2.x ) {
return true;
} else {
return false;
}
}
};
typedef map<KeyStruct, int, KeyCmp> mapStruct;
int main( int argc, char *argv[] ) {
if ( argc == 1 ) {
fprintf(stderr, "Usage: xyTest <input> <output>\n");
return 1;
}
mapStruct mapTest;
ifstream fin(argv[1]);
string tempName;
int x1, x2;
while ( fin >> tempName >> x1 >> x2 ) {
string temp;
getline(fin,temp);
for ( int i = x1; i <= x2; ++i ) {
KeyStruct mapTemp(tempName,i);
++mapTest[mapTemp];
}
}
fin.close();
ofstream fout(argv[2]);
for ( mapStruct::iterator i = mapTest.begin(), end = mapTest.end(); i != end; ++i ) {
fout << (i->first).Name << '\t' << (i->first).x << '\t' << i->second << '\n';
}
fout.close();
return 0;
}
It works fine, but any other comment/suggestion to shorten/improve the code is welcome.
Now I am working on how to write the summary, not all the points like in the beginning. Starting from 0, the summary should report different intervals with different counts. For example, with input file like this
Code:
a1 12 17
a1 20 25
a1 21 26
a2 13 18
a2 35 40
a1 16 21
a3 51 56
a3 65 70
the output should look like
Code:
a1 0 11 0
a1 12 15 1
a1 16 17 2
a1 18 19 1
a1 20 20 2
a1 21 21 3
a1 22 25 2
a1 26 26 1
a2 0 12 0
a2 13 18 1
a2 19 34 0
a2 35 40 1
...
Any ideas?
Thanks,
D.
-
December 13th, 2009, 12:39 PM
#49
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
Are you saying that splitting input file to 8 chunks, processing those 8 chunks in parallel will not help at all?
I can't be sure, but I wouldn't expect it to help very much. Why don't you try and find out?
-
December 13th, 2009, 12:46 PM
#50
Re: best way to deal with big array or vector?
 Originally Posted by Lindley
I can't be sure, but I wouldn't expect it to help very much. Why don't you try and find out?
I will, since it is in my learning plan, but not just now because I have a feeling that it will involve more reading and time.
-
December 13th, 2009, 10:15 PM
#51
Re: best way to deal with big array or vector?
I suspect that accessing the same map from multiple threads will not improve the performance.
Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
Convenience and productivity tools for Microsoft Visual Studio:
FeinWindows - replacement windows manager for Visual Studio, and more...
-
December 13th, 2009, 10:58 PM
#52
Re: best way to deal with big array or vector?
If you are able to change the creation of the input file ....
use unformatted I/O rather than text I/O. That should give
a significant performance improvment.
-
December 14th, 2009, 01:54 PM
#53
Re: best way to deal with big array or vector?
 Originally Posted by Philip Nicoletti
If you are able to change the creation of the input file ....
use unformatted I/O rather than text I/O. That should give
a significant performance improvment.
Unfortunately no, the input's format is fixed.
-
December 14th, 2009, 02:03 PM
#54
Re: best way to deal with big array or vector?
 Originally Posted by dukevn
Now I am working on how to write the summary, not all the points like in the beginning. Starting from 0, the summary should report different intervals with different counts. For example, with input file like this
Code:
a1 12 17
a1 20 25
a1 21 26
a2 13 18
a2 35 40
a1 16 21
a3 51 56
a3 65 70
the output should look like
Code:
a1 0 11 0
a1 12 15 1
a1 16 17 2
a1 18 19 1
a1 20 20 2
a1 21 21 3
a1 22 25 2
a1 26 26 1
a2 0 12 0
a2 13 18 1
a2 19 34 0
a2 35 40 1
...
In order to archive something like above, I have to mix up a lot of if and else if when reading each of the map array. Here is the final code
Code:
#include <map>
#include <vector>
#include <stdio.h>
#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
struct Point {
string Name;
int xPos;
Point(const string A,const int B) :
Name(A),xPos(B) {}
};
struct PointCmp {
bool operator()( const Point & b1, const Point & b2 ) {
if ( b1.Name < b2.Name || ( b1.Name == b2.Name && b1.xPos < b2.xPos ) ) {
return true;
} else {
return false;
}
}
};
typedef map<Point, int, PointCmp> mapPoint;
int main (int argc, char * const argv[]) {
if ( argc == 1 ) {
fprintf(stderr, "Usage: xyTest <input> <output>\n");
return 1;
}
mapPoint mPoint;
ifstream fin(argv[1]);
string tempName;
int x1, x2;
while ( fin >> tempName >> x1 >> x2 ) {
string temp;
getline(fin,temp);
for ( int i = x1; i <= x2; ++i ) {
Point A(tempName,i);
++mPoint[A];
}
}
fin.close();
int tempCount;
int tempPos;
ofstream fout(argv[2]);
for ( mapPoint::iterator i = mPoint.begin(), end = mPoint.end(); i != end; ++i ) {
if ( i == mPoint.begin() ) {
tempCount = i->second;
tempName = (i->first).Name;
tempPos = (i->first).xPos;
fout << tempName << '\t' << 0 << '\t' << tempPos - 1 << '\t' << 0 << '\n';
fout << tempName << '\t' << tempPos << '\t';
} else if ( i->second == tempCount && tempName == (i->first).Name && (i->first).xPos == tempPos + 1 ) {
++tempPos;
if ( i == --mPoint.end() ) {
fout << tempPos << '\t' << tempCount << '\n';
}
} else if ( i->second == tempCount && (tempName == (i->first).Name) ) {
fout << tempPos << '\t' << tempCount << '\n';
fout << tempName << '\t' << tempPos + 1 << '\t' << (i->first).xPos - 1 << '\t' << 0 << '\n';
tempPos = (i->first).xPos;
fout << tempName << '\t' << tempPos << '\t';
} else if ( tempName == (i->first).Name && ((i->first).xPos == tempPos + 1) ) {
fout << tempPos << '\t' << tempCount << '\n';
++tempPos;
fout << tempName << '\t' << (i->first).xPos << '\t';
tempCount = i->second;
} else if ( tempName == (i->first).Name ) {
fout << tempPos << '\t' << tempCount << '\n';
fout << tempName << '\t' << tempPos + 1 << '\t' << (i->first).xPos - 1 << '\t' << 0 << '\n';
fout << tempName << (i->first).xPos << '\t';
} else {
fout << tempPos << '\t' << tempCount << '\n';
tempCount = i->second;
tempName = (i->first).Name;
tempPos = (i->first).xPos;
fout << tempName << '\t' << 0 << '\t' << tempPos - 1 << '\t' << 0 << '\n';
fout << tempName << '\t' << tempPos << '\t';
}
}
fout.close();
return 0;
}
It looks like a real mess inside the for loop, but I know nothing other than if/else if to do that. Any comment to shorten it or to make it look nicer are welcome.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|