CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 4 of 4 FirstFirst 1234
Results 46 to 54 of 54
  1. #46
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by VladimirF View Post
    So are you OK with these results? (It's still taking over 3 minutes!)
    Is that a kind of processing you do often? I assume that if it was a one time deal, you'd be OK with original 11 minutes.
    Text file are usually very zippable, could you try to zip your 1GB file and see how big it is? Any way you could share it?
    I also assume that your lates code is in post # 31 above.
    No, I do not have to do that processing often. I was actually pleased because I have learned a lot.

    The text is around 200MB when in tar.bz2. How could I share it?

    And yes, my latest code is in post 31.

    Anyway, I need to move on. It seems to me that map is two component container. How could I get a three component container similar like that, with the first component is the first column of input file, second is second, and third is third? The reason I ask because now I want to write output like this
    Code:
    a1     12	1
    a1     13	1
    a1     14	1
    a1     15	1
    a1     16	1
    a1     17	1
    a1     20	1
    a1     21	2
    a1     22	2
    a1     23	2
    a1     24	2
    a1     25	2

  2. #47
    Join Date
    Apr 1999
    Posts
    27,449

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    How could I get a three component container similar like that, with the first component is the first column of input file, second is second, and third is third?
    When you have data that is related to each other in some way, you shouldn't think of creating separate arrays for each piece of data.

    For example, a student record in a school may contain a name, address, courses taken, and a whole lot of other things related to the student. What if there are 20 separate items related to one student? Would you create 20 arrays? You'll go crazy trying to maintain 20 separate arrays -- for example, if you want to sort on the name, you have to sort all the other arrays in parallel to keep the data together.

    What you should do is create a struct that contains the individual items. Then create an array of those entries.
    Code:
    struct Whatever
    {
       std::string c1;
       std::string c2;
       std::string c3;
    };
    Assuming that c1, c2, and c3 are your items, you create a vector, deque, linked list, etc. of these. All the data for a single line is together, and you just create an array of them.

    Regards,

    Paul McKenzie

  3. #48
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by Paul McKenzie View Post
    What you should do is create a struct that contains the individual items. Then create an array of those entries.
    Thansk Paul for your suggestion. Here is the code I came up with
    Code:
    #include <map>
    #include <vector>
    #include <stdio.h>
    #include <stdlib.h>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <time.h>
    
    using namespace std;
    
    struct KeyStruct {
      string Name;
      int x;
    
      KeyStruct(const string name,const int xval) : 
        Name(name),x(xval) {}
    };
    
    struct KeyCmp {
      bool operator()( const KeyStruct & k1, const KeyStruct & k2 ) {
        if ( k1.Name < k2.Name ) {
          return true;
        } else if ( k1.Name == k2.Name && k1.x < k2.x ) {
          return true;
        } else {
          return false;
        }
      }
    };
    
    typedef map<KeyStruct, int, KeyCmp> mapStruct;
    
    int main( int argc, char *argv[] ) {
    
      if ( argc == 1 ) {
        fprintf(stderr, "Usage: xyTest <input> <output>\n");
        return 1;
      }
    
      mapStruct mapTest;
      ifstream fin(argv[1]);
      string tempName;
      int x1, x2;
      while ( fin >> tempName >> x1 >> x2 ) {
        string temp;
        getline(fin,temp);
        for ( int i = x1; i <= x2; ++i ) {
          KeyStruct mapTemp(tempName,i);
          ++mapTest[mapTemp];
        }
      }
      fin.close();
      ofstream fout(argv[2]);
      for ( mapStruct::iterator i = mapTest.begin(), end = mapTest.end(); i != end; ++i ) {
        fout << (i->first).Name << '\t' << (i->first).x << '\t' << i->second << '\n';
      }
      fout.close();
      return 0;
    }
    It works fine, but any other comment/suggestion to shorten/improve the code is welcome.

    Now I am working on how to write the summary, not all the points like in the beginning. Starting from 0, the summary should report different intervals with different counts. For example, with input file like this
    Code:
    a1	12	17
    a1	20	25
    a1	21	26
    a2	13	18
    a2	35	40
    a1	16	21
    a3	51	56
    a3	65	70
    the output should look like
    Code:
    a1	0	11	0
    a1	12	15	1
    a1	16	17	2
    a1	18	19	1
    a1	20	20	2
    a1	21	21	3
    a1	22	25	2
    a1	26	26      1
    a2	0	12	0
    a2	13	18      1
    a2      19      34      0
    a2      35      40      1
    ...
    Any ideas?

    Thanks,

    D.

  4. #49
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    Are you saying that splitting input file to 8 chunks, processing those 8 chunks in parallel will not help at all?
    I can't be sure, but I wouldn't expect it to help very much. Why don't you try and find out?

  5. #50
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by Lindley View Post
    I can't be sure, but I wouldn't expect it to help very much. Why don't you try and find out?
    I will, since it is in my learning plan, but not just now because I have a feeling that it will involve more reading and time.

  6. #51
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: best way to deal with big array or vector?

    I suspect that accessing the same map from multiple threads will not improve the performance.
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  7. #52
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: best way to deal with big array or vector?

    If you are able to change the creation of the input file ....

    use unformatted I/O rather than text I/O. That should give
    a significant performance improvment.

  8. #53
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by Philip Nicoletti View Post
    If you are able to change the creation of the input file ....

    use unformatted I/O rather than text I/O. That should give
    a significant performance improvment.
    Unfortunately no, the input's format is fixed.

  9. #54
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    Now I am working on how to write the summary, not all the points like in the beginning. Starting from 0, the summary should report different intervals with different counts. For example, with input file like this
    Code:
    a1	12	17
    a1	20	25
    a1	21	26
    a2	13	18
    a2	35	40
    a1	16	21
    a3	51	56
    a3	65	70
    the output should look like
    Code:
    a1	0	11	0
    a1	12	15	1
    a1	16	17	2
    a1	18	19	1
    a1	20	20	2
    a1	21	21	3
    a1	22	25	2
    a1	26	26      1
    a2	0	12	0
    a2	13	18      1
    a2      19      34      0
    a2      35      40      1
    ...
    In order to archive something like above, I have to mix up a lot of if and else if when reading each of the map array. Here is the final code
    Code:
    #include <map>
    #include <vector>
    #include <stdio.h>
    #include <stdlib.h>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    
    using namespace std;
    
    struct Point {
      string Name;
      int xPos;
    
      Point(const string A,const int B) : 
        Name(A),xPos(B) {}
    };
    
    struct PointCmp {
      bool operator()( const Point & b1, const Point & b2 ) {
        if ( b1.Name < b2.Name || ( b1.Name == b2.Name && b1.xPos < b2.xPos ) ) {
          return true;
        } else {
          return false;
        }
      }
    };
    
    typedef map<Point, int, PointCmp> mapPoint;
    
    int main (int argc, char * const argv[]) {
      if ( argc == 1 ) {
        fprintf(stderr, "Usage: xyTest <input> <output>\n");
        return 1;
      }
    
      mapPoint mPoint;
      ifstream fin(argv[1]);
      string tempName;
      int x1, x2;
      while ( fin >> tempName >> x1 >> x2 ) {
        string temp;
        getline(fin,temp);
        for ( int i = x1; i <= x2; ++i ) {
          Point A(tempName,i);
          ++mPoint[A];
        }
      }
      fin.close();
        
      int tempCount;
      int tempPos;
      ofstream fout(argv[2]);
      for ( mapPoint::iterator i = mPoint.begin(), end = mPoint.end(); i != end; ++i ) {
        if ( i == mPoint.begin() ) {
          tempCount = i->second;
          tempName = (i->first).Name;
          tempPos = (i->first).xPos;
          fout << tempName << '\t' << 0 << '\t' <<  tempPos - 1 << '\t' << 0 << '\n';
          fout << tempName << '\t' << tempPos << '\t';
        } else if ( i->second == tempCount && tempName == (i->first).Name && (i->first).xPos == tempPos + 1 ) {
          ++tempPos;
          if ( i == --mPoint.end() ) {
            fout << tempPos << '\t' << tempCount << '\n'; 
          }
        } else if ( i->second == tempCount && (tempName == (i->first).Name) ) {
          fout << tempPos << '\t' << tempCount << '\n';
          fout << tempName << '\t' << tempPos + 1 << '\t' << (i->first).xPos - 1 << '\t' << 0 << '\n';
          tempPos = (i->first).xPos;
          fout << tempName << '\t' << tempPos << '\t';
        } else if ( tempName == (i->first).Name && ((i->first).xPos == tempPos + 1) ) {
          fout << tempPos << '\t' << tempCount << '\n';
          ++tempPos;
          fout << tempName << '\t' << (i->first).xPos << '\t';
          tempCount = i->second;
        } else if ( tempName == (i->first).Name ) {
          fout << tempPos << '\t' << tempCount << '\n';
          fout << tempName << '\t' << tempPos + 1 << '\t' <<  (i->first).xPos - 1 << '\t' << 0 << '\n';
          fout << tempName << (i->first).xPos << '\t';
        } else {
          fout << tempPos << '\t' << tempCount << '\n';
          tempCount = i->second;
          tempName = (i->first).Name;
          tempPos = (i->first).xPos;
          fout << tempName << '\t' << 0 << '\t' <<  tempPos - 1 << '\t' << 0 << '\n';
          fout << tempName << '\t' << tempPos << '\t';
        } 
      }
      fout.close();
    	
      return 0;
    }
    It looks like a real mess inside the for loop, but I know nothing other than if/else if to do that. Any comment to shorten it or to make it look nicer are welcome.

Page 4 of 4 FirstFirst 1234

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured