CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 3 of 4 FirstFirst 1234 LastLast
Results 31 to 45 of 54
  1. #31
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by VladimirF View Post
    My guess is that code like that should execute at the speed of file I/O.
    I know I can copy a 1GB file in about 30 seconds, so 11 minutes sounds like WAY too much.
    Could you comment out everything in your code except for I/O and see how long that takes?
    Do you mind posting your code and a sample data file?
    Here is the final code
    Code:
    #include <map>
    #include <vector>
    #include <stdio.h>
    #include <stdlib.h>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <time.h>
    
    using namespace std;
    
    int main( int argc, char *argv[] ) {
    
      if ( argc == 1 ) {
        fprintf(stderr, "Usage: xytest <input> <output>\n");
        return 1;
      }
    
      time_t start,end;
      double diff;
      time (&start);
      map<int, int> mapTest;
      ifstream fin(argv[1]);
      string tempLine;
      int x1, x2;
      while ( fin >> tempLine >> x1 >> x2 ) {
        getline(fin,tempLine);
        for ( int i = x1; i <= x2; ++i ) {
          ++mapTest[i];
        }
      }
      fin.close();
      ofstream fout(argv[2]);
      for ( map<int, int>::iterator i = mapTest.begin(), end = mapTest.end(); i != end; ++i ) {
        fout << i->first << '\t' << i->second << '\n';
      }
      fout.close();
      time (&end);
      diff = difftime(end, start);
      printf("Running time: %.2lf\n",diff);
      return 0;
    }
    Some lines in the input file is like this
    Code:
    a1	35952485	35952517	2	0	-
    a1	66611781	66611813	15	0	+
    a1	154104890	154104922	10	0	+
    a1	224631511	224631543	6	0	-
    and there are 25501054 lines similar like that in the file. The size of file is 1141987065.

    Let me try your suggestion to see what results are.

  2. #32
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by VladimirF View Post
    My guess is that code like that should execute at the speed of file I/O.
    Looks like the ++mapTest[i] takes a lot of time. I tested with replacing ++mapTest[i] with, for example, ++k, and deleting the writing to file part, then the code needs only 45s to run. Get ++mapTest[i] back in, it needs 656s. The rest (~ a minute or more) is the time of writing to output.

  3. #33
    Join Date
    Apr 1999
    Posts
    27,449

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    Looks like the ++mapTest[i] takes a lot of time. I tested with replacing ++mapTest[i] with, for example, ++k, and deleting the writing to file part, then the code needs only 45s to run. Get ++mapTest[i] back in, it needs 656s. The rest (~ a minute or more) is the time of writing to output.
    1) What compiler are you using? Version?

    2) What compiler options?

    3) Are you timing a debug build, or an optimized release build? If it's a debug build, those timing results are meaningless.

    Regards,

    Paul McKenzie

  4. #34
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by Paul McKenzie View Post
    1) What compiler are you using? Version?
    On Unix:
    Code:
    $ g++ --version
    g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
    On Mac Pro:
    Code:
    $ g++ --version
    i686-apple-darwin9-g++-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5490)
    Quote Originally Posted by Paul McKenzie View Post
    2) What compiler options?
    Code:
    $ g++ xytest.c -o xytest
    Quote Originally Posted by Paul McKenzie View Post
    3) Are you timing a debug build, or an optimized release build? If it's a debug build, those timing results are meaningless.
    Not sure what you are asking here, but once I compiled and get executable file xytest, I just run it. The time reported here is the time measured inside the code.

    What is debug build and what is optimized release build? How can my code be considered as debug, or release?

  5. #35
    Join Date
    Apr 1999
    Posts
    27,449

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    Not sure what you are asking here,
    Well, what you don't understand is the whole ballgame.

    When you compile your application, there are certain options that you must choose so that the compiler optimizes the code. If you don't do that, the compiler generates unoptimized code. Unoptimized code is used for debugging purposes (the compiler may be checking iterators and doing extra work).

    The compiler options you have used do not turn on the optimizations. You need to rebuild your application using the proper optimization switches, and then retest your code.

    Regards,

    Paul McKenzie

  6. #36
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    Looks like the ++mapTest[i] takes a lot of time.
    Do you know much about input data? The range of values? Distribution? Tipical span of one line? Limit on the output counters? Will it fit in a byte?
    I am trying to come up with some clever data structure. Like a map of short (4K?) arrays, where a key will be the low value shifted by a few bits to the right. That will provide a very fast access for your "for" loops, limiting a number of map look-ups to a number of lines in your file (or even less if you hit the same map key more than once).
    The point is that you perform sequential access to your counters for each input line, and maps are not good for that.
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  7. #37
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by Paul McKenzie View Post
    Well, what you don't understand is the whole ballgame.

    When you compile your application, there are certain options that you must choose so that the compiler optimizes the code. If you don't do that, the compiler generates unoptimized code. Unoptimized code is used for debugging purposes (the compiler may be checking iterators and doing extra work).

    The compiler options you have used do not turn on the optimizations. You need to rebuild your application using the proper optimization switches, and then retest your code.

    Regards,

    Paul McKenzie
    You got it right Paul. I've just started to learn C++ very recently. I do not know much about even C++ syntax, and I am trying to learn as much as I can. Honestly I do hear about coding optimization by choosing the right algorithms etc but never heard of optimization by compiling option. g++ input -o output is the only compiling command I know now (and option -lz if using zlib library).

    What are those "certain options" that I should try? Could you point out for me some resources about those?

  8. #38
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by VladimirF View Post
    Do you know much about input data? The range of values? Distribution? Tipical span of one line? Limit on the output counters? Will it fit in a byte?
    Well, just a little about input data (via some examples that I am learning and coding), and none of the rest.
    Quote Originally Posted by VladimirF View Post
    I am trying to come up with some clever data structure. Like a map of short (4K?) arrays, where a key will be the low value shifted by a few bits to the right. That will provide a very fast access for your "for" loops, limiting a number of map look-ups to a number of lines in your file (or even less if you hit the same map key more than once).
    The point is that you perform sequential access to your counters for each input line, and maps are not good for that.
    I am happy to learn anything to feed up my tiny c++ background, so any hint/instruction/suggestion/recommendation are welcome.

  9. #39
    Join Date
    Apr 1999
    Posts
    27,449

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    Honestly I do hear about coding optimization by choosing the right algorithms etc but never heard of optimization by compiling option. g++ input -o output is the only compiling command I know now (and option -lz if using zlib library).
    Basically, compilers, regardless of the language, are able to optimize the code that you've written (hence the term "optimizing compiler"). Things like moving invariant code out of loops so that they don't get executed over and over again is an example of an optimization that can be done by the compiler.

    In your circumstance, not only will the compiler move code around to find the most efficient way to execute, but it may also eliminate "checking" code that checks for invalid iterators, array indices, etc. For example, the Visual C++ compiler has heavy iterator checking if the optimizations are turned off (and the "DEBUG" mode is chosen). An application that is built like this can run 5 to 10 times slower than an optimized version of the code.

    Your compiler should have a list of options. Usually a -help or -? option lists the various command options.
    What are those "certain options" that I should try? Could you point out for me some resources about those?
    I am not familiar with your compiler switches, so you will have to read the manual for the compiler, or see if a list of the options appear using the -help or -? switch. The g++ compiler is popular, but I don't know off the top of my head what those switches are (when I use g++, it is done under the CodeBlocks IDE, so I don't get too familiar with some of the command options).

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; December 11th, 2009 at 11:00 PM.

  10. #40
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: best way to deal with big array or vector?

    Quote Originally Posted by Paul McKenzie
    3) Are you timing a debug build, or an optimized release build? If it's a debug build, those timing results are meaningless.
    Indeed, I note that no optimisations are turned on:
    Code:
    $ g++ xytest.c -o xytest
    dukevn, the relevant compiler documentation is available online: GCC 4.1.2 Online Manual. You should try:
    Code:
    $ g++ -O2 xytest.c -o xytest
    Actually, you should normally enable a higher warning level, e.g.,
    Code:
    $ g++ -Wall -O2 xytest.c -o xytest
    I also would use -pedantic to help check that I am not using some compiler extension, e.g., variable length arrays.
    Last edited by laserlight; December 11th, 2009 at 11:22 PM.
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

    Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
    Kindly rate my posts if you found them useful

  11. #41
    Join Date
    Apr 1999
    Posts
    27,449

    Re: best way to deal with big array or vector?

    dukevn, here is a direct link to the optimization page. It explains in somewhat more detail than I did as to what optimization is and why you should use it.

    http://gcc.gnu.org/onlinedocs/gcc-4....timize-Options

    Regards,

    Paul McKenzie

  12. #42
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Reports:

    - Compile: g++ -O2 -Wall xytest.c -o xytestO2. Run time: 04:44 (unix), 195s (Mac Pro)
    - Compile: g++ -O3 -Wall xytest.c -o xytestO3. Run time: 03:52 (unix), 195s (Mac Pro)

    It does help! Thank you very much Paul and laserlight. I am so happy that not only I have learned something, but also I got a very good resulting code.

  13. #43
    Join Date
    Aug 2007
    Posts
    858

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    std::map is new to me. Even the C++ book I have (C++ Primer Plus) does not have it. Would you mind giving me some examples and how to use it?
    http://www.cplusplus.com/reference/

    /\ Your best friend when you need a quick reference for the standard library.

  14. #44
    Join Date
    Dec 2009
    Posts
    89

    Re: best way to deal with big array or vector?

    Quote Originally Posted by Speedo View Post
    http://www.cplusplus.com/reference/

    /\ Your best friend when you need a quick reference for the standard library.
    Thanks for the link Speedo. Actually those references do not help me as much as the examples. Normally I want to see examples first to see the interested method in action, then learn the theory/reference later.

  15. #45
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: best way to deal with big array or vector?

    Quote Originally Posted by dukevn View Post
    ...I got a very good resulting code.
    So are you OK with these results? (It's still taking over 3 minutes!)
    Is that a kind of processing you do often? I assume that if it was a one time deal, you'd be OK with original 11 minutes.
    Text file are usually very zippable, could you try to zip your 1GB file and see how big it is? Any way you could share it?
    I also assume that your lates code is in post # 31 above.
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

Page 3 of 4 FirstFirst 1234 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured