design high-performance 3d dynamic array

Thread: design high-performance 3d dynamic array

  1. #1
    Join Date
    Feb 2010
    Posts
    4

    design high-performance 3d dynamic array

    Hi, everyone:

    I am working on a scientific computation program which needs a very
    high-performance three-dimensional dynamic array. I tried
    std::vector<std::vector<std::vector<T> > >, boost::multi_array and
    blitz::Array, but the performance of none of these libraries could
    satisfy my needs. Therefore, I designed my own array class, a fragment
    of which follows:

    template<class T>
    class array
    {
    public:
        array(size_t d1, size_t d2, size_t d3)
        {
            p = new T**[d1];
            for (size_t i = 0; i != d1; ++i)
                p[i] = new T*[d2];
            for (size_t i = 0; i != d1; ++i)
                for (size_t j = 0; j != d2; ++j)
                    p[i][j] = new T[d3];

            for (size_t i = 0; i != d1; ++i)
                for (size_t j = 0; j != d2; ++j)
                    for (size_t k = 0; k != d3; ++k)
                        p[i][j][k] = T();
        }

        T& operator()(size_t d1, size_t d2, size_t d3)
        {
            return p[d1][d2][d3];
        }
    private:
        T ***p;
    };


    However, since I have to read and write the array elements more than
    ten million times during the computation, the overloaded parentheses
    operator will decrease the efficiency dramatically. One way to improve
    the performance is to make T ***p public and use array.p[i][j][k] to
    read/write data. However, I think this design is hideous.

    So, does anyone have a better idea for hiding the internal pointer p
    from users without decreasing efficiency? Or does anyone have a
    completely new idea for a high-performance three-dimensional dynamic
    array?

    Thanks in advance!

  2. #2
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,245

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by heath123
    I am working on a scientific computation program which needs a very
    high-performance three-dimensional dynamic array. I tried
    std::vector<std::vector<std::vector<T> > >, boost::multi_array and
    blitz::Array, but the performance of none of these libraries could
    satisfy my needs. Therefore, I designed my own array class, a fragment
    of which follows:
    The problem is that it looks like you are going to write your own version of std::vector<std::vector<std::vector<T> > >, except that it will probably have worse performance in some areas (e.g., when expanding the dynamic array), and no better for the rest.

    Did you enable optimisations? If you are using a compiler that has checked iterators, did you disable checked iterators?

    However, since I have to read and write the array elements for more
    than ten million times in the computation, the overloaded parentheses
    will decrease the efficiency dramatically.
    Maybe, maybe not. The overloaded operator() might be inlined, thus making function call overhead zero.
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

    Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
    Kindly rate my posts if you found them useful

  3. #3
    Join Date
    Feb 2010
    Posts
    4

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by laserlight View Post

    Maybe, maybe not. The overloaded operator() might be inlined, thus making function call overhead zero.
    I concluded that from comparing the total run time of my array with that of a static 3d array under FreeBSD + g++. I didn't use any compiler optimisation, just

    g++ *.cpp -o myprogram

    To my knowledge, inline is just a suggestion to the compiler; no one knows whether a function actually gets inlined. I have to guarantee the performance of my program, so is there any way to force the compiler to inline a specific function?

  4. #4
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,245

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by heath123
    I concluded that from comparing the total run time of my array with that of a static 3d array under FreeBSD + g++. I didn't use any compiler optimisation, just

    g++ *.cpp -o myprogram
    Unfortunately, performance comparisons made without enabling compiler optimisations can give misleading results. Furthermore, keep in mind that what you are testing differs in power from what you are testing against, so you should judge against the performance requirements of what you will actually use it for, not against whether it matches the performance of "a static 3d array" over a limited set of operations (although that can serve as a guide).

    Quote Originally Posted by heath123
    To my knowledge, inline is just a suggestion to the compiler; no one knows whether a function actually gets inlined. I have to guarantee the performance of my program, so is there any way to force the compiler to inline a specific function?
    There are compiler-specific ways to send stronger hints to inline something, but even those the compiler does not have to follow in the end. Your overload of operator(), defined inline as it is in a class template definition and consisting of just a return statement, is already a good candidate for inlining.
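    For reference, a minimal sketch of such a compiler-specific hint. The always_inline attribute below is GCC/Clang-specific (MSVC's counterpart is __forceinline), and the accessor and its names here are illustrative, not from the thread:

    ```cpp
    #include <cassert>
    #include <cstddef>

    // GCC/Clang-specific: always_inline turns the inline hint into a hard
    // request. MSVC's equivalent keyword is __forceinline. The flat
    // row-major accessor below is purely illustrative.
    __attribute__((always_inline))
    inline double& at(double* data, std::size_t d2, std::size_t d3,
                      std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * d2 + j) * d3 + k];  // flat row-major offset
    }

    int main() {
        double buf[2 * 3 * 4] = {};
        at(buf, 3, 4, 1, 2, 3) = 42.0;    // element (1,2,3) of a 2x3x4 array
        assert(buf[23] == 42.0);          // (1*3+2)*4+3 == 23
        return 0;
    }
    ```

    Even so, measuring an optimised build remains the only reliable check that the call overhead is actually gone.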

  5. #5
    Join Date
    Apr 1999
    Posts
    27,422

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by heath123 View Post
    Hi, everyone:

    I am working on a scientific computation program which needs a very
    high-performance three-dimensional dynamic array. I tried
    std::vector<std::vector<std::vector<T> > >, boost::multi_array and
    blitz::Array, but the performance of none of these libraries could
    satisfy my needs.
    I didn't use any optimization of the compiler,
    Well, that invalidates anything you've stated about things being "too slow" and having to write your own classes. C++ compiler optimizations can be very dramatic. For example, unoptimized code can run 5 to 10 times slower than optimized code when it comes to the Visual C++ compilers.

    Another thing: the libraries you say are slow, or not fast enough, were created by some of the best C++ programmers in the business. Performance is something these programmers carefully consider and work hard to achieve. You need a compelling argument to claim you can do better than these people at building C++ classes and components.
    Therefore, I designed my own array class, the fragment of which
    is as follows:
    The array class you defined is very simplistic, and not efficient. If your arrays are dynamic, but once created, stay the same size, then what you have written is slow in creation and destruction.
    Code:
      array(size_t d1, size_t d2, size_t d3)
      {
          p = new T**[d1];
          for (size_t i = 0; i != d1; ++i)
              p[i] = new T*[d2];
          for (size_t i = 0; i != d1; ++i)
              for (size_t j = 0; j != d2; ++j)
                  p[i][j] = new T[d3];
    You are calling the allocator multiple times in a loop. What if the array is 100x100x100? You are calling "new" thousands of times. Not only is that slow, it also fragments memory.

    You only need to call new 3 times, regardless of the size of the dimensions. The first call allocates the d1 pointers (which you did). The second call allocates all of the second-dimension pointers in one pool. The third call allocates the room for the data. Then a loop points the pointers to the right places.

    The code below has not been tested, but hopefully I didn't make any mistakes.
    Code:
     template <typename T>
     T*** Create3DArray(int nLayers, int nRows, int nCols)
     {
         T*** p  = new T**[nLayers];
         T**  p2 = new T*[nLayers * nRows];
         T*   p3 = new T[nLayers * nRows * nCols];

         for (int i = 0; i < nLayers; ++i)
         {
             p[i] = p2 + i * nRows;
             for (int j = 0; j < nRows; ++j)
                 p[i][j] = p3 + (i * nRows + j) * nCols;
         }
         return p;
     }

     template <typename T>
     void Free3DArray(T*** array)
     {
         delete[] array[0][0];
         delete[] array[0];
         delete[] array;
     }

     int main()
     {
         double*** d3 = Create3DArray<double>(100, 100, 100);
         d3[56][23][21] = 1234;
         Free3DArray<double>(d3);
     }
    This code calls "new" 3 times, compared to the thousands of times your code called "new". Also, the destruction isn't in a loop -- just three invocations of "delete[]" need to be done.

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; March 13th, 2010 at 11:44 AM. Reason: Corrected mistake pointed out by OP

  6. #6
    Join Date
    Feb 2010
    Posts
    4

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by Paul McKenzie View Post

    Also another thing is that the libraries you are saying are slow or not fast enough are created by some of the best C++ programmers in the business. Performance of these libraries is something that these programmers carefully consider, and attempt to accomplish. You need a compelling argument to really state you can do better than these people when it comes to building C++ classes and components.
    Thank you first.

    I don't know whether my evidence is sufficient for stating that the array
    libraries I mentioned are slower than mine. I just used each of them in
    my computation program, compiled it with "g++ *.cpp -o myprogram" and
    measured the total run time for each case. Since the only difference
    between the versions of the program is the array library used, I
    concluded that the library used by the fastest program is the most
    efficient one. Of course, this conclusion is limited to the functions I
    used (in my case, reading/writing elements).

    I think your solution to this problem is very clever. The highlight is
    that the data is stored contiguously, which is similar to a C++ built-in
    static array and ensures the performance. By the way, there is a tiny mistake:
    Code:
    p[i][j] = p3 + i*j * nCols;
    It should be
    Code:
    p[i][j] = p3 + (i*nRows+j) * nCols;
    However, maybe I am too picky, but I think your solution is rather
    C-like. When enclosing it in a class, you have to provide methods
    to access the array elements, and these methods may decrease
    the performance. Do you have a more object-oriented way to design the
    array while maintaining its performance?

  7. #7
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,245

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by heath123
    I don't know whether my evidence is sufficient for stating that the array
    libraries I mentioned are slower than mine. (...) Of course, this
    conclusion is limited to the functions I used (in my case,
    reading/writing elements).
    From what I understand, you just compared the element access time for the various containers against that of an array, since your own dynamic array library is currently in development and thus cannot be used in a comparison. Even with compiler optimisations turned on, I do not think that these containers can beat an array in element access time, and in fact no container can. At best, they can just equal its performance.

    I suggest that you take a look at boost::multi_array and blitz::Array again. Compile with optimisations turned on, and instead of just dismissing them because they are slower in element access, check to see if this reduced performance really will have a noticeable impact when multiplied ten million times (e.g., a one second delay might be tolerable if the computation is going to take several seconds either way).

    Quote Originally Posted by heath123
    I just used each of them in
    my computation program, compiled it with "g++ *.cpp -o myprogram" and
    measured the total run time for each case.
    You can try with "g++ -O2 -D NDEBUG *.cpp -o myprogram". There is also -O3, but it is probably more prone to generating erroneous code.

    Quote Originally Posted by heath123
    I think your solution to this problem is very clever. The highlight is
    that the data is stored contiguously, which is similar to a C++ built-in
    static array and ensures the performance.
    This is actually a well known idiom when you want multi-dimensional arrays, e.g., it is mentioned in the Boost.MultiArray documentation. What I find very clever is the avoidance of a typical offset computation function by the use of intermediate levels of pointers, thus getting the compiler to do it instead.

    Quote Originally Posted by heath123
    However, maybe I am too picky, but I think your solution is rather
    C-like. When enclosing it in a class, you have to provide methods
    to access the array elements, and these methods may decrease
    the performance. Do you have a more object-oriented way to design the
    array while maintaining its performance?
    Once you involve functions, you pretty much depend on the compiler to inline them, if function call overhead would be significant. But again, since you did not enable compiler optimisations, maybe your fears are unfounded.
    Last edited by laserlight; February 28th, 2010 at 08:45 AM.
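    To make the trade-off concrete, here is a hedged sketch (class and member names are my own, not from the thread) of wrapping contiguous storage in a class whose operator() is defined inline in the class body; with optimisations on, a compiler can typically reduce it to the same address arithmetic as raw indexing:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Sketch: one contiguous allocation plus an inline accessor.
    // With optimisations enabled, operator() is a prime inlining candidate.
    template <class T>
    class array3d {
    public:
        array3d(std::size_t d1, std::size_t d2, std::size_t d3)
            : d2_(d2), d3_(d3), data_(d1 * d2 * d3, T()) {}

        T& operator()(std::size_t i, std::size_t j, std::size_t k) {
            return data_[(i * d2_ + j) * d3_ + k];  // row-major offset
        }
        const T& operator()(std::size_t i, std::size_t j, std::size_t k) const {
            return data_[(i * d2_ + j) * d3_ + k];
        }

    private:
        std::size_t d2_, d3_;
        std::vector<T> data_;  // single allocation, value-initialised
    };

    int main() {
        array3d<double> a(10, 10, 10);
        a(5, 6, 7) = 3.14;
        assert(a(5, 6, 7) == 3.14);
        return 0;
    }
    ```

    Whether this matches raw pointer access in practice is exactly the kind of thing that should be measured in an optimised build, as discussed above.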

  8. #8
    Join Date
    Jul 2005
    Location
    Netherlands
    Posts
    2,010

    Re: design high-performance 3d dynamic array

    First off, before you do anything else, set up a good test application that tests the performance of the kinds of operations you need to perform. E.g. if you intend to use an algorithm that relies heavily on random access, test that. If you need to perform an operation on each element, test that. If you need to construct/destruct a lot of objects, test that.
    Make sure you perform each operation multiple times in one test, preferably interleaved and with a call to sleep before the start of each test, so that you minimise interference from the rest of your system. You can check the stability of your measurements by repeating them several times and observing the difference between the fastest and slowest runs.
    Finally, make sure you build your application with full optimization. This can also include preprocessor definitions in the library, e.g. in MS VS 2005 and higher, you need to define _SECURE_SCL as 0.

    If you don't do the above, you can't talk about performance, you can only guess at it. And I've learned the hard way that it's almost impossible to guess right.

    Now on the concrete issue, I think I can give you some pointers having done similar optimizations myself for a 2D array. I agree with Paul that it's better to store your data contiguously. Not only does it make construction/destruction more efficient, but it can also help to improve cache locality. If it were me, I would simply use a std::vector, since it does all you want and more and there is no significant overhead if you use an optimized build.

    Second, to improve the performance of random access I have used a simple trick. In my data structure I stored pointers to the first element of each row (as I had a 2D matrix). The random access operator used that pointer instead of doing something like row * columns.
    Code:
    std::vector<T*> m_FirstElementInRow;
    //...
    T& Get(size_t row, size_t column) {
        return m_FirstElementInRow[row][column];
        // instead of:
        // return m_Data[row * m_Columns + column];
    }
    This worked for me because I didn't have to resize my matrix after construction, so the pointers only had to be set once.

    Third, for operations that go over each element in your data structure, use iterators with an STL algorithm. If you use a std::vector for your data storage, you can simply expose std::vector<T>::iterator as the iterator for your data structure. I have measured performance boosts of a factor of 2 up to 8 compared with hand-crafted loops using the random access operator. In the fastest case, I suspect optimisations at the processor level kicked in (something like SIMD). You can't beat that with any pure C or C++ code.
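    As a hedged illustration of that third point (the container and the halve function are mine, not from the post), a whole-container pass written with an STL algorithm over iterators rather than a hand-written loop through the random access operator:

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <vector>

    double halve(double x) { return x * 0.5; }

    int main() {
        std::vector<double> data(1000, 2.0);

        // One pass over the contiguous storage via iterators; the compiler
        // sees a simple range and can vectorise it in an optimised build.
        std::transform(data.begin(), data.end(), data.begin(), halve);

        assert(data[0] == 1.0 && data[999] == 1.0);
        return 0;
    }
    ```

    The same pattern applies to a 3D class backed by one std::vector: expose begin()/end() of the underlying vector and let the algorithm walk the whole array.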
    Cheers, D Drmmr

    Please put [code][/code] tags around your code to preserve indentation and make it more readable.

    As long as man ascribes to himself what is merely a possibility, he will not work for the attainment of it. - P. D. Ouspensky

  9. #9
    Join Date
    Feb 2010
    Posts
    4

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by laserlight View Post
    From what I understand, you just compared the element access time for the various containers against that of an array, since your own dynamic array library is currently in development and thus cannot be used in a comparison. Even with compiler optimisations turned on, I do not think that these containers can beat an array in element access time, and in fact no container can. At best, they can just equal its performance.
    Yes. In fact the core of my program is reading from and writing to each array element. So the element access speed has a big impact on the total computation time.


    This is actually a well known idiom when you want multi-dimensional arrays, e.g., it is mentioned in the Boost.MultiArray documentation. What I find very clever is the avoidance of a typical offset computation function by the use of intermediate levels of pointers, thus getting the compiler to do it instead.
    Exactly. At first I used a one-dimensional array to simulate a three-dimensional array and calculated the offsets myself, but got very poor performance.
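    For reference, the flat-array-with-manual-offset scheme being described looks like this (a sketch with made-up dimensions; note that in an optimised build the multiply-add below usually compiles to much the same code as the pointer-table version, so the poor performance observed was likely an artifact of the unoptimised build):

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    int main() {
        const std::size_t d1 = 4, d2 = 5, d3 = 6;
        std::vector<double> a(d1 * d2 * d3, 0.0);  // one contiguous block

        // element (i, j, k) lives at row-major offset (i*d2 + j)*d3 + k
        std::size_t i = 1, j = 2, k = 3;
        a[(i * d2 + j) * d3 + k] = 7.0;

        assert(a[(1 * 5 + 2) * 6 + 3] == 7.0);  // offset 45
        return 0;
    }
    ```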

  10. #10
    Join Date
    Oct 2007
    Location
    Fairfax, VA
    Posts
    10,888

    Re: design high-performance 3d dynamic array

    ^The first thing you need to do is forget the results of your previous tests. Since you didn't use compiler optimizations, those results are invalid. Do them again! And read the libraries' documentation carefully to see if they used checked iterators, and disable those too if you can.

    Quote Originally Posted by Paul McKenzie View Post
    You only need to call new 3 times, regardless of the size of the dimensions. The first call allocates the d1 pointers (which you did). The second call allocates all of the second-dimension pointers in one pool. The third call allocates the room for the data. Then a loop points the pointers to the right places.
    Actually, you can get away with only *one* allocation/deallocation if you really want to, but you need to use malloc()/free() rather than new. (Does new initialize primitive types? I doubt it, but if it does, malloc() may be faster anyway.) You'd do all three of the above allocations back-to-back in a single block of memory, with some type-casting to make each piece mean what you want it to mean.

    This is largely just an academic observation; I don't believe this approach will actually produce a sufficiently meaningful difference in performance to justify the less-obvious code.
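    A hedged sketch of that single-allocation idea (function name mine; error checking omitted; it assumes the pointer tables leave the data region suitably aligned, which holds for double on typical 64-bit platforms):

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdlib>

    // One malloc holds the layer-pointer table, the row-pointer table,
    // and the data, back to back. One free releases everything.
    double*** alloc3d(int nl, int nr, int nc) {
        std::size_t bytes = nl * sizeof(double**)
                          + (std::size_t)nl * nr * sizeof(double*)
                          + (std::size_t)nl * nr * nc * sizeof(double);
        char* block = (char*)std::malloc(bytes);
        double*** p  = (double***)block;
        double**  p2 = (double**)(block + nl * sizeof(double**));
        double*   p3 = (double*)(block + nl * sizeof(double**)
                                       + (std::size_t)nl * nr * sizeof(double*));
        for (int i = 0; i < nl; ++i) {
            p[i] = p2 + i * nr;
            for (int j = 0; j < nr; ++j)
                p[i][j] = p3 + (i * nr + j) * nc;
        }
        return p;
    }

    int main() {
        double*** d = alloc3d(4, 5, 6);
        d[1][2][3] = 9.5;
        assert(d[1][2][3] == 9.5);
        std::free(d);  // single deallocation
        return 0;
    }
    ```

    As the post says, the gain over three calls to new is unlikely to justify the extra casting in most programs.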

  11. #11
    Join Date
    Jul 2002
    Location
    Portsmouth. United Kingdom
    Posts
    2,699

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by heath123 View Post
    I tried
    std::vector<vector<vector<T> > >
    If you haven't done so already, you could try to reserve the maximum size of the vectors so that no reallocation of memory is required when expanding the array.

    In all of the experiments I have done I have found no speed difference between using arrays + pointers and vectors + iterators when compiled under full optimisation.
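    A minimal sketch of the reserve suggestion (sizes are made up): capacity is set once up front, so no reallocation or element copying happens while the vector grows.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    int main() {
        const std::size_t n = 1000;

        std::vector<double> v;
        v.reserve(n);  // one allocation up front; no reallocation below
        for (std::size_t i = 0; i < n; ++i)
            v.push_back(static_cast<double>(i));

        assert(v.size() == n);
        assert(v.capacity() >= n);  // reserve guarantees at least n
        return 0;
    }
    ```

    For nested vectors, the same idea applies at each level, or the dimensions can simply be passed to the constructors so everything is sized on construction.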
    "It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
    Richard P. Feynman

  12. #12
    Join Date
    Jan 2008
    Location
    California, USA
    Posts
    822

    Re: design high-performance 3d dynamic array

    Quote Originally Posted by heath123
    Yes. In fact the core of my program is reading from and writing to each array element. So the element access speed has a big impact on the total computation time.
    std::vector overloads the subscript operator in const and non-const versions. If accessing elements through operator[] has room for improvement, it lies in the non-const version, which corresponds to your write operations. If the job consists mostly of reads (and you can say so with confidence), then writing a user-defined class is unlikely to be fruitful.
    However, maybe I am too picky, but I think your solution is rather
    C-like. When enclosing it in a class, you have to provide methods
    to access the array elements, and these methods may decrease
    the performance. Do you have a more object-oriented way to design the
    array while maintaining its performance?
    As noted by JohnW@Wessex, I personally think that using reserve might be the best approach. If you still think it's not, writing a custom allocator class to test with the vector would be an alternative option.
