-
February 28th, 2010, 02:35 AM
#1
design high-performance 3d dynamic array
Hi, everyone:
I am working on a scientific computation program, which needs a very
high-performance three dimensional dynamic array. I tried
std::vector<vector<vector<T> > >, boost::multi_array and blitz::Array,
but the performance of all these libraries couldn't satisfy my
needs. Therefore, I designed my own array class, the fragment of which
is as follows:
template<class T>
class array
{
public:
    array(size_t d1, size_t d2, size_t d3)
    {
        p = new T**[d1];
        for (size_t i = 0; i != d1; ++i)
            p[i] = new T*[d2];
        for (size_t i = 0; i != d1; ++i)
            for (size_t j = 0; j != d2; ++j)
                p[i][j] = new T[d3];
        for (size_t i = 0; i != d1; ++i)
            for (size_t j = 0; j != d2; ++j)
                for (size_t k = 0; k != d3; ++k)
                    p[i][j][k] = T();
    }

    T& operator()(size_t d1, size_t d2, size_t d3)
    {
        return p[d1][d2][d3];
    }

private:
    T ***p;
};
However, since I have to read and write the array elements more than
ten million times during the computation, the overloaded parentheses
will decrease the efficiency dramatically. To improve the performance,
one way is to make T ***p public and use array.p[i][j][k] to
read/write data. However, I think this design is hideous.
So, does anyone have a better idea for hiding the internal pointer p from
users without decreasing the efficiency? Or does anyone have a completely
new design for a high-performance three-dimensional dynamic array?
Thanks in advance!
-
February 28th, 2010, 03:23 AM
#2
Re: design high-performance 3d dynamic array
Originally Posted by heath123
I am working on a scientific computation program, which needs a very
high-performance three dimensional dynamic array. I tried
std::vector<vector<vector<T> > >, boost::multi_array and blitz::Array,
but the performance of all these libraries couldn't satisfy my
needs. Therefore, I designed my own array class, the fragment of which
is as follows:
The problem is that it looks like you are going to write your own version of std::vector<std::vector<std::vector<T> > >, except that it will probably have worse performance in some areas (e.g., when expanding the dynamic array), and no better for the rest.
Did you enable optimisations? If you are using a compiler that has checked iterators, did you disable checked iterators?
However, since I have to read and write the array elements more than
ten million times during the computation, the overloaded parentheses
will decrease the efficiency dramatically.
Maybe, maybe not. The overloaded operator() might be inlined, thus making function call overhead zero.
-
February 28th, 2010, 03:57 AM
#3
Re: design high-performance 3d dynamic array
Originally Posted by laserlight
Maybe, maybe not. The overloaded operator() might be inlined, thus making function call overhead zero.
I concluded that from comparing the total run time of my array with that of a static 3d array under FreeBSD + g++. I didn't use any compiler optimizations, just
g++ *.cpp -o myprogram
To my knowledge, inline is just a suggestion to the compiler; there is no guarantee that a function actually gets inlined. I have to ensure the performance of my program, so is there any way to force the compiler to inline a specific function?
-
February 28th, 2010, 04:18 AM
#4
Re: design high-performance 3d dynamic array
Originally Posted by heath123
I concluded that from comparing the total run time of my array with that of a static 3d array under FreeBSD + g++. I didn't use any compiler optimizations, just
g++ *.cpp -o myprogram
Unfortunately, performance comparisons without compiler optimisations enabled can give misleading results. Furthermore, keep in mind that what you are testing differs in capability from what you are testing against, so you should judge by the performance requirements of what you are actually going to use it for, not by whether it matches the performance of "a static 3d array" for only a limited set of operations. (Although this can be used as a guide.)
Originally Posted by heath123
To my knowledge, inline is just a suggestion to the compiler; there is no guarantee that a function actually gets inlined. I have to ensure the performance of my program, so is there any way to force the compiler to inline a specific function?
There are compiler-specific ways to send stronger inlining hints, but even then the compiler is not obliged to follow them. Your overload of operator(), defined inline as it is in the class template definition, and consisting of just a return statement, is already a good candidate for inlining.
-
February 28th, 2010, 04:29 AM
#5
Re: design high-performance 3d dynamic array
Originally Posted by heath123
Hi, everyone:
I am working on a scientific computation program, which needs a very
high-performance three dimensional dynamic array. I tried
std::vector<vector<vector<T> > >, boost::multi_array and blitz::Array,
but the performance of all these libraries couldn't satisfy my
needs.
I didn't use any compiler optimizations,
Well, that invalidates anything you've stated about things being "too slow" and having to write your own classes. C++ compiler optimizations can be very dramatic. For example, unoptimized code can run 5 to 10 times slower than optimized code when it comes to the Visual C++ compilers.
Another thing: the libraries you say are slow or not fast enough were created by some of the best C++ programmers in the business. The performance of these libraries is something these programmers consider carefully and work hard to achieve. You need a compelling argument to claim you can do better than these people when it comes to building C++ classes and components.
Therefore, I designed my own array class, the fragment of which
is as follows:
The array class you defined is very simplistic, and not efficient. If your arrays are dynamic, but once created, stay the same size, then what you have written is slow in creation and destruction.
Code:
array(size_t d1,size_t d2,size_t d3)
{
    p=new T**[d1];
    for(size_t i=0;i!=d1;++i)
        p[i]=new T*[d2];
    for(size_t i=0;i!=d1;++i)
        for(size_t j=0;j!=d2;++j)
            p[i][j]=new T[d3];
You are calling the allocator multiple times in a loop. What if the array is 100x100x100? You are calling "new" thousands of times. Not only is that slow, it also fragments memory.
You only need to call new 3 times regardless of the size of the dimensions. The first call to new allocates the d1 pointers (which you did). The second call to new allocates all of the second dimension as one pool of pointers. The third call to new allocates the room for the data. Then the loops point the pointers at the right places.
The code below has not been tested, but hopefully I didn't make any mistakes.
Code:
template <typename T>
T*** Create3DArray(int nLayers, int nRows, int nCols)
{
    T*** p = new T**[nLayers];
    T** p2 = new T*[nLayers * nRows];
    T* p3 = new T[nLayers * nRows * nCols];
    for (int i = 0; i < nLayers; ++i)
    {
        p[i] = p2 + i * nRows;
        for (int j = 0; j < nRows; ++j)
            p[i][j] = p3 + (i * nRows + j) * nCols;
    }
    return p;
}

template <typename T>
void Free3DArray(T ***array)
{
    delete[] array[0][0];
    delete[] array[0];
    delete[] array;
}

int main()
{
    double ***d3 = Create3DArray<double>(100, 100, 100);
    d3[56][23][21] = 1234;
    Free3DArray<double>(d3);
}
This same code calls "new" 3 times, compared to the thousands of times your code called "new". Also, the destruction isn't in a loop -- just three invocations of "delete[]" are needed.
Regards,
Paul McKenzie
Last edited by Paul McKenzie; March 13th, 2010 at 12:44 PM.
Reason: Corrected mistake pointed out by OP
-
February 28th, 2010, 08:58 AM
#6
Re: design high-performance 3d dynamic array
Originally Posted by Paul McKenzie
Another thing: the libraries you say are slow or not fast enough were created by some of the best C++ programmers in the business. The performance of these libraries is something these programmers consider carefully and work hard to achieve. You need a compelling argument to claim you can do better than these people when it comes to building C++ classes and components.
Thank you first.
I don't know whether my evidence is sufficient for stating that the array
libraries I mentioned are slower than mine. I just used each of them in
my computation program, compiled it with "g++ *.cpp -o myprogram", and
measured the total run time for each case. Since the only difference
between the programs is the array library used, I concluded that the
library used by the fastest program is the most efficient one. Of course,
this conclusion is limited to the functions I used (in my case, reading
and writing elements).
I think your solution to this problem is very clever. The highlight is
that the data is stored contiguously, which is similar to a C++ built-in
static array and ensures the performance. By the way, there is a tiny mistake:
Code:
p[i][j] = p3 + i*j * nCols;
It should be
Code:
p[i][j] = p3 + (i*nRows+j )* nCols;
However, maybe I am too picky, but I think your solution is rather
C-like. When enclosing it inside a class, you have to provide some
methods to access the array elements, and these methods may decrease
the performance. Do you have a more object-oriented way to design the
array while maintaining its performance?
-
February 28th, 2010, 09:42 AM
#7
Re: design high-performance 3d dynamic array
Originally Posted by heath123
I don't know whether my evidence is sufficient for stating that the array
libraries I mentioned are slower than mine. (...) Of course, this
conclusion is limited to the functions I used (in my case, reading
and writing elements).
From what I understand, you just compared the element access time for the various containers against that of an array, since your own dynamic array library is currently in development and thus cannot be used in a comparison. Even with compiler optimisations turned on, I do not think that these containers can beat an array in element access time, and in fact no container can. At best, they can just equal its performance.
I suggest that you take a look at boost::multi_array and blitz::Array again. Compile with optimisations turned on, and instead of just dismissing them because they are slower in element access, check to see if this reduced performance really will have a noticeable impact when multiplied ten million times (e.g., a one second delay might be tolerable if the computation is going to take several seconds either way).
Originally Posted by heath123
I just used each of them in
my computation program, compiled it with "g++ *.cpp -o myprogram" and
measured the total run time for each case.
You can try with "g++ -O2 -D NDEBUG *.cpp -o myprogram". There is also -O3, but it is probably more prone to generating erroneous code.
Originally Posted by heath123
I think your solution to this problem is very clever. The highlight is
that the data is stored contiguously, which is similar to a C++ built-in
static array and ensures the performance.
This is actually a well known idiom when you want multi-dimensional arrays, e.g., it is mentioned in the Boost.MultiArray documentation. What I find very clever is the avoidance of a typical offset computation function by the use of intermediate levels of pointers, thus getting the compiler to do it instead.
Originally Posted by heath123
However, maybe I am too picky, I think your solution is more
C-like. When enclosing it inside a class, you have to provide some
methods to visit the array elements, and these methods may decrease
the performance. Do you have a more object-oriented way to design the
array while maintaining its performance?
Once you involve functions, you pretty much depend on the compiler to inline them, if function call overhead would be significant. But again, since you did not enable compiler optimisations, maybe your fears are unfounded.
Last edited by laserlight; February 28th, 2010 at 09:45 AM.
-
February 28th, 2010, 09:55 AM
#8
Re: design high-performance 3d dynamic array
First off, before you do anything else, set up a good test application that tests the performance of the kinds of operations you need to perform. E.g. if you intend to use an algorithm that relies heavily on random access, test that. If you need to perform an operation on each element, test that. If you need to construct/destruct a lot of objects, test that.
Make sure you perform each operation multiple times in one test, preferably interleaved and with a call to sleep before the start of each test, such that you minimize the interference of the rest of your system. You can test the stability of your measurements by performing them several times and observing the difference between fastest and slowest.
Finally, make sure you build your application with full optimization. This can also include preprocessor definitions in the library, e.g. in MS VS 2005 and higher, you need to define _SECURE_SCL as 0.
If you don't do the above, you can't talk about performance, you can only guess at it. And I've learned the hard way that it's almost impossible to guess right.
Now on the concrete issue, I think I can give you some pointers having done similar optimizations myself for a 2D array. I agree with Paul that it's better to store your data contiguously. Not only does it make construction/destruction more efficient, but it can also help to improve cache locality. If it were me, I would simply use a std::vector, since it does all you want and more and there is no significant overhead if you use an optimized build.
Second, to improve the performance of random access I have used a simple trick. In my data structure I stored pointers to the first element of each row (as I had a 2D matrix). The random access operator used that pointer instead of doing something like row * columns.
Code:
std::vector<T*> m_FirstElementInRow;
// ...
T& Get(size_t row, size_t column) {
    return m_FirstElementInRow[row][column];
    // instead of:
    // return m_Data[row * m_Columns + column];
}
This worked for me because I didn't have to resize my matrix after construction, so the pointers only had to be set once.
Third, for operations that visit each element in your data structure, use iterators with an STL algorithm. If you use a std::vector for your data storage, you can simply provide a std::vector<T>::iterator as the iterator for your data structure. I have measured performance boosts of a factor of 2 up to 8 in comparison with hand-crafted loops using the random access operator. In the fastest case, I suspect some optimizations at the processor level kicked in (something like SIMD). You can't beat that with any pure C or C++ code.
Cheers, D Drmmr
Please put [code][/code] tags around your code to preserve indentation and make it more readable.
As long as man ascribes to himself what is merely a possibility, he will not work for the attainment of it. - P. D. Ouspensky
-
February 28th, 2010, 10:10 AM
#9
Re: design high-performance 3d dynamic array
Originally Posted by laserlight
From what I understand, you just compared the element access time for the various containers against that of an array, since your own dynamic array library is currently in development and thus cannot be used in a comparison. Even with compiler optimisations turned on, I do not think that these containers can beat an array in element access time, and in fact no container can. At best, they can just equal its performance.
Yes. In fact the core of my program is reading from and writing to each array element. So the element access speed has a big impact on the total computation time.
This is actually a well known idiom when you want multi-dimensional arrays, e.g., it is mentioned in the Boost.MultiArray documentation. What I find very clever is the avoidance of a typical offset computation function by the use of intermediate levels of pointers, thus getting the compiler to do it instead.
Exactly. At first I used a one-dimensional array to simulate a three-dimensional array and calculated the offsets myself, but got very poor performance.
-
February 28th, 2010, 10:51 AM
#10
Re: design high-performance 3d dynamic array
^The first thing you need to do is forget the results of your previous tests. Since you didn't use compiler optimizations, those results are invalid. Do them again! And read the libraries' documentation carefully to see if they used checked iterators, and disable those too if you can.
Originally Posted by Paul McKenzie
You only need to call new 3 times regardless of the size of the dimensions. The first call to new allocates the d1 (which you did). The second call to new allocates all of the second dimension in a pool of pointers. The third call to new allocates the room for the data. Then the loops points the pointers in the right places.
Actually, you can get away with only *one* allocation/deallocation if you really want to. But you need to use malloc()/free() to do it rather than new. (Actually, does new initialize primitive types? I doubt it, but if so, malloc() may be faster anyway.) You'd just do all three of the above allocations back-to-back in a single block of memory. You need to do some type-casting though, to get it to mean the things you want it to mean.
This is largely just an academic observation; I don't believe this approach will actually produce a sufficiently meaningful difference in performance to justify the less-obvious code.
-
March 1st, 2010, 04:03 AM
#11
Re: design high-performance 3d dynamic array
Originally Posted by heath123
I tried
std::vector<vector<vector<T> > >
If you haven't done so already, you could try to reserve the maximum size of the vectors so that no reallocation of memory is required when expanding the array.
In all of the experiments I have done I have found no speed difference between using arrays + pointers and vectors + iterators when compiled under full optimisation.
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
Richard P. Feynman
-
March 1st, 2010, 06:22 AM
#12
Re: design high-performance 3d dynamic array
Originally Posted by heath123
Yes. In fact the core of my program is reading from and writing to each array element. So the element access speed has a big impact on the total computation time.
std::vector overloads the subscript operator in const and non-const versions. If accessing elements through operator[] has any room for improvement, it lies in the non-const version, which corresponds to your write operations. If the job consists mostly of reads, then writing a user-defined class is unlikely to be fruitful.
However, maybe I am too picky, I think your solution is more
C-like. When enclosing it inside a class, you have to provide some
methods to visit the array elements, and these methods may decrease
the performance. Do you have a more object-oriented way to design the
array while maintaining its performance?
As noted by JohnW@Wessex, I personally think that using reserve might be the best approach. If you still think it's not, writing a custom allocator class to test with the vector would be an alternative option.