I have been trying to write a reduction kernel to sum the contents of a very large array. I asked this question on Stack Overflow, but I still don't fully understand parts of the answer. For starters, what is meant by the last suggestion by Grizzly? In the example below, a step reduction, what is meant by stride, for example? Do I call this with a global size smaller than the number of items in the array, so that it reduces the array to a new array with as many items as the global work size?
http://stackoverflow.com/questions/8...ting-cuda-code

Code:
__kernel void reduction_step(__global const unsigned long* A, __global unsigned long* C, uint size)
{
    unsigned long sum = 0;
    for (int i = start; i < size; i += stride)
        sum += A[i];
    C[get_global_id(0)] = sum;
}
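To make the question concrete, here is a minimal host-side sketch in plain C of one plausible reading of the kernel. It is not taken from the linked answer. It assumes that `start` is the work-item's `get_global_id(0)` and `stride` is `get_global_size(0)`, neither of which is defined in the snippet above. The names `reduction_step_item` and `reduction_step_emulated` are hypothetical helpers that only emulate the launch on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates ONE work-item of the kernel.
 * ASSUMPTION (not in the original snippet): start = get_global_id(0)
 * and stride = get_global_size(0). uint64_t stands in for OpenCL's
 * 64-bit unsigned long. */
static uint64_t reduction_step_item(const uint64_t *A, size_t size,
                                    size_t gid, size_t global_size)
{
    uint64_t sum = 0;
    /* Visit A[gid], A[gid + global_size], A[gid + 2*global_size], ... */
    for (size_t i = gid; i < size; i += global_size)
        sum += A[i];
    return sum;
}

/* Emulates the whole launch: global_size work-items, each writing one
 * partial sum, so C must hold global_size elements. */
static void reduction_step_emulated(const uint64_t *A, uint64_t *C,
                                    size_t size, size_t global_size)
{
    assert(global_size > 0); /* a stride of 0 would never terminate */
    for (size_t gid = 0; gid < global_size; ++gid)
        C[gid] = reduction_step_item(A, size, gid, global_size);
}
```

Under that assumption, calling it on the 10 values 1..10 with a global size of 4 leaves the partial sums 15, 18, 10 and 12 in `C`. A second step over those 4 values with a global size of 1 gives the total, 55. So the output array would have one item per work-item, which is the behaviour the question asks about.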

