I'm having a hard time with finding an efficient algorithm for this problem:
I have a very large 2D array filled with integers ranging from 0 to some specified maximum N.
I know need to increase the sum of all the elements by an integer amount m (possibly very large).
This m should be distrubuted around the array randomly, however maintaining that each element does not exceed N.
If there are k elements in the array, and m>k*N, then of course the array should be filled with N's.
I'm thinking about filling random entry's with an amount chosen from the Poisson Distribution, but I can't tell if this is truely random distributed then.
If there are k elements in the array, and m>k*N, then of course the array should be filled with N's.
this is not clear to me, do you mean that the array should grow in size in this case ? or that the to be distributed amount should be truncated to k*N ?
This m should be distrubuted around the array randomly, however maintaining that each element does not exceed N.
You have to define "random". Say your idea of a random distribution is a Poisson process.
In my view if you (blindfolded) throw balls at an array of buckets you'll have a Poisson process and the number of balls in each bucket will end up Poisson distributed. Then instead of throwing the balls you can consider each bucket one by one and fill each with the number of balls you draw from a Poisson distribution.
So your idea of using a Poisson distribution to fill each bucket (add to an array element) seems okay to me.
But note that although the distribution will be random (according to a Poisson process) the number of balls in each bucket may not. This is because there already are a varying number of balls in each bucket before you start and there's the upper limit to how many balls a bucket can hold.
Also note that the simulation algorithm of the Poisson distribution must be fast for this approach to make a difference.
Now you want to increase the sum (77) by M (lets say 20) but every number must be with in N (9), AND the number must still appear to be random...
If this is the case there are two methods i can think of..
#1) Walk up the array adding a random value between 0 and (N - K(i,j)) while the sum of all added values are < M. This may require several passes, or may end before a single pass is complete.
#2) Randomly select a K(i,j) and if K(i,j) < N add one. do this M times..
Both methods will require additional checking for if Sum(K) + M is < K * N in which case all K(i,j) = N ....
If this is the case there are two methods i can think of..
#1) Walk up the array adding a random value between 0 and (N - K(i,j)) while the sum of all added values are < M. This may require several passes, or may end before a single pass is complete.
#2) Randomly select a K(i,j) and if K(i,j) < N add one. do this M times..
These methods represent two different definitions of "random".
In #2 the distribution is according to a Poisson process. You're suggesting the naive approach. It will take a lot of time when M is big. I suppose the OP has realized this and is looking for something faster and instead contemplates for each K(i,j) to add a number drawn from a Poisson distribution. If this approach will be faster than the naive will depend on the efficiency of the Poisson variate algorithm that is used.
Note that M random selections may not be enougth to distribute all M (to subtract one unit from M you must add it to a K(i,j) and that may take many tries). There are several probabilistically equivalent ways of handling the N limit.
As to #1, this method is not according to a Poisson distribution because the number you add to K(i,j) will depend on the existing value of K(i,j). I'm not saying this is bad or anything but one should be aware of the fact than M is not independenly distributed with this method. It's conditioned on K.
Last edited by nuzzle; August 17th, 2011 at 06:05 AM.
the Primary problem here is targeting the sum of random numbers.. Once you have a Fixed target.. You always end up having to calculate the last few digits ..
The balance beam we have here is Mathematically correct VS Speed ....
#1 is not 100% but gets the job done and relatively quickly..
#2 is mathematical, however could take a long time for a Large M..
I was thinking that there is a way to smarten this up ... (merge the two methods)
#3 ) ... Randomly select a K(i,j) and add a random value between 0 and (N - K(i,j)) Summing till we get to M ..
I was thinking that there is a way to smarten this up ... (merge the two methods)
But there is and the OP is on the right track I think (see his second post).
You can either generate array positions at random, each time incrementing the selected element by one until M increments have been done (as you do in #2). Afterwards each array position has been incremented by a number that follows a Poisson distribution.
As an alternative to the above you visit each array position once and add a number drawn from a Poisson distribution (and not from a position dependent uniform distribution as you do in #1). It will be equivalent to the above and faster (under certain circumstances).
What remains is to
- determine the intensity parameter for the Poisson process,
- make sure the algorithm terminates correctly with exactly M increments, and
- locate a fast Poisson variate algorithm.
Last edited by nuzzle; August 17th, 2011 at 09:53 AM.
Bookmarks