CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 7 of 7

Threaded View

  1. #6
    Join Date
    May 2009
    Posts
    2,413

    Re: random partition in to k groups

    Quote Originally Posted by dshawul View Post
    Well it seems the easiest approach of picking one at a time randomly works well.
    I've run your example and I think I better understand the problem now. Using the specific numbers from the example you want each of the 10 rows to sum up to 4000 and each of the 3 columns to sum up to 10000, 20000 and 10000 respectively.

    I haven't fully analyzed the algorithm but I take it you're making use of the central limit theorem to simulate the probability distribution for each row (which should tend towards 10000/40000, 20000/40000, 10000/40000 that is 1/4, 1/2, 1/4 probabilities). Thus with a row sum of 4000 the result should be close to 1000, 2000, 1000 but strangely the first row is,

    1140, 2319, 541

    This adds up to 4000 so it's correct in that way but the numbers deviate far too much from the expected 1000, 2000, 1000 averages so there's something fishy going on. It's the same with all 10 rows.

    And in fact you're overextending rand(). This simple generator produces random numbers between 0 and RAND_MAX (which usually is 32767). Then you're using this technique to generate random numbers in the 0 to N-1 range:

    Code:
    int v = rand() % N;
    It's based on that N should be fairly small in relation to what rand() produces to work properly. But in your case it doesn't. In fact when the first row is calculated N is 40000 which is even bigger than RAND_MAX! N decreases with each row but it's still way too big. In short you're skewing the probability distribution away from an assumed even distribution into something you don't quite know what it is really.

    But it's possible to improve the situation. Use this instead,

    Code:
    float d = float(rand()) / float(RAND_MAX+1); // d is in the 0.0 to 1.0 (non inclusive) range
    int v = int(d * float(N)); // v is in the 0 to N-1 range
    It will produce random numbers in the wanted range as before but in a proper way. Now the first row becomes,

    985, 2015, 1000

    and all other rows are also closer to the expected 1000, 2000, 1000 averages.

    Now the algorithm "works" in the sense that the random variation is a result of the natural variation you get from simulating probability fractions by applying the central limit theorem, and not from a fawlty and unpredictable random number generation. Still I suspect the algorithm is somewhat too complex and what it does can be achieved in a simpler more transparent way. Personally I would generate the partitions according to the normal (Gaussian) distribution instead. It would allow for a nicer more controlled random variation (since you cannot only determine the mean but also the deviation from the mean). Good luck!
    Last edited by nuzzle; September 3rd, 2012 at 12:19 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured