random partition into k groups
CodeGuru Forums

  1. #1
    Join Date
    Sep 2010
    Posts
    39

    How do you generate a random permutation of the numbers 0 to N-1 without storing it in an array?
    Duplicates must be avoided, so I can't simply generate the numbers with repeated rand() calls.
    The problem I have is to partition N data points into k groups by random selection.
    I thought of shuffling the indices with the STL's random_shuffle, but then the indices need to be stored in an array.
    Is there a way to get the N integers 0..N-1 in a random order, which I could then use to access a database?
    Thanks

  2. #2
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,273

    Why can't you store the generated random numbers in an array or some other container?

  3. #3
    Join Date
    Sep 2010
    Posts
    39

    N can be very big (thousands or millions). Maybe I am looking at this problem from the wrong perspective, so let me describe the actual problem.
    I have win/draw/loss data with N = W + D + L, for example 100000 wins, 40000 draws and 60000 losses, 200000 in total.
    Now I want to partition this into, say, 10 groups. The first group could be, for instance, (20000w, 10234d, 34000l). Once all 10 groups are generated like that, they should add up to the totals.

    Edit: It seems what I need is to generate k random numbers that add up to W, another k that add up to D, and so on. Maybe that will work?
    Last edited by dshawul; August 31st, 2012 at 09:32 AM.

  4. #4
    Join Date
    May 2009
    Posts
    2,413

    Quote Originally Posted by dshawul
    It seems what I need is to generate k random numbers that add up to W
    If you want k sections, generate k-1 random numbers within the range to serve as the borders between sections. Add the two range limits, which gives k+1 numbers, and sort them. Finally, to get the k sections, subtract from each number its next smaller neighbour where one exists. Depending on whether the borders are inclusive, you may need to add 1.

  5. #5
    Join Date
    Sep 2010
    Posts
    39

    Well, it seems the simplest approach, picking one item at a time at random, works well.
    Code:
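    #include <stdlib.h>
    #include <stdio.h>

    /* Draws Nk items one at a time, without replacement, from a pool of
       r[0] wins, r[1] draws and r[2] losses. Each drawn item is counted in
       rk[] and removed from r[], so successive calls share the same pool. */
    void gen(int* r,int* rk,int Nk) {
       int N = 0;
       for(int j = 0;j < 3;j++)
          N += r[j];                 /* items left in the pool */
       for(int i = 0;i < Nk;i++) {
          int v = rand() % N;        /* pick one remaining item (only near-uniform if N is much smaller than RAND_MAX) */
          int total = 0;
          for(int j = 0;j < 3;j++) { /* find the category that item v belongs to */
             total += r[j];
             if(total > v) {
                rk[j]++;
                r[j]--;
                N--;
                break;
             }
          }
       }
    }
    int main() {
       int r[3] = {10000,20000,10000}; /* W, D, L totals */
       int Nk = 4000;                  /* items per group */
       int rk[10][3];                  /* W, D, L counts of each of the 10 groups */
       srand(1);
       for(int i = 0;i < 10;i++) {
          rk[i][0] = rk[i][1] = rk[i][2] = 0;
          gen(r,rk[i],Nk);
          printf("%d %d %d\n",rk[i][0],rk[i][1],rk[i][2]);
       }
       return 0;
    }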

  6. #6
    Join Date
    May 2009
    Posts
    2,413

    Quote Originally Posted by dshawul
    Well it seems the easiest approach of picking one at a time randomly works well.
    I've run your example, and I think I understand the problem better now. With the specific numbers from the example, you want each of the 10 rows to sum to 4000 and the 3 columns to sum to 10000, 20000 and 10000 respectively.

    I haven't fully analyzed the algorithm, but I take it you're relying on the central limit theorem to simulate the probability distribution for each row. That distribution should tend towards 10000/40000, 20000/40000, 10000/40000, that is, probabilities of 1/4, 1/2, 1/4. With a row sum of 4000, the result should therefore be close to 1000, 2000, 1000, but strangely the first row is,

    1140, 2319, 541

    This adds up to 4000, so it's correct in that respect. But the numbers deviate far too much from the expected averages of 1000, 2000, 1000, so something fishy is going on. It's the same with all 10 rows.
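Here is how far off those numbers are. Suppose every item were drawn uniformly without replacement. The win count X in the first row would then be hypergeometric with N = 40000 items, K = 10000 wins and n = 4000 draws, so

```latex
\mathbb{E}[X] = n\,\frac{K}{N} = 4000 \cdot \frac{10000}{40000} = 1000,
\qquad
\operatorname{Var}[X] = n\,\frac{K}{N}\Bigl(1-\frac{K}{N}\Bigr)\frac{N-n}{N-1}
  = 4000 \cdot \frac{1}{4} \cdot \frac{3}{4} \cdot \frac{36000}{39999} \approx 675
```

This gives a standard deviation of about 26. The losses have the same K and the same standard deviation. For the draws, K = 20000, the variance is about 900 and the standard deviation is about 30. The observed deviations of 140, 319 and 459 are therefore roughly 5, 11 and 18 standard deviations, far outside normal sampling noise.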

    And in fact you're overextending rand(). This simple generator produces random numbers between 0 and RAND_MAX, which the standard only guarantees to be at least 32767 (it is exactly 32767 with Visual C++). You then use this technique to generate random numbers in the 0 to N-1 range:

    Code:
    int v = rand() % N;
    This only works properly if N is small compared with RAND_MAX, and in your case it isn't. When the first row is calculated, N is 40000, which is even bigger than RAND_MAX when RAND_MAX is 32767! rand() % N can then only return 0 to 32767, so the values 32768 to 39999 are never drawn. At the start, all of those values fall in the loss category. N decreases with each row, but it stays far too big. In short, you're skewing the probability distribution away from the assumed uniform distribution into something whose shape you don't really know.

    But it's possible to improve the situation. Use this instead,

    Code:
    double d = rand() / (RAND_MAX + 1.0); // d is in the 0.0 to 1.0 (non inclusive) range; RAND_MAX + 1.0 is computed in double, so it cannot overflow
    int v = int(d * N); // v is in the 0 to N-1 range
    It produces random numbers in the same range as before, but spread evenly over the whole range instead of only over 0 to RAND_MAX. Now the first row becomes,

    985, 2015, 1000

    and all other rows are also closer to the expected 1000, 2000, 1000 averages.

    Now the algorithm "works" in the sense that the random variation comes from the natural variation of simulating probability fractions (by the central limit theorem), and not from faulty and unpredictable random number generation. Still, I suspect the algorithm is more complex than necessary, and that what it does could be achieved in a simpler, more transparent way. Personally, I would generate the partitions according to the normal (Gaussian) distribution instead. That would allow a nicer, more controlled random variation, since you can set not only the mean but also the deviation from the mean. Good luck!
    Last edited by nuzzle; September 3rd, 2012 at 12:19 AM.

  7. #7
    Join Date
    Sep 2010
    Posts
    39

    Hi nuzzle
    Sorry, I didn't look back at this thread once I thought I had something working. Yes, you are right, there was something fishy about the code I posted. Running it 10 times reveals a pattern: W decreases from sample 1 to sample 10. At the time I thought this was because rand() was running out of cycles, and replacing it with a better PRNG such as the Mersenne Twister did give a better result. But as you pointed out, there is more to it, and rand() may be equally good once I fix that mistake.
    Anyway, there is a better method than the brute-force approach I followed: hypergeometric sampling (sampling without replacement) is much faster. It requires a fairly complex sampling function, though, which I took from an R package.
    cheers
