
September 23rd, 2013, 04:31 PM
#1
Question / Design approach for meeting distribution curve
Hi folks,
Newbie poster here. I'm working on a project in C#, and I'm curious about your ideas for how to approach something.
This application is going to generate a random list of X number of Things. Each Thing is of a certain Type. The Types need to follow a particular distribution curve. For example:
Type A 5%
Type B 10%
Type C 50%
Type D 35%
So, the idea is, if I choose to generate 100 things, I would get 5 of Type A, 10 of Type B, 50 of Type C, and 35 of Type D. This is a very simple example, of course. In reality there are many more types with various frequencies.
What approach would you take to something like this? The main approach I was taking is as follows:
1. Treat the frequencies as integer values.
2. Sum the frequencies to get a total.
3. Generate a random number between 1 and the total.
4. Loop through the Types, adding each Type's frequency to a running counter.
5. If the random number falls within the current range (i.e. it is no greater than the counter), select that Type.
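A minimal C# sketch of steps 1-5, assuming the Types and their integer weights are kept in a list of (name, weight) tuples. `PickType`, the tuple layout and the type names "A" to "D" are hypothetical and only mirror the example percentages; this is an illustration of the running-total idea, not code from the thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical weight table mirroring the example percentages.
var weights = new List<(string Type, int Weight)>
{
    ("A", 5), ("B", 10), ("C", 50), ("D", 35)
};

// Steps 2-5: sum the weights, draw r in [1, total], then walk the
// types with a running total until r falls inside the current range.
string PickType(List<(string Type, int Weight)> table, Random rng)
{
    int total = 0;
    foreach (var entry in table) total += entry.Weight;

    int r = rng.Next(1, total + 1); // upper bound is exclusive, so r is 1..total
    int running = 0;
    foreach (var entry in table)
    {
        running += entry.Weight;
        if (r <= running) return entry.Type;
    }
    throw new InvalidOperationException("Weights must be positive.");
}

var random = new Random();
var things = new List<string>();
for (int i = 0; i < 100; i++) things.Add(PickType(weights, random));
Console.WriteLine(string.Join(",", things));
```

Each pick is independent, so 100 picks give roughly, not exactly, 5/10/50/35. If the table is large and used often, the total could be computed once instead of on every call.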
Thoughts on this approach?
With this method, will the result sets approximate the ideal distribution over time? Probability was never my strong point. And if I wanted a little MORE variability in the distribution, how could I introduce that?
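One way to check the convergence question empirically is to make many independent weighted picks and compare the observed percentages with the weights. A minimal sketch, reusing the same hypothetical weights; `PickIndex` and `Observed` are illustrative names. The fixed seed makes runs repeatable, but the exact numbers printed depend on the runtime's `Random` implementation, so none are claimed here.

```csharp
using System;

// Hypothetical example weights (integer percentages).
string[] types = { "A", "B", "C", "D" };
int[] weights = { 5, 10, 50, 35 };
int total = 0;
foreach (int w in weights) total += w;

// One weighted pick by running total, as in the steps from the question.
int PickIndex(Random rng)
{
    int r = rng.Next(1, total + 1); // 1..total inclusive
    int running = 0;
    for (int i = 0; i < weights.Length; i++)
    {
        running += weights[i];
        if (r <= running) return i;
    }
    return weights.Length - 1; // not reached when all weights are positive
}

// Make n independent picks and return the observed percentage per type.
double[] Observed(int n, Random rng)
{
    var counts = new int[weights.Length];
    for (int k = 0; k < n; k++) counts[PickIndex(rng)]++;
    var pct = new double[weights.Length];
    for (int i = 0; i < pct.Length; i++) pct[i] = 100.0 * counts[i] / n;
    return pct;
}

var random = new Random(12345); // fixed seed so runs are repeatable
foreach (int n in new[] { 100, 10_000, 1_000_000 })
{
    double[] pct = Observed(n, random);
    Console.Write($"n={n,9}:");
    for (int i = 0; i < types.Length; i++)
        Console.Write($"  {types[i]} {pct[i],5:F1}%");
    Console.WriteLine();
}
```

The observed percentages tend toward the weights as n grows (the law of large numbers), but for any finite n the counts are only approximate, which is the trade-off discussed in the reply.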
Thoughts greatly appreciated.
Steve

September 24th, 2013, 03:18 AM
#2
Re: Question / Design approach for meeting distribution curve
Originally Posted by sbattisti
This application is going to generate a random list of X number of Things. Each Thing is of a certain Type.
If you want to generate a list where the items appear in random order, there's an O(N) algorithm for that, usually called a random shuffle. It's available as a standard function in many languages, but apparently not in C#, so here's what appears to be a solid implementation:
http://www.dotnetperls.com/shuffle
In your example, you'd add 5 A, 10 B, 50 C and 35 D items to a list and shuffle it. To get a new random order, you just reshuffle it.
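A minimal C# sketch of the build-then-shuffle idea, using the Fisher-Yates (Knuth) shuffle. This is not necessarily what the linked page implements; `Shuffle`, `quotas` and `bag` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Fisher-Yates (Knuth) shuffle: one pass from the end, swapping each
// element with a uniformly chosen element at or before it. O(N).
void Shuffle<T>(IList<T> list, Random rng)
{
    for (int i = list.Count - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1); // 0 <= j <= i
        (list[i], list[j]) = (list[j], list[i]);
    }
}

// Put the exact number of each type into the list, then shuffle it.
var quotas = new (string Type, int Count)[] { ("A", 5), ("B", 10), ("C", 50), ("D", 35) };
var bag = new List<string>();
foreach (var (type, count) in quotas)
    for (int k = 0; k < count; k++) bag.Add(type);

var random = new Random();
Shuffle(bag, random);
Console.WriteLine(string.Join(",", bag));

// A new random order with the same exact counts is just another shuffle.
Shuffle(bag, random);
```

Because the list holds exactly the wanted counts, every shuffle yields exactly 5/10/50/35; only the order changes.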
This works fine when the number of items is reasonably small; otherwise the list may become intractably large. In that case it may be better to simulate a bucket of items and draw them one by one. The catch is that if you draw only a small number of items, they will appear in random order but not with exactly the wanted frequencies. The deviation becomes smaller the more items you draw.
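A minimal sketch of the bucket idea, assuming the bucket is stored as a remaining count per type so that no list of N items is ever built. Each draw picks a type with probability proportional to how many of it are still in the bucket, then removes one (drawing without replacement). The bucket sizes and the `Draw` helper are hypothetical; `Random.NextInt64` needs .NET 6 or later.

```csharp
using System;

// Hypothetical large bucket: 100 million items in the 5/10/50/35 ratio,
// stored only as remaining counts per type.
string[] types = { "A", "B", "C", "D" };
long[] remaining = { 5_000_000, 10_000_000, 50_000_000, 35_000_000 };

// Draw one item without replacement and return its type index.
int Draw(long[] left, Random rng)
{
    long total = 0;
    foreach (long c in left) total += c;
    if (total == 0) throw new InvalidOperationException("Bucket is empty.");

    long r = rng.NextInt64(total); // 0 <= r < total
    for (int i = 0; i < left.Length; i++)
    {
        if (r < left[i]) { left[i]--; return i; } // found the type, remove one
        r -= left[i];
    }
    throw new InvalidOperationException("Unreachable when counts are non-negative.");
}

var random = new Random();
var drawn = new string[20];
for (int k = 0; k < drawn.Length; k++) drawn[k] = types[Draw(remaining, random)];
Console.WriteLine(string.Join(",", drawn));
```

Drawing the whole bucket reproduces the exact counts; drawing only a prefix (20 items here) gives random order with frequencies that only approximate the ratio.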
So there's a tradeoff situation. Which approach is better depends on the application.
Last edited by dazzle; September 24th, 2013 at 03:40 AM.
