Question / Design approach for meeting distribution curve

• September 23rd, 2013, 03:31 PM
sbattisti
Hi folks,

Newbie poster here. I'm working on a project in C#, and I'm curious about your ideas for how to approach something.

This application is going to generate a random list of X Things. Each Thing is of a certain Type, and the Types need to follow a particular distribution curve. For example:

Type A 5%
Type B 10%
Type C 50%
Type D 35%

The idea is that if I generate 100 Things, I get 5 of Type A, 10 of Type B, 50 of Type C and 35 of Type D. This is a very simple example, of course; in reality there are many more Types with various frequencies.
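When X isn't 100, the percentages won't always come out as whole numbers (5% of 30 is 1.5). Below is a minimal sketch of one common way to get whole counts that still add up to X, the largest remainder method. It assumes the frequencies are non-negative integers with a positive sum, and the class and method names are hypothetical:

```csharp
using System;
using System.Linq;

public static class DistributionCounts
{
    // Splits `total` items across Types in proportion to integer `weights`
    // using the largest remainder method, so the counts always sum to `total`.
    // Assumes non-negative weights with a positive sum.
    public static int[] ExactCounts(int[] weights, int total)
    {
        int weightSum = weights.Sum();
        var counts = new int[weights.Length];
        var remainders = new long[weights.Length];
        int assigned = 0;

        for (int i = 0; i < weights.Length; i++)
        {
            long scaled = (long)weights[i] * total;   // ideal share times weightSum
            counts[i] = (int)(scaled / weightSum);     // floor of the ideal share
            remainders[i] = scaled % weightSum;        // fractional part, scaled by weightSum
            assigned += counts[i];
        }

        // Give the leftover items to the Types with the largest fractional parts.
        // OrderByDescending is stable, so ties go to the earlier Type.
        var order = Enumerable.Range(0, weights.Length)
                              .OrderByDescending(i => remainders[i])
                              .ToArray();
        for (int k = 0; k < total - assigned; k++)
            counts[order[k]]++;

        return counts;
    }
}
```

With the example weights, `ExactCounts(new[] { 5, 10, 50, 35 }, 30)` gives 2, 3, 15 and 10. The floors sum to 29, and the one leftover item goes to the first of the two Types tied for the largest remainder.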

What approach would you take to something like this? The main approach I was taking is as follows:

1. Treat the frequencies as integer values.
2. Sum the frequencies to get a total.
3. Generate a random number between 1 and the total (inclusive).
4. Loop through the Types, adding each Type's frequency to a running total.
5. As soon as the random number is less than or equal to the running total, select that Type.
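The five steps above can be sketched in C# as follows. The sketch assumes the frequencies are stored as non-negative integers in an array indexed by Type, and `WeightedPicker` and `Pick` are illustrative names, not an existing API. One detail is easy to get wrong: `Random.Next(minValue, maxValue)` excludes `maxValue`, so drawing from 1 to the total inclusive needs `total + 1` as the upper bound.

```csharp
using System;

public static class WeightedPicker
{
    // Returns the index of the chosen Type; weights[i] is the integer frequency of Type i.
    // Follows the steps above: sum the weights, draw r in 1..total,
    // then walk a running total until r falls inside the current Type's range.
    public static int Pick(int[] weights, Random rng)
    {
        int total = 0;
        foreach (int w in weights) total += w;

        // The upper bound of Random.Next(min, max) is exclusive,
        // so this draws uniformly from 1..total inclusive.
        int r = rng.Next(1, total + 1);

        int running = 0;
        for (int i = 0; i < weights.Length; i++)
        {
            running += weights[i];
            if (r <= running) return i;
        }
        throw new InvalidOperationException("Weights must be non-negative with a positive sum.");
    }
}
```

Calling `Pick` X times with one shared `Random` instance produces one batch. Sharing the instance matters: on the .NET Framework, `Random` objects created in quick succession can end up with the same time-based seed and produce identical sequences.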

Thoughts on this approach?

With this method, will the result sets approximate the ideal distribution over time? Probability was never my strong point. And if I wanted a little MORE variability in the distribution, how could I introduce that?

Thoughts greatly appreciated.

Steve
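On the question of whether the results approximate the ideal distribution: with the steps above, each draw independently lands on a Type with probability frequency / total, so the observed fractions approach the targets as the number of draws grows (the law of large numbers). The typical deviation of one Type's fraction is about sqrt(p(1 - p) / N). For Type D (p = 0.35) and N = 100 that is roughly 0.048, so a batch of 100 will commonly contain somewhere around 30 to 40 D's rather than exactly 35. The method therefore already has some built-in variability. The sketch below tallies many draws so the observed fractions can be compared with the targets. It assumes non-negative integer weights with a positive sum, the names are hypothetical, and it locates each draw with a binary search over the running totals, a variant of the linear scan in the steps above.

```csharp
using System;

public static class DistributionCheck
{
    // Draws `draws` items by cumulative weight and returns the observed
    // fraction of each Type, to compare against weights[i] / total.
    public static double[] ObservedFractions(int[] weights, int draws, Random rng)
    {
        // Running totals: cumulative[i] = weights[0] + ... + weights[i].
        var cumulative = new int[weights.Length];
        int total = 0;
        for (int i = 0; i < weights.Length; i++)
        {
            total += weights[i];
            cumulative[i] = total;
        }

        var tally = new int[weights.Length];
        for (int d = 0; d < draws; d++)
        {
            int r = rng.Next(1, total + 1);   // uniform in 1..total

            // Binary search for the first Type whose running total reaches r.
            int lo = 0, hi = cumulative.Length - 1;
            while (lo < hi)
            {
                int mid = (lo + hi) / 2;
                if (cumulative[mid] >= r) hi = mid; else lo = mid + 1;
            }
            tally[lo]++;
        }

        var fractions = new double[weights.Length];
        for (int i = 0; i < weights.Length; i++)
            fractions[i] = (double)tally[i] / draws;
        return fractions;
    }
}
```

Running it with small and large `draws` values shows the spread shrinking: the relative error falls roughly in proportion to 1 / sqrt(N).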
• September 24th, 2013, 02:18 AM
dazzle
Quote:

Originally Posted by sbattisti
This application is going to generate a random list of X number of Things. Each Thing is of a certain Type.

If you want to generate a list where the items appear in random order, there's an O(N) algorithm for that: the Fisher-Yates shuffle, often just called a random shuffle. It's available as a standard function in many languages but apparently not in C#, so here's what appears to be a solid implementation:

http://www.dotnetperls.com/shuffle

In your example, you would add 5 A, 10 B, 50 C and 35 D to a list and shuffle it. To get a new random order, just reshuffle it.
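Here is a minimal sketch of this build-and-shuffle approach using the Fisher-Yates algorithm. It is not necessarily the same code as the linked page. Types are represented as plain strings, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ShuffledBatch
{
    // Builds a list with exactly counts[i] copies of types[i],
    // then shuffles it in place so only the order is random.
    public static List<string> Build(string[] types, int[] counts, Random rng)
    {
        var items = new List<string>();
        for (int i = 0; i < types.Length; i++)
            for (int k = 0; k < counts[i]; k++)
                items.Add(types[i]);

        Shuffle(items, rng);
        return items;
    }

    // Fisher-Yates: walk backwards, swapping each slot with a uniformly
    // chosen slot at or before it. Every permutation is equally likely.
    public static void Shuffle<T>(IList<T> list, Random rng)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);   // 0 <= j <= i
            T tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}
```

Because the counts are fixed up front, every batch has exactly the target frequencies. Calling `Shuffle` again on the same list gives a new order.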

This works fine when the number of items is reasonably small; otherwise the list may be intractably large. In that case it may be better to simulate a bucket of items and draw them one by one. The catch is that if you draw only a small number of items, they will appear in random order but not with the exact wanted frequencies. The relative deviation from the target frequencies becomes smaller the more items you draw.
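One way to read "simulate a bucket" is to keep only a count per Type and draw without replacement, so memory depends on the number of Types rather than the number of items. The sketch below works under that interpretation. The class name is hypothetical, and it assumes non-negative counts whose total fits in an `int`.

```csharp
using System;

// Hypothetical helper: a "virtual" bucket that stores one count per Type
// instead of one list entry per item, and draws without replacement.
public sealed class VirtualBucket
{
    private readonly int[] remaining;   // items of each Type still in the bucket
    private int remainingTotal;         // assumed to fit in an int
    private readonly Random rng;

    public VirtualBucket(int[] counts, Random rng)
    {
        remaining = (int[])counts.Clone();
        foreach (int c in remaining) remainingTotal += c;
        this.rng = rng;
    }

    public int Remaining { get { return remainingTotal; } }

    // Removes one random item from the bucket and returns its Type index.
    public int Draw()
    {
        if (remainingTotal == 0)
            throw new InvalidOperationException("The bucket is empty.");

        int r = rng.Next(remainingTotal);   // uniform in 0..remainingTotal-1
        for (int i = 0; i < remaining.Length; i++)
        {
            if (r < remaining[i])
            {
                remaining[i]--;
                remainingTotal--;
                return i;
            }
            r -= remaining[i];
        }
        throw new InvalidOperationException("Counts must be non-negative.");
    }
}
```

Drawing every item reproduces the exact counts. Stopping after a few draws gives frequencies that only approximate the targets. Each draw costs O(number of Types).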

So there's a trade-off, and which approach is better depends on the application.