Generating big random numbers in C

Printable View

February 15th, 2013, 08:34 PM
AwArEnEsS

Generating big random numbers in C

I want to generate big random numbers in C(not C++ please).By "big" I mean integers much bigger than srand(time(NULL)) and rand() functions' limit(32767).

I tried writing: (note:I am not able to see "code" tag button in this editor,so I am not using it)

//****

int randomnumber;
srand( time(NULL) );
randomnumber = (( rand() % 33 ) * ( rand() % 33 ) * ( rand() % 33) * ( rand() * 33) * (rand() % 33 )) + 1

//****

But I have doubts about it's randomness quality.Also there is another problem,the program can't know the maximum random number it should use before user input,so maximum random number may need to use much smaller maximum random number according to user input.

Is there a better algorithm to create quality big random numbers in C?
February 15th, 2013, 10:30 PM
Codeplug

Re: Generating big random numbers in C

My recommendation is to read this excellent article and come back here with any questions: http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

gg
February 16th, 2013, 09:11 AM
D_Drmmr

Re: Generating big random numbers in C

Quote:

Originally Posted by AwArEnEsS

Is there a better algorithm to create quality big random numbers in C?

How about this? http://www.cs.hmc.edu/~geoff/mtwist.html
February 17th, 2013, 04:35 AM
nuzzle

Re: Generating big random numbers in C

Quote:

Originally Posted by AwArEnEsS

Is there a better algorithm to create quality big random numbers in C?

If you want better quality than rand() you must use a better random number generator as has been suggested. But if you're satisfied with the statistical properties of rand() it's easy to use rand() to generate bigger numbers with the same quality.

rand() generates uniformly distributed random numbers (each number appears with equal probability) and the bigger numbers should follow the same distribution. You're on the right track by using rand() repeatedly but unfortunately simply multiplying (or adding) the numbers skews the distribution so it's no longer will be uniform. Do this instead,

Code:

int random = rand() & 255; random = random<<8 | (rand() & 255); random = random<<8 | (rand() & 255); random = random<<7 | (rand() & 127); // or crammed int random = (((rand() & 255)<<8 | (rand() & 255))<<8 | (rand() & 255))<<7 | (rand() & 127);

This assumes ints are 32 bits and that rand() generates numbers not smaller than 8 bits (a byte). All in all 31 bits are extracted from four rand() calls and concatenated to form an int. That will give you a random uniformly distributed number withing the full range of a positive 32 bit int.

To change the size you just use the modulo operator. Say you want to limit the random number to be somewhere from 0 up to and including N you add this after the code above,

Code:

random = random % (N+1);
February 19th, 2013, 09:28 PM
AwArEnEsS

Re: Generating big random numbers in C

I was a bit busy and didn't work on programming in last days.

Thank you all for your help,I will try them.
February 20th, 2013, 08:08 AM
OReubens

Re: Generating big random numbers in C

the problem with using rand() (or any other PRNG function) to generate bigger random numbers than it can really handle out of it's own.

- You will probably reduce the periodicity of the PRNG. So instead of repeating the RNG pattern over X, it will repeat over X/2, X/4, X/8....
- The resultant numbers will probably not be properly distributed PRNG's, it is somewhat unlikely that you will end up generating EVERY possible value in the range, you will likely end up generating numbers where some values cannot ever occur. Or where some numbers will occur significantly more frequent than others.

These issues may or may not be a problem for you.

So it depends what you really need out of this, but the only "true" way is tailoring an existing PRNG function towards the number of bits you need.
February 20th, 2013, 08:16 AM
Codeplug

Re: Generating big random numbers in C

>> That will give you a random uniformly distributed number withing the full range of a positive 32 bit int.
That doesn't seem correct to me. For example, to get a number between 0 and 127, rand() would have to return 0 twice (3 times to get "0"). If you see bits being thrown away, or RAND_MAX not being considered, then that should raise a red flag for distribution.

gg
February 20th, 2013, 12:17 PM
nuzzle

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

That doesn't seem correct to me. For example, to get a number between 0 and 127, rand() would have to return 0 twice (3 times to get "0"). If you see bits being thrown away, or RAND_MAX not being considered, then that should raise a red flag for distribution.

I don't quite understand the nature of your objection.

I've assumed that 8 uniformly distributed bits can be extracted from each rand() call using a modulo 256 operation. And they approximately will if the random numbers delivered by rand() are drawn from a range reasonably larger than 0-255.

It doesn't matter how high quality the random number is (say it's generated with a Marsenne twister). If a modulo operation is used to limit it a non-uniformity will sneak into the bit distribution regardless.

I didn't say using rand() to generate big random numbers is perfect but it's simple and the quality will be fairly good. If rand() is good enougth in the first place it's good enought for bigger random numbers.
February 20th, 2013, 06:15 PM
Codeplug

Re: Generating big random numbers in C

>> I don't quite understand the nature of your objection.
>> ... uniformly distributed number withing the full range of a positive 32 bit int.
My objection is that it's not a uniform distribution - ie. there is not an equal probability of seeing every value in the range [0, 2147483647). Or even [0, N], when you do "random % (N+1)".

BTW, I'm assuming you meant for the "rand() & 127" byte to be the most-significant byte instead of the least. And the shifting in the "// or crammed" part is wrong.

But yeah, it may be "good enough".

gg
February 21st, 2013, 04:33 AM
nuzzle

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

>> I don't quite understand the nature of your objection.
>> ... uniformly distributed number withing the full range of a positive 32 bit int.
My objection is that it's not a uniform distribution - ie. there is not an equal probability of seeing every value in the range [0, 2147483647). Or even [0, N], when you do "random % (N+1)".

Well, as I said it doesn't matter how high quality the random integer. If you use modulo N to "cap" it, a small non-uniformity will sneak in. But the severity will be limited as long as N is small in relation to the total range of the random numbers. And the non-uniformity is not anyway near what you will get by multiplying (or adding) random numbers (like the OP did initially). That will totally destroy a uniform distribution rendering the big random numbers useless. The modulo bias problem is small in comparision.

But to be sure the OP could check RAND_MAX and only use say half of the random bits rand() delivers to create the bigger random number. The lowest value of RAND_MAX I've ever seen is 32767. This corresponds to 15 bits so using 8 bits like I suggested is likely to be fine. But putting in an assert to check it is never wrong of course.

Quote:

BTW, I'm assuming you meant for the "rand() & 127" byte to be the most-significant byte instead of the least. And the shifting in the "// or crammed" part is wrong.

I mean to extract the 7 rightmost bits. That will give a total of 8+8+8+7 = 31 random bits and compose a random positive 32 bit int.

How is the "crammed" version wrong?

Quote:

But yeah, it may be "good enough".

That's right. Optimally you use a random number generator that delivers a random integer within the wanted range. But just lifting some algorithm from the net is risky too. It may not be as good as the author believs or claims and it may have bugs. At least rand() is time-tested.

It's not that hard to fix the modulo bias problem. This is how it's done in Java. See the nextInt(int n) method of the Random class,

http://docs.oracle.com/javase/7/docs/api/

But Java aside, in my view if the C application isn't important enough to warrant a commersial generator AND to employ an expert to review the statistics then using rand() the way I suggested definately is "good enought". And regardless of how much effort is put in, random numbers generated on digital computers will always be "pseudo" anyway.
February 21st, 2013, 03:39 PM
Codeplug

Re: Generating big random numbers in C

>> If you use modulo N to "cap" it, a small non-uniformity will sneak in.
This statement is only true under the following assumptions:
- the PRNG returns uniformly distributed numbers within some range, "[0, RAND_MAX]" for rand()
- N is sufficiently smaller than RAND_MAX (you covered this one...)
- for smaller N's, the PRNG should have good randomness in the low-order bits
- the range is a multiple of N, "(RAND_MAX+1) % N == 0"
http://c-faq.com/lib/randrange.html

>> ... so using 8 bits like I suggested is likely to be fine.
Maybe, maybe not. It's not hard to eliminate the "maybe" by utilizing all the bits from rand().

>> [use rand] to create the bigger random number.
Right, rand() can be used to generate a random string of bits of any length. My original objection was the resulting integers taken from those bits are not uniformly distributed. Only the string of 0's and 1's are uniformly distributed, assuming rand() was used properly to generate the string of bits.

>> ... and compose a random positive 32 bit int
My train of thought was that "positive" implies that the most significant bit is always 0. Masking with 127/x7F ensures that the most significant bit in the resulting byte is always 0. So that byte needs to be the most significant byte in the signed integer to ensure the result is positive.

Code:

int random = rand() & 255; random = random<<8 | (rand() & 255); random = random<<8 | (rand() & 255); random = random<<7 | (rand() & 127);

Here, the most significant bit will always be 0 because there are only 31 bit shifts - so a 1 will never be shifted into that position. So the 127 mask is an error since it causes the 8th bit to always be 0 - not random at all :)

>> How is the "crammed" version wrong?
Only the first 2 bytes of the int are ever set. You overlooked the "<<=" nature of the original code I think.

Here's how I would do "random bits in an int"

Code:

double uniform_deviate(int seed) { return seed * (1.0 / (RAND_MAX + 1.0)); } int rand_byte() { return (int)(uniform_deviate(rand()) * 255 + 0.5); // wrong! - just use 256, see post #19 } int r = rand_byte() << 24 | rand_byte() << 16 | rand_byte() << 8 | rand_byte(); // throw away msb if you want a positive signed int r &= ~(1 << 31);

gg
February 22nd, 2013, 03:06 AM
nuzzle

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

>> [b]I
It's not hard to eliminate the "maybe" by utilizing all the bits from rand().

That's not enougth. You can only assume all bits are equally likely if RAND_MAX+1 is a power of two. Maybe it is or maybe it isn't. So even if the random number you get from rand() is uniformly distributed the individual bits may not be for certain. They could be afflicted with a modulo bias.

Quote:

My train of thought was that "positive" implies that the most significant bit is always 0. Masking with 127/x7F ensures that the most significant bit in the resulting byte is always 0. So that byte needs to be the most significant byte in the signed integer to ensure the result is positive.

Maybe there's some major oversight on my part but I just don't get what you mean. If I do

int y;
int x = x<<7 | (y & 127);

then x is shifted 7 bits to the left and its 7 rightmost bits are replaced with the 7 rightmost bits of y?

The "crammed" version of my formula is, due to the parentheses, evaluated Horner style. The 8 bits extracted from the leftmost rand() call will be shifted 23 bits to the left and just about miss the signbit of a 32 bit integer. The signbit will be zero resulting in a positive random integer.

Quote:

Here's how I would do "random bits in an int"

Well, you're assuming RAND_MAX is no less than 8 bits aren't you? That's what I did too.

How do the calculations you perform on rand() remove a potential modulo bias? From what I can see you have an implicit modulo 256 operation hidden in the conversion from double back to int. In principle you could as well perform it on the rand() number directly as I did skipping the foggy detour over double. I don't see what your method improves over my suggestion really.

I avoided the "masking the signbit" approach (to turning a signed random number into a positive) because it's not generally applicable. Say you generate random numbers in the {-1, 0, 1} range and each number has a 1/3 chance of coming up. If you "mask the signbit" you will get numbers in the {0, 1} range indeed, it's just that 1 will come up with 2/3 probability and 0 with 1/3. The distribution got skewed.

Finally, after the 32 bit positive random int is generated there's still the problem of limiting it to a range. If modulo N is used we're back where we started with the modulo bias problem.
February 22nd, 2013, 11:38 AM
Codeplug

Re: Generating big random numbers in C

>> So even if the random number you get from rand() is uniformly distributed the individual bits may not be for certain.
I'm no mathematician, and I couldn't find an authoritative answer online. My rand_byte() does return a uniformly distributed number from [0,255]. If those bytes are concatenated, are the resulting bits in that bit string also uniformly distributed? I assumed yes at first, but now I'm not sure.

>> ... due to the parentheses ...
Doh - missed the paren grouping completely. My bad.

>> Well, you're assuming RAND_MAX is no less than 8 bits aren't you?
No. The code generates a uniform distribution from [0,255] regardless of the value of RAND_MAX - making it portable.

>> I don't see what your method improves over my suggestion really.
The method I've shown preserves the uniform distribution (UD) of rand by mapping the [0, RAND_MAX] int UD, to a [0, 1) double UD. Your code simply takes the low order bits and throws the rest away - which destroys the UD.

The technique is described here: http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx (from post #2)

>> Finally, after the 32 bit positive random int is generated there's still the problem of limiting it to a range. If modulo N is used we're back where we started with the modulo bias problem.
There is no uniform distribution in either of our methods for generating the 31 bit int. The "modulo bias problem" is a problem of breaking the distribution. No distribution, then no problem.

gg
February 22nd, 2013, 05:53 PM
Codeplug

Re: Generating big random numbers in C

>> If those bytes are concatenated, are the resulting bits in that bit string also uniformly distributed?
Got an answer from sci.stat.math - https://groups.google.com/forum/#!to..._-E/discussion

Quote:

If the integers are uniform on [0, 2^n - 1], where n is a positive integer, then YES.

That makes sense.

I should also point out that even the uniform_deviate() method isn't *perfectly uniform* in all cases:
https://groups.google.com/d/msg/comp...A/m7CckPudJb8J

For reference, here's the "recommended" code for mapping the [0, RAND_MAX] uniform distribution to a [0, N) distribution that maintains *perfect* uniformity (where N <= RAND_MAX - so it doesn't help the Op, but it's good to know):
https://groups.google.com/d/msg/comp...A/WevBqpEv7QEJ

gg
February 23rd, 2013, 03:13 AM
nuzzle

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

The code generates a uniform distribution from [0,255] regardless of the value of RAND_MAX - making it portable.

No it doesn't. Lets assume for the sake of discussion that rand() only generates one random bit in each call. This bit cannot be used to generate 256 random numbers. In order to do that rand() must produce no less than 8 random bits. That is RAND_MAX must not be smaller than 255.

Quote:

The method I've shown preserves the uniform distribution (UD) of rand by mapping the [0, RAND_MAX] int UD, to a [0, 1) double UD. Your code simply takes the low order bits and throws the rest away - which destroys the UD.

That's not correct. As I said your method makes a detour over double but it still involves an implicit modulo 256 operation. When it finally arrives at the 8 bit integer the random qualities hasn't improved on what it started with. The quality is no better than just using 8 random bits directly from the rand() call as I did. The potential modulo bias is still there.

---

In both these case you're making a common mistake. You're hoping that some hokus pokus and black magic performed on a random number will improve its "randomness". It generally doesn't. To achieve that you need to take the number into a chaotic domain (as random number generators do) or generate additional random numbers that can then be used to improve on the first one. An example of the latter is shown in the Java link I supplied where the modulo bias is reduced in nextInt(int n) of Random.
February 23rd, 2013, 12:14 PM
Codeplug

Re: Generating big random numbers in C

>> The potential modulo bias is still there.
*sigh* I can't explain things any better than the links I've already posted. If you don't believe the experts from the Usenet posts I linked to, then there's no point in continuing :(

gg
February 23rd, 2013, 12:50 PM
nuzzle

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

>> The potential modulo bias is still there.
*sigh* I can't explain things any better than the links I've already posted. If you don't believe the experts from the Usenet posts I linked to, then there's no point in continuing :(

Well, all links you've supplied only confirm what I'm telling you. The method you've posted doesn't improve on using the 8 rightmost bits of rand() directly. Your detour over double doesn't change anything. Any real solution to this problem involves generating more randomness (like in that Java example I supplied). I just wanted to make that perfectly clear.

Your method converts the MAX_RAND+1 integers to doubles in the 0 to 1 range. This is not really a continuous distribution. It's still MAX_RAND+1 distinct numbers, only now doubles and scaled. When you cast/convert back to the 8 bit integer range you're really performing an implicit modulo 256 operation. It's equivalent to performing the same operation on a rand() integer directly.

If the highest quality is needed the OP should invest in a commersial random number solution and hire an expert to review the statistics part of the application. If that's considered overkill my suggestion is good enougth (and the hokus pokus of your method won't improve on it).
February 23rd, 2013, 09:17 PM
Codeplug

Re: Generating big random numbers in C

http://www.cse.unr.edu/robotics/bekr...orm_random.pdf

gg
February 25th, 2013, 06:03 AM
OReubens

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug;2106595I'm no mathematician, and I couldn't find an authoritative answer online. My rand_byte() does return a uniformly distributed number from [0,255

.

it doesn't. it has an under-bias for 0. Because of the +0.5 correction, 0 will appear about half as much as any other number.
there is also an under-bias for 255, because you shifted that half of 0 into that 255. again 255 occurs about half as much as any other number.

using your code, to properly return a 0-255 distribution the rand_byte should be written as

Code:

int rand_byte() { return (int)(uniform_deviate(rand()) * 256); }

besides, if RAND_MAX is a power of 2 minus 1 (which about 99% of the PRNG's are) you don't "need" the inefficient detour to floating point number.
Solve things in fixed point integer math by assuming the PRNG is a fixed point valus with a mantisssa of N-bits (where RAND_MAX = (2^N)-1, in the range 0 to 1 exclusive. [0, 1[

Code:

// returns a number in the range 0 to max-1 short rng(short max) { return (short( ((long)rng*max) >> RAND_BITS ); // RAND_BITS is 15 for c's rand() }

a version that returns a byte is easy enough to work out from that.

I'm finding it strange that so few sources on RNG's explain this fixed point approach and most go for the inefficient floating point solution.

Quote:

If those bytes are concatenated, are the resulting bits in that bit string also uniformly distributed? I assumed yes at first, but now I'm not sure.

"probably not"
this will get worse as you extract more bits at a time to chunk into the bigger one.
fewer bits at a time will help, but has the adverse effect of lowering the periodicity of the resulting PRNG even more.

edit: Yes, I realise that by definition of "random", it should, but we're dealing with PSEUDO random sequences here. uniform distribution only counts on a large enough sequence of numbers, Most PRNG's have localised "hotspots" or "artifacts", and when combining those hotspots into a single larger random number, you made the hotspot/artefact worse. This is also why some PRNG's aren't suitable for cryptology or why some aren't good for a monte carlo simulation.

So it "might" work, or might be "good enough" depending on the input prng (rand() is just awful), but making blanket statements of the resulting larger number PRNG won't work well.

Quote:

No. The code generates a uniform distribution from [0,255] regardless of the value of RAND_MAX - making it portable.

Incorrect. If RAND_MAX is less than 8 bits, you cannot ever produce a uniform distribution that has 2^8 unique values. This is elementary math. The "detour" to a double changes nothing in this, the double will at most evaluate to RAND_MAX unique values.

If you COULD... then the whole problem the OP asks would be pointless, you could just use rand() and assume a new RAND_MAX of whatever size the OP needs to get his bigger range. it works value wise, it's just not uniform. some values will occor multiple times, some will never occur.

So your code can only work if you want to extract fewer bits at a time than RAND_MAX can hold.

Quote:

>> I don't see what your method improves over my suggestion really.
The method I've shown preserves the uniform distribution (UD) of rand by mapping the [0, RAND_MAX] int UD, to a [0, 1) double UD. Your code simply takes the low order bits and throws the rest away - which destroys the UD.

Wrong, see above.

Your technique does have the merit of using the HIGH bits rather than the LOW bits, which is better. The high bits are more likely to produce a shortened uniform distribution than the lower bits. Explaining this is beyond the scope of this forum. The short version of the story is that while a PRNG is uniform over a large number of values, there tend to be localized "hotspots" over short ranges.

ignoring the "low bits" vs "high bits" issue, which you can work around. both modulo and your apprpoach have the same problem, just localized differently. And it manifests itself if N is not a power of 2.

Suppose you had a rand() that returned numbers in the 0-7 range (3 bits). And you wanted a random number in the range 0-2 inclusive
With modulo
0 will result for rand() values 0, 3, 6
1 will result for rand() values 1, 4, 7
2 will result for rand() values 2, 5
The modulo 'clips' part of the rand() range and puts it all in the lower result values. Some values have a 1 higher probability than others.

your solution will have the same problem (it will bias some resulting values by 1 more than others), but instead of biassing the lower range results, the bias will be spread out over the entire range.

Either of the 2 systems could have advantages depending on what you use the rand() for. One method is not 'better' than the other.
February 25th, 2013, 12:42 PM
Codeplug

Re: Generating big random numbers in C

>> Because of the +0.5 correction ...
You are correct. Simply multiplying by 256 is the right thing to do and will give a true uniform distribution over [0, 255]. It was easy to confirm as well:

Code:

size_t freq[256] = {0}; for (;;) if (freq[rand_byte()]++ == RAND_MAX) break; for (int n = 0; n < 256; ++n) printf("%u\n", freq[n]);

>> I'm finding it strange that so few sources on RNG's explain this fixed point approach and most go for the inefficient floating point solution
I'm guessing because it's non-portable since it requires RAND_MAX=(2^N)-1, which isn't mandated by the standard (although I haven't seen an implementation where that isn't the case).

>>> If those bytes are concatenated, are the resulting bits in that bit string also uniformly distributed? I assumed yes at first, but now I'm not sure.
>> "probably not"
The answer is "yes" if you have a uniform distribution on the range [0, R] where R=(2^N)-1. Since the bytes are uniformly distributed on [0, 255] (not using the bad +0.5 code) then the resulting bits in the resulting bit string are also uniformly distributed. Note that I'm not making any claims about the resulting integer when concatenating only 4 of those bytes.

I asked for help on this one here: https://groups.google.com/forum/#!to..._-E/discussion

>>> - making it portable.
>> Incorrect. If RAND_MAX is less than 8 bits ...
The "If" is irrelevant thanks to the standard:

Quote:

Originally Posted by ISO/IEC 9899:1999 (E)

7.20.2.1 - 5
The value of the RAND_MAX macro shall be at least 32767.

>> Wrong, see above. Your technique ...
>> both ... have the same problem... And it manifests itself if N is not a power of 2.
Are you saying that for a uniformly distributed integer range of [0, R], you can only map that to a [0, 1) floating-point range uniformly iff R=(2^N)-1? That doesn't make sense to me. The limiting factor seems to be the number of bits used to store the mantissa (assuming R>=0x7fff).

Or are you talking about applying the uniform_deviate() technique to generate uniform numbers greater than R? If so, I've never claimed that - and I agree that doesn't work.

>> One method is not 'better' than the other.
I'm not sure what's being compared... I've only compared random byte generation. I claim that the (fixed) rand_byte() is superior because not only are the values uniformly distributed, but the bits themselves are too.

gg
February 26th, 2013, 09:09 AM
OReubens

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

>> I'm finding it strange that so few sources on RNG's explain this fixed point approach and most go for the inefficient floating point solution
I'm guessing because it's non-portable since it requires RAND_MAX=(2^N)-1, which isn't mandated by the standard (although I haven't seen an implementation where that isn't the case).

Well I wasn't referring to rand() specifically which is indeed not guaranteed to be a power of 2 minus 1 (although i've never seen a different one in any of the many compilers I used). Which is also why there is a RAND_MAX defined and not a RAND_BITS.
But even in other resources dealing with other PRNG's that can only work on (2^n)-1 type ranges, they use the detour to a double. I've even seen a case where they made a 144bit PRNG, then had a whole custom library included to handle floating point values with a 144bit mantissa, only to extract integer random values.
:rolleyes:

Quote:

>>> If those bytes are concatenated, are the resulting bits in that bit string also uniformly distributed? I assumed yes at first, but now I'm not sure.
>> "probably not"
The answer is "yes" if you have a uniform distribution on the range [0, R] where R=(2^N)-1. Since the bytes are uniformly distributed on [0, 255] (not using the bad +0.5 code) then the resulting bits in the resulting bit string are also uniformly distributed. Note that I'm not making any claims about the resulting integer when concatenating only 4 of those bytes.

Like I said... "in theory" if your RNG is perfectly uniformly distributed. Then Yes, that's part of the definition of things after all.
In practice, you deal with PSEUDO RNG's, and things aren't "perfectly uniformly distributed" there. Over a large enough series of random numbers, yes, but in short sequences, you tend to find this may not be the case. Most PRNG's have some form of hotspots or artefacts that make them not "perfectly" uniformly distributed.

rand() which is typically implemented as a linear congruential generator is notoriously bad for this, it causes an artefact effect called "hyperplanes" which is caused by the high amount of serial correlation in how it works.

I would certainly not expect rand() bits packed together to form a bigger number to end up being a uniform distribution.

Quote:

>> Wrong, see above. Your technique ...
>> both ... have the same problem... And it manifests itself if N is not a power of 2.
Are you saying that for a uniformly distributed integer range of [0, R], you can only map that to a [0, 1) floating-point range uniformly iff R=(2^N)-1? That doesn't make sense to me. The limiting factor seems to be the number of bits used to store the mantissa (assuming R>=0x7fff).

Or are you talking about applying the uniform_deviate() technique to generate uniform numbers greater than R? If so, I've never claimed that - and I agree that doesn't work.

Not quite. I may have messed things up by using 'N' again. though :p

What I meant is if you have a random distribution from 0-R exclusive, then neither your method of multiplying and clipping nor modulo will give a uniform distribution over 0-M if M is not a factor of R (R % M != 0)

some numbers will have a probablity of R/M while some will have a probability of (R/M)+1.
Again, this is elementary math.
The methods are different though, with modulo all the lower R%M numbers will have the +1
in your method, the numbers with 1 more will be spread out.

You can correct for this deficiency by managing an error factor, but it's rarely applied because most apps don't need that extra uniformity.

Quote:

> One method is not 'better' than the other.
I'm not sure what's being compared... I've only compared random byte generation. I claim that the (fixed) rand_byte() is superior because not only are the values uniformly distributed, but the bits themselves are too.

I was comparing one method of reducing range to less than RAND_MAX+1 (and somethign different than a factor of RAND_MAX+1). == see the bit of text before this. Multiplication and clipping vs modulo.

Other than that. your rand_byte (by extracting high bits through multiplication) is not technically better than using a %256. Given the nature of things however, you get the benefit of using the high bits, which is superior because of how PRNG artefacts occur. Even with an M different than a factor of RAND_MAX+1, both are technically equally good, but different (= the +1 all in the front numbers or spread out).

Your claim also has no proof, "superior because values are uniformly distributed". How is your method more uniformly distributed than the modulo approach. You seem to be thinking that because of the transition to a floating point that this increases extra uniformity, it doesn't. you will have exactly RAND_MAX unique possible values in your double. Just as much as rand() itself returns. moduloing rand() or multiply+clipping rand()/(RAND_MAX+1) does nor prefer one over the other (assuming rand() would be perfectly distributed without artefacts, which it isn't).

From a pure technical/efficiency POV. The modulo approach is inferior because division is less efficient than multiplication+clip (or shift).
But your current implementation with using a double of course negates that entirely.

It wouldn't surprise me at all btw, that your new corrected rand_byte() returns exactly the highest 8 significant bits of rand() making it equivalent to

Code:

int rand_byte() { return rand()>>7; }
February 26th, 2013, 11:19 AM
Codeplug

Re: Generating big random numbers in C

>> What I meant is if you have a random distribution from 0-R exclusive, then neither your method of multiplying and clipping nor modulo will give a uniform distribution over 0-M if M is not a factor of R (R % M != 0)
I don't understand why it being an even "factor" has anything to do with it since we're dealing with floating-point math. What is this "multiplying and clipping" that occurs in floating point math? Are the bits used for the mantissa not sufficient for dividing (M / (RAND_MAX + 1.0))? Are you saying that imperfect representations of IEEE 754 are causing some problem?

>> You seem to be thinking that because of the transition to a floating point that this increases extra uniformity
No, nothing "extra". Only that whatever uniformity that rand() provides will be preserved.

>> you will have exactly RAND_MAX unique possible values in your double.
I agree, and that's the whole point. The resulting floating-point range of values has the same distribution as rand(). The distribution is not lost completely.

>> moduloing rand() or multiply+clipping rand()/(RAND_MAX+1) does nor prefer one over the other
You're really confusing me, as the math is quite simple - (M / (RAND_MAX + 1.0)) - divide M by RAND_MAX+1 "buckets" and store the result into a double. So we're just at the mercy of double's precision and any IEEE error that's propagated with further math on the result. Still in my mind, one is obviously better than the other - one method preserves rand()'s distribution - the other does not.

>> (assuming rand() would be perfectly distributed without artefacts, which it isn't).
Artifacts are besides the point. We have a range and distribution described as uniform. What math is required to preserve that distribution? What math would destroy that distribution? If your formula doesn't include RAND_MAX, then you're not working with the distribution, you're working against it.

>> your rand_byte (by extracting high bits through multiplication) is not technically better than using a %256.
Still confused. You've implied more than once now that I'm somehow not using all the bits. How does floating-point multiplication/division not use all the bits?

>> It wouldn't surprise me at all btw, that ...
I'm not surprised, given it's a round trip through two ranges R where R=(2^N)-1.

What I want to work out, is if R isn't a nice power of 2 minus 1 - then would the uniform_deviate() method still preserve the distribution. I say "yes", to the extent that double's representation allows.

gg
February 26th, 2013, 12:49 PM
superbonzo

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

I don't understand why it being an even "factor" has anything to do with it since we're dealing with floating-point math.

in general, given a map f:A->B between two finite sets A and B, f sends a uniform probability distribution on A to a uniform probability distribution on B if and only if all preimages f^-1(b) with b in B have the same number of elements. Which implies that N divides M, where M is the number of elements in A and N is the number of elements in B.

Now, take your map sending the int rand() range [0,RAND_MAX] to the floating point range and then again to the output int range [0,N]. So, whatever the double detour result is, you'll always end up with a non uniform distribution if N does not divide RAND_MAX+1. The only thing you "gain" is an apparent spread out of the non-uniformity.
February 26th, 2013, 04:12 PM
Codeplug

Re: Generating big random numbers in C

>> ... if and only if ...
That's Greek to me. Does it have a name? Or where can I read about it more?
Proof jargon aside, isn't "uniformity" just a subjective measurement in this case, including rand()'s distribution?

>> The only thing you "gain" is an apparent spread out of the non-uniformity.
The math is obvious attempting to preserve the distribution as best it can. Is there a better way to attempt to preserve the distribution?

Here's a test of mapping [0, RAND_MAX] to [0, 300]:

Code:

// according to experts, this preserves the distribution of rand "perfectly" int nrand(int n) { const int bucket_size = RAND_MAX / n; int r; do r = rand() / bucket_size; while (r >= n); return r; } int rand_300_a() { return (int)(uniform_deviate(rand()) * 300); } int rand_300_b() { return nrand(300); } void rand_max_sample() { FILE *f = fopen("rand_max.txt", "w"); if (!f) return; size_t n; for (n = 0; n < RAND_MAX; ++n) fprintf(f, "%u\n", rand()); fclose(f); } void rand_300_a_sample() { FILE *f = fopen("300a.txt", "w"); if (!f) return; size_t n; for (n = 0; n < RAND_MAX; ++n) fprintf(f, "%u\n", rand_300_a()); fclose(f); } void rand_300_b_sample() { FILE *f = fopen("300b.txt", "w"); if (!f) return; size_t n; for (n = 0; n < RAND_MAX; ++n) fprintf(f, "%u\n", rand_300_b()); fclose(f); } int main() { const int seed = 12345; srand(seed); rand_max_sample(); srand(seed); rand_300_a_sample(); srand(seed); rand_300_b_sample(); return 0; }

Calculating the skewness of each set gives: 0.00218510057226509, 0.00220362015632128, and 0.00273418316416069 respectively.
The kurtosis of each set is: -1.19204957957879, -1.19206215018176, and -1.19290751126681 respectively.
So by those measures of "uniform'ness", my subjective response is that the distribution has been preserved.

Is there some other mapping other than 300 that will show the breakdown in uniformity?

Or perhaps the best question to ask at this point is: What does the C code look like that best preserves the distribution of rand() when mapping it to a smaller range? Does that code look any different than what's already been shown?

gg
February 26th, 2013, 06:41 PM
superbonzo

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

Proof jargon aside, isn't "uniformity" just a subjective measurement in this case, including rand()'s distribution?

no, technicalities aside, for finite sets a uniform probability distribution is just that assigning a constant probability to each outcome: 1/2 for each face of a coin, 1/6 for each face of a die, etc ... . Of course, this has little to do with assessing the actual quality of a pnrg.

Quote:

Originally Posted by Codeplug

That's Greek to me

non-technically speaking, let's say you have a source of "random" integer numbers following a uniform probability distribution in the range [0,N), hence the probability of each number 0<= i <N is 1/N. Now, consider a fixed function F(i) from [0,N) to another range of integers [0,M). Now you have a new source of random numbers with values in the range [0,M) by applying F to the original source. The probability that the new source emits a number 0<= j <M equals the probability that the original source emits an integer i such as j = F(i), that is equals (1/N)*("the number of integers i such as j = F(i)"). Therefore, if the new source is uniformly distributed ( = the probability that it emits a j is constant and equals 1/M ) then M must divide N. If not, the resulting probability distribution will not be uniform, that is some integers in [0,M) will have higher probability than others, whatever the function F is ( in your case, the mapping int->double->int ).

Quote:

Originally Posted by Codeplug

Here's a test of mapping [0, RAND_MAX] to [0, 300]

for the biased generator, the probabilities of each outcome will oscillates in the order of 1/300 +- 1/RAND_MAX, hence you'll need to collect ~ 10^6/10^7 samples for some pair of fixed biased outcomes to start observing the difference in the frequencies directly. So, an hardly noticeable effect anyway.

Quote:

Originally Posted by Codeplug

Is there some other mapping other than 300 that will show the breakdown in uniformity?

the simplest example is to map the range [1,3] to the range [1,2]. The only possible probability distributions on [1,2] resulting by a mapping from a uniformly distributed probability on [1,3] are { (1,0), (0,1), (2/3,1/3), (1/3,2/3) }; as you can see, none of them is uniform.

Quote:

Originally Posted by Codeplug

Or perhaps the best question to ask at this point is: What does the C code look like that best preserves the distribution of rand() when mapping it to a smaller range? Does that code look any different than what's already been shown?

what do you mean by "preserves the distribution of rand()" ? regarding a generic range random engine implementation you can take a look at boost.random or the new std <random>.
March 1st, 2013, 08:53 AM
OReubens

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

I don't understand why it being an even "factor" has anything to do with it since we're dealing with floating-point math. What is this "multiplying and clipping" that occurs in floating point math? Are the bits used for the mantissa not sufficient for dividing (M / (RAND_MAX + 1.0))? Are you saying that imperfect representations of IEEE 754 are causing some problem?

No, i'm saying that putting it in a double does not make it 'more uniform' than the integer.
rand() returns RAND_MAX unique integers. The returned integers are 'assumed' to be a uniform distribution (but it's an LCG, so it isn't really perfect in that).

double d = rand() / RAND_MAX;

will give you... RAND_MAX unique double values (this is basic math stuff here). Stuffing them in a double does not magically introduce MORE randomness, or more uniformity, nor does it change from a finite (RAND_MAX) set of numbers to something other than an finite set.
Calculating on a finite set can only reduce the set, it can never increase the set.

if I flip a coin, I can only get heads or tails with a presumed 50% chance of each.
If I stuff this value in a double, that double will have 2 possible values, the "heads" value, or the "tails" value, depending on how I made my calculation from heads/tails towards a double. This double will not magically start holding every possible double value from minus infinity to plus infinity.

So if you agree up to here... Then what your code is doing (multiplying by 256 and clipping) is favouring the "high" bits of rand(), while modulo 256 favours the low bits of rand()

Now how does this factoring go in here ? Again let me illustrate with a simple example. Suppose you had a regular unloaded 6 sided die (a perfect cube), but numbered 0-5 instead of 1-6. you can roll any value from 0 to 5 with an equal probability.
Now suppose I only want to have values 0 to 3... how do I "solve this" for the 6 values I can roll.

Your solution
(die_roll() / 6) * 4 and clip.
roll 0 -> (0/6)*4 = 0
roll 1 -> (1/6)*4 = 0
roll 2 -> (2/6)*4 = 1
roll 3 -> (3/6)*4 = 2
roll 4 -> (4/6)*4 = 2
roll 5 -> (5/6)*4 = 3

Oops... your perfect 0-5 distribution when changing to another range suddenly isn't so uniform anymore. As demonstrated, some values occur 1 times more than others, and these values are 'spread out' over the 0-3 range

Repeat the same for modulo, and the same thing happens, some values occur 1 more than others, but now all the +1 values will be up front.

Both methods are valid for reducing the 0-6 range to 0-3 depending on the needs, either mathod has advantages and disadvantages...

Now. if you were wanting a range that is a factor of 6 (say I want 2, or 3 values), then you end up with a uniform distribution over the smaller range.

is the above a problem. again... "it depends"... As the range you're interested in gets closer to RAND_MAX, this +1 effect gets worse and worse.
Suppose I was interested in a range of 0-32700. rand() has enough bits, and both multiply+clip and modulo will return a range.
but 68 values will appear twice as much as the other 32700. which ones, depends on the method use.

The above is one of the main reasons why you want "large bits" PRNG's. Not so much because you want big random numbers, but because you want to reduce the +1 effect of wanting a range that isn't a factor of whatever your RAND_MAX is.

you want R%M / R to be as small as you can. you achieve that by either getting a perfect factor (in which case this returns 0) or you want M to be as small as possible (not feasible since that's what you want to get afterall), so the only other way out is making R as large as possible (big number PRNG's are always better, even if you don't need values that high). THey also benefit from larger periodicity.

Quote:

>> moduloing rand() or multiply+clipping rand()/(RAND_MAX+1) does nor prefer one over the other
You're really confusing me, as the math is quite simple - (M / (RAND_MAX + 1.0)) - divide M by RAND_MAX+1 "buckets" and store the result into a double. So we're just at the mercy of double's precision and any IEEE error that's propagated with further math on the result. Still in my mind, one is obviously better than the other - one method preserves rand()'s distribution - the other does not.

No, you are NOT at the mercy of double's precision, double has MUCH MUCH more precision than the RAND_MAX different values.

This way of doing things with a detour to a double does not make a BETTER resulting distribution than directly moduloing rand() or fixed-point-integermath multiplying rand() and clipping.
The only real difference is that modulo extracts the low bits, and the other 2 extract the high bits. If it were a perfect distribution, either is equally good, and you can adjust the formula for all 3 to achieve the opposite effect if you so desired.

Quote:

>> (assuming rand() would be perfectly distributed without artefacts, which it isn't).
Artifacts are besides the point. We have a range and distribution described as uniform. What math is required to preserve that distribution? What math would destroy that distribution? If your formula doesn't include RAND_MAX, then you're not working with the distribution, you're working against it

As described above.
Changing a UD over 0-X into a UD over 0-Y where Y<X and all outcomes are integers. can ONLY
break the UD if X%Y != 0
preserve the UD is X%Y==0
Again, elementary math at play here.

The formula doesn't NEED to have RAND_MAX in it. If RAND_MAX is a power of 2-1. If so, you can also work with or infer the amount of bits involved (which is what the fixed point multiplication+shift does). With modulo you don't even need anything, you just extract as many of the low bits as are adequate. How many bits there really are aren't of any concern (assuming there's enough of them of course).
Again, whatever method, ending up with a UD assumes R%M=0 if this isn't the case, either method introduces the same inacuracy, just over a different set of numbers.

Note: You CAN turn PRNG UD over 0-X into a UD over 0-Y with integers by managing an errorfactor, but what this really does deep down is changing the nature of the PRNG and making it a PRNG over 0-X' with X' > X and X'%Y ==0

Quote:

>> your rand_byte (by extracting high bits through multiplication) is not technically better than using a %256.
Still confused. You've implied more than once now that I'm somehow not using all the bits. How does floating-point multiplication/division not use all the bits?

it uses all the bits in the double.
but you throw away a chunk of bits when you convert the double to a int. This isn't technically any different from what you do with modulo.

Quote:

>> It wouldn't surprise me at all btw, that ...
I'm not surprised, given it's a round trip through two ranges R where R=(2^N)-1.

What I want to work out, is if R isn't a nice power of 2 minus 1 - then would the uniform_deviate() method still preserve the distribution. I say "yes", to the extent that double's representation allows.

your approach with a double is (one of) the correct methods to use if RAND_MAX is not a power of 2 minus 1.
then again. So is using modulo
and so is using fixed point math, the fixed point becomes less efficient of course since the shift turns into a divide.

Code:

int rng(int max) { return (int)rand() * max % (RAND_MAX+1); }

It's obvious enough to work out that when RAND_MAX IS a powerof 2 minus 1 that the % can be turned into a shift.

Yes this preserves distribution. Again, if max is a factor of RAND_MAX+1, if it isn't then the result isn't a UD.
March 1st, 2013, 03:16 PM
Codeplug

Re: Generating big random numbers in C

>> non-technically speaking ...
Thanks for the English version. The thought experiment of reducing [1,3] to [1,2] (or [0,5] to [0,3]) is what I'm used to seeing when explaining why a mathematically perfect UD can not be preserved. What I was hoping to extract is that the subjective distribution can still be maintained despite that - when using the proper method.

>>> Is there some other mapping other than 300 that will show the breakdown in uniformity?
>> Suppose I was interested in a range of 0-32700... but 68 values will appear twice as much as the other 32700.
Confirmed via code - 68 outliers occurred twice as much. Thanks for giving me a number that "shows the breakdown".

Long story short, only nrand() is portable and maintains the distribution "perfectly" - by some definition (I'm quoting usenet).

Code:

// return a random integer in the range [0, n) int nrand(int n) { if (n <= 0 || n > RAND_MAX) return -1; // C++: throw domain_error("Argument to nrand is out of range"); const int bucket_size = RAND_MAX / n; int r; do r = rand() / bucket_size; while (r >= n); return r; }

gg
March 2nd, 2013, 03:41 AM
superbonzo

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug

What I was hoping to extract is that the subjective distribution can still be maintained despite that - when using the proper method.

Note that the finiteness of the range is what forbids the preservation of uniformity and what makes your double based solution non UD, in general. For example, your code above is essentially equivalent to rejecting all rand() results not in the target range. Hence, in probability theory, your function "nrand" would be modeled by a map [0,RAND_MAX]^infinity->[0,n), where the map domain is made of infinitely many products of the source range ( technically, a direct limit of probability spaces ).
March 2nd, 2013, 05:01 PM
Codeplug

Re: Generating big random numbers in C

Full disclosure - nrand() is from "Accelerated C++".
http://www.amazon.com/Accelerated-C-.../dp/020170353X
https://groups.google.com/forum/#!ms...A/WevBqpEv7QEJ

gg
March 4th, 2013, 09:46 AM
OReubens

Re: Generating big random numbers in C

Quote:

Originally Posted by Codeplug;2107741Long story short, only nrand() is portable [I

and[/I] maintains the distribution "perfectly" - by some definition (I'm quoting usenet).

It's one method of reducing a UD...

This preserves the uniform distribution. It doesn't preserve all the random behaviour. And it definately doesn't preserve periodicity (it throws away some of it).

This approach is simple, but it's in a loop and could have a rather nasty worst case. Suppose I wanted numbers from 1-16400, bucket size would be just 1. And it be needing to throw away about half of the generated numbers.

This could have bad consequences for performance (nrand() has unpredictable duration).
It also has 2 divides, which can also be really bad if you have random numbers with more bits than your cpu can handle and need a software "very large integer" class.

It has the benefit of being "simplistic" though. Which isn't necessarily all bad :)

There are other approaches to achieve this with advantages and disadvantages of their own.

The thing to realise here...
If your PRNG has enough bits, then the "+1 on some numbers" will be hidden in the statistical "noise". More often than not, this is a prefered way of dealing with it. Also know that C's rand() is horrible for just about anything.
March 4th, 2013, 04:13 PM
Codeplug

Re: Generating big random numbers in C

According to the author of nrand -

Quote:

There is no fixed, finite bound on the worst-case execution time of this program, but it terminates with probability 1 and its average performance is O(1). I strongly doubt it is possible to do any better.

gg