-
October 29th, 2005, 08:54 AM
#1
Maximum size of an array
Hi,
I have a problem, and I searched the CodeGuru forum but could not find the answer.
I am trying to generate a non overlapped uniform random number between 1 and 2*23999976. Below is my code.
#include <iostream>
#include <fstream>
#include <cstdlib> // for rand() and RAND_MAX
using namespace std;

int main()
{
    long int n = 2*23999976;
    long int j[n];
    ofstream out("rand.dat");
    for (long int i = 0; i < n; i++) {
found:
        j[i] = 1 + (long int)(2.0*23999976.0*rand()/(RAND_MAX+1.0));
        for (long int k = 0; k < i; k++) {
            if (j[k] == j[i]) goto found;
        }
        out << j[i] << endl;
    }
    out.close();
    return 0;
}
It compiles fine, but when I run it, it says "Segmentation fault". If I use a small number instead of 2*23999976, then it works fine. So, I guess it is due to a limit on the size of the array that I have used. What is the maximum size of a 1-D array? Is there any other way to do this?
Thank you very much.
Last edited by manojg; October 29th, 2005 at 09:33 AM.
-
October 29th, 2005, 09:36 AM
#2
Re: Maximum size of an array
The array you defined (a VLA) contains 2*23999976*4 bytes = about 184 MB.
The stack is not big enough to contain that.
You can allocate it on the heap instead, or choose a larger stack size in the .DEF file.
However, your program will probably need a lot of time to do its work!
Last edited by SuperKoko; October 29th, 2005 at 04:49 PM.
Reason: Correction of the array size calculation.
"inherit to be reused by code that uses the base class, not to reuse base class code", Sutter and Alexandrescu, C++ Coding Standards.
Club of lovers of the C++ typecasts cute syntax: Only recorded member.
Out of memory happens! Handle it properly!
Say no to g_new()!
-
October 29th, 2005, 10:13 AM
#3
Re: Maximum size of an array
Hi SuperKoko,
Thank you very much for your reply. Could you please give an example of how to do this?
-
October 29th, 2005, 10:30 AM
#4
Re: Maximum size of an array
Instead of
Code:
long int n = 2*23999976;
long int j[n];
do
Code:
long int n = 2*23999976;
long int *j = new long int[n];
// clean up when done
delete[] j;
- petter
-
October 29th, 2005, 10:41 AM
#5
Re: Maximum size of an array
-
October 29th, 2005, 12:18 PM
#6
Re: Maximum size of an array
You can also use a std::vector.
Just replace
Code:
long int j[n];
with:
Code:
std::vector<long int> j(n);
A few things:
- If RAND_MAX is not sufficient, (2.0*23999976.0*rand()/(RAND_MAX+1.0)) will not return a random number between 0 and 2.0*23999976.0, but only RAND_MAX different numbers. So, if RAND_MAX is 32767 (as in the BC++ implementation), your program will perform an infinite loop.
- Your algorithm is extremely slow. To choose the last integer, it will need about 2*23999976 tests, and each test will need about 2*23999975 operations: 2,303,995,296,002,400 operations! If the computer is able to perform 1,000,000,000 operations each second, it will need about 2,303,995 seconds, i.e. 26 days. The previous integer would need about 13 days to be found. A much faster algorithm is possible: you start by creating a big table containing all ordered integers between 0 and 2*23999976, and then you shuffle the table. You just need to swap each element at table[i] with a random element at another place in the table.
- Choose better variable names; one letter is generally not sufficient. For example, j is generally used for an insignificant index, not for an array.
- Don't use magic numbers (numbers appearing directly in the code, and whose value is arbitrary). Instead, it is preferable to define constants. For example:
Code:
const std::size_t RandomArraySize=2*23999976L;
Last edited by SuperKoko; October 29th, 2005 at 12:27 PM.
"inherit to be reused by code that uses the base class, not to reuse base class code", Sutter and Alexandrescu, C++ Coding Standards.
Club of lovers of the C++ typecasts cute syntax: Only recorded member.
Out of memory happens! Handle it properly!
Say no to g_new()!
-
October 29th, 2005, 01:50 PM
#7
Re: Maximum size of an array
Originally Posted by Superkoko
Your algorithm is extremely slow.
To choose the last integer, it will need about 2*23999976 tests, and each test will need about 2*23999975 operations: 2,303,995,296,002,400 operations! If the computer is able to perform 1,000,000,000 operations each second, it will need about 2,303,995 seconds, i.e. 26 days.
Well, if we want to compute it more exactly (writing Z for sigma, the sum over n from 1 to N = 2*23999976):
in the worst case it would take Z[n(n-1)] operations.
Z[n(n-1)] = Z[n^2 - n] = Z[n^2] - Z[n] = N(N+1)(2N+1)/6 - N(N+1)/2
With N = 2*23999976 that comes to 368,627,731,252,483,959,720 operations, which is about 11,689 years for a 1 GHz CPU!!
-
October 29th, 2005, 04:36 PM
#8
Re: Maximum size of an array
Oh, a goto statement in a C program?!
Last edited by olivthill; October 31st, 2005 at 04:28 AM.
-
October 29th, 2005, 04:44 PM
#9
Re: Maximum size of an array
Originally Posted by mehdi62b
Well, if we want to compute it more exactly (writing Z for sigma, the sum over n from 1 to N = 2*23999976):
in the worst case it would take Z[n(n-1)] operations.
Z[n(n-1)] = Z[n^2 - n] = Z[n^2] - Z[n] = N(N+1)(2N+1)/6 - N(N+1)/2
With N = 2*23999976 that comes to 368,627,731,252,483,959,720 operations, which is about 11,689 years for a 1 GHz CPU!!
Actually, if we don't assume anything about the pseudo-random number generator, it may take an infinite time.
It is comparable to this problem: choose binary random numbers (0 or 1) and wait until the returned number is zero.
It may still not have returned after 1000 loops (with a probability equal to 1/2^1000).
It may also never return zero (with a probability of 0, but the event still exists), and in that case the program is blocked indefinitely.
However, since this is only a pseudo-random generator, it is possible that all numbers are generated, and in that case an infinite loop is impossible.
It is also possible that an infinite loop has a non-null probability!
"inherit to be reused by code that uses the base class, not to reuse base class code", Sutter and Alexandrescu, C++ Coding Standards.
Club of lovers of the C++ typecasts cute syntax: Only recorded member.
Out of memory happens! Handle it properly!
Say no to g_new()!
-
October 29th, 2005, 06:12 PM
#10
Re: Maximum size of an array
However, since that is only a pseudo-random generator, it is possible that all numbers are generated, and in that case, an infinite loop is impossible
Yes, I assumed that, for generating the nth item, the probability would reach 1 after calling the rand function n times.
-
October 30th, 2005, 04:54 PM
#11
Re: Maximum size of an array
For problems related to printing a range of numbers in random order, random_shuffle is usually the tool of choice. In this case however, the large amount of numbers themselves take a lot of memory. Fortunately, one doesn't have to store the numbers.
An alternative solution would be to store one bit of information about every number (whether it has been already printed or not). The next value will always be chosen randomly from the count of remaining values. Finding out the number corresponding to a value is rather slow, if almost the whole array must be read every time in order to count the preceding unprinted numbers. This can be sped up by making seek blocks, which contain the count of unprinted numbers in a specific section.
The end result seems to be rather fast and only takes a reasonable amount of memory.
Code:
#include <cstdlib>
#include <ctime>
#include <vector>
#include <fstream>

// Print out a range of integer values in random order
int main() {
    int target_count = 2*23999976; // number of values
    int target_offset = 1;         // starting from value
    // divide source range into blocks with 2^n items in each
    // where 2^n is approx. sqrt(target_count)
    // each block is initialised with the number of unprinted values
    int seek_block = 0;
    for (int b = 1; b < target_count; b <<= 1) ++seek_block;
    seek_block >>= 1;
    std::vector<unsigned short> seek_index(target_count / (1 << seek_block) + 1, (1 << seek_block));
    // main table: might require one byte per item or
    // just one bit, depending on implementation
    std::vector<bool> numbers_sent(target_count, false);
    // initialise rand() with seed value
    srand((unsigned)time(NULL));
    std::ofstream out("rand.dat");
    for (int i = 0; i < target_count; ++i) {
        int random_source = rand();
        // note: assumes RAND_MAX is small (e.g. 32767);
        // RAND_MAX + 1 would overflow if RAND_MAX == INT_MAX
        if (RAND_MAX + 1 < target_count - i) {
            // combine two random values if the range is too large
            random_source = (random_source * (RAND_MAX + 1)) + rand();
        }
        // the range of produced values is reduced after every value
        int random_target = random_source % (target_count - i);
        int j = 0, k = 0;
        // seek to approximate location of Nth unprinted value
        for (; k <= random_target; k += seek_index[j], ++j);
        --j;
        k -= seek_index[j];
        j <<= seek_block;
        // find out the exact location of Nth unprinted value
        for (; k <= random_target; k += !numbers_sent[j], ++j);
        --j;
        // mark value as printed
        numbers_sent[j] = true;
        --seek_index[j >> seek_block];
        out << j + target_offset << "\n";
    }
    out.close();
    return 0;
}
-
October 30th, 2005, 07:01 PM
#12
Re: Maximum size of an array
Code:
std::vector<bool> numbers_sent(target_count, false);
Use bitset instead of vector<bool>. This saves a lot more space:
Code:
#include <bitset>
//...
std::bitset<2*23999976> numbers_sent;
Now you are dealing with actual bits, making the number of bytes allocated around 5,999,994.
Regards,
Paul McKenzie
-
October 31st, 2005, 01:39 AM
#13
Re: Maximum size of an array
Originally Posted by Paul McKenzie
Use bitset instead of vector<bool>. This saves a lot more space
Now you are dealing with actual bits, making the number of bytes allocated around 5,999,994.
In theory, there doesn't seem to be much of a difference between vector<bool> and bitset, except that vector is more common and its syntax can often be recited from memory. Also, it doesn't require a compile-time constant number of entries. In practice, though, not all vector<bool> implementations are memory optimised.
-
October 31st, 2005, 05:44 AM
#14
Re: Maximum size of an array
vector<bool> is a specialisation but should generally be avoided.
You could use a std::set and add each number as it's generated. You would have the advantage of saving space, but would have an O(log N) lookup as you generated more numbers. You'd also have to generate another random number whenever you hit a previously generated one.
Another alternative is to use std::map< int, int > (or unsigned ints) to simulate a shuffled table without storing all of it. You must keep a count of the remaining "size", i.e. you count down. Each time you generate a random index within the remaining size, you do the following:
- Look it up in the map. If it exists, your picked value is the mapped ("second") value; if it does not exist, the picked value is the index itself.
- Reduce your size by 1.
- Move your last element into the slot of the one you just picked: if the last element is mapped, move its mapped value into the picked slot and delete the last element's entry from the map; otherwise store the last index itself in the picked slot, overwriting any old value.
This would probably be the most efficient method if you are picking a medium-sized number of elements and you want to be more efficient in memory space (i.e. not cache a large table).
-
October 31st, 2005, 08:11 AM
#15
Re: Maximum size of an array
Originally Posted by NMTop40
vector<bool> is a specialisation but should generally be avoided.
"Vector<bool> Considered Harmful"? Are there any other problems than the obvious performance/efficiency-related?
...
This would probably be the most efficient method if you are picking a medium-sized number of elements and you want to be more efficient in memory space (i.e. not cache a large table).
Perhaps I didn't understand your cunning plan, but how does this reduce the peak memory consumption, especially after map/set container overhead? If we decide there's enough memory to keep every value in memory at some point, then we might as well go down the random_shuffle route, which would be O(n).