[RESOLVED] efficiency issues

**stratovarios** · September 17th, 2015, 11:10 PM

Hello guys,
This program should generate set amount of numbers (top) and store them in array (numbers) without repeats.
This is a training program for a class I'm taking. the program runs and does what it suppose to, however i noticed that I'm unable to run the program for top=40000. I think my problem is the second for loop its running through too many checks/compares. The program ran fine for top = 30000. I'm just trying to find out if there is a better way to run the loop where it would do less compares. I'm hoping if the loop runs less checks it will be able to run for bigger arrays. Full code below:

Code:

#include <iostream>
#include <iomanip>
#include <cstdlib>

using namespace std;

int main()
{

	const int top=4000;
	int numbers[top];
	int i,j;
	srand(time(NULL));

	for (int i=0;i<top;i++)
{
    bool check;
    int n; 
    do
    {
    n=rand()%top;
    check=true;
    for (int j=0;j<i;j++)
	{
        if (n == numbers[j]) 
        {
            check=false; 
            break; 
        }
	}
    } while (!check); 
    numbers[i]=n; 
}


	int lines;
		
	if (top%10==0)lines=top/10;
	else lines=top/10+1;
	for (j=0; j<lines;j++) {
		for (i=0;(i<10) && (10*j+i<top);i++) {
			cout << setw(7)<< numbers[10*j+i];			
		}
		cout << endl;
}
	
	system ("PAUSE");

	return 0;
}

**tiliavirga** · September 18th, 2015, 01:08 AM

Originally Posted by stratovarios

This program should generate set amount of numbers (top) and store them in array (numbers) without repeats.

It looks like you're trying to generate a sequence of ints in random order.

Your solution strategy can only be used efficiently when the int range is much larger than the number of ints. But in this case they're equal (you are generating top ints in the 0 to top-1 range) and that's a problem. It's because the more ints you have already generated the less the probability becomes that the next int will be unique so the inner-most loop will fail excessively when the array starts getting full (when the last number is to be generated the probability of a unique int is just 1/tops so on average it will fail tops times until if finds the missing int).

But also when the int range is much larger than the number of ints the algorithm is still be roughly quadratic (due to the nested loops). Fortunately there is a linear algorithm available called random shuffle. You simply fill the array with the ints you want in any order you like and the shuffling algorithm will re-order them randomly in just one pass through the array. Random shuffle is quite easy to implement yourself but it's also available as a standard C++ algorithm,

http://www.cplusplus.com/reference/a...andom_shuffle/

**stratovarios** · September 18th, 2015, 09:13 AM

Thank you tiliavirga, this is what i managed to do with the shuffle function. which seems to work a lot faster than my original program. The only issue that i might have is that i might be required to generate the numbers randomly then assign them vs assign numbers in order then shuffle them. is there some function that can be customized to generate random numbers up to 100000 or something like that instead of shuffling?

Code:

#include <iostream>
#include <iomanip>
#include <algorithm>
#include <cstdlib>

using namespace std;

int myrandom (int i) { return std::rand()%i;}

int main()
{

	const int top=60000;
	int numbers[top];
	int i,j; // counters
	srand(time(NULL));


	for (i=0;i<top;i++) {
		numbers[i]=i+1;
	}
	
	random_shuffle(&numbers[0],&numbers[top], myrandom);

	int lines;
		
	if (top%10==0)lines=top/10;
	else lines=top/10+1;
	for (j=0; j<lines;j++) {
		for (i=0;(i<10) && (10*j+i<top);i++) {
			cout << setw(7)<< numbers[10*j+i];			
		}
		cout << endl;
	}


	system ("PAUSE");

	return 0;
}

**GCDEF** · September 18th, 2015, 10:39 AM

Originally Posted by stratovarios

is there some function that can be customized to generate random numbers up to 100000 or something like that instead of shuffling?

I don't know of one that won't generate duplicates. Easiest thing is probably fill an array with the numbers 1 through 100000 or whatever you need, random shuffle them, then select however many you want from the randomized array.

**stratovarios** · September 18th, 2015, 11:24 AM

Sounds good thank you guys i will just stick to the last program i posted.

**tiliavirga** · September 19th, 2015, 09:42 AM

Originally Posted by stratovarios

The only issue that i might have is that i might be required to generate the numbers randomly then assign them vs assign numbers in order then shuffle them. is there some function that can be customized to generate random numbers up to 100000 or something like that instead of shuffling?

I suggested the random shuffle algorithm because I knew it would be the fastest in this case (re-ordering a full range) but you still can improve your original algorithm somewhat. There you scan linerarly through the numbers array to find out whether a certain int has appeared before. A much more efficient way is to use a separate std::unordered_set (a hash table) for that purpose. To enter an int into an std::unordered_set is a very fast constant operation and there will be info available as to whether the int was already present (and thus has appeared before). Also since std::unordered_set holds unique values only (no doubles) it can be used as a termination criterion. When the number of stored ints equals top every int in the range is accounted for and it's time to quit.

Note that in your new code you are re-ordering the [1 .. top] sequence and not [0 .. top-1] like before!

The myrandom() function is not necessary for random_shuffle() to work and since you're using std::rand it's better to leave it out since the default most likely will be better. If you still want to make the use of a random generator visible in your solution I suggest you replace the standard random_shuffle() with an explicit implementation like say,

Code:

	for (int t=top; t>0; --t) { // my random shuffle
		std::swap(numbers[myrandom(t)], numbers[t-1]);
	}

**OReubens** · September 21st, 2015, 10:53 AM

standard rand() is a pretty horrible PRNG, that only has a range of 32K (and a period of 32K)

So no matter how you twist and turn this, it can only generate 32K unique numbers at most. You can "try" to combine multiple random numbers into a bigger number but this won't work very well because.
- you'll get an even shorter period.
- 2 (or more) consecutive random numbers are interdependant, so you're really not making this "more random" at all, and unless you REALLY know what you're doing, still won't give you more than 32K unique numbers. In fact, you're very likely to make it 16K (or less) unique numbers.

You want to be using one of the larger period/range PRNG's. Not from the C-library, but from the STL library (#include <random>), or you could write a simple rand() function of your own that works with 32bit (or 64bit) values rather than an short.

for the C++ approach here's one (using the standard random engine)

Code:

std::default_random_engine generator;
std::uniform_int_distribution<int> mydistribution(1,100);
int number = distribution(generator);  // generates number in the range 1..100

depending on need, you may also need to seed the generator with a random seed (based on time is a typical approach)

But as GCDEF pointed out, if what you really want is a "drawing" of numbers out of a pool of numbers (akin to taking cards out of a deck), then a shuffle is a better way to achieve this.
random number generaters can (and are supposed to) return duplicates within the requested range.

**stratovarios** · September 21st, 2015, 04:20 PM

its actually funny you say that. i ended up with the same idea and almost the same code

. i attached my final program which from my testing seems to be able to do exactly what i wanted to do, and can actually seemed to handle up to 200000 values in an array.

Code:

#include <iostream>
#include <iomanip>
#include <cstdlib>

using namespace std;

void swap (int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

int main()
{

	const int top=60000;
	int numbers[top];
	int i,j,compares = 0;
	srand(time(NULL));

	for (i=0;i<top;i++) {
		compares++;
		numbers[i]=i+1;
	} compares++;
	

	for (int i = top-1; i > 0; i--)
    {
		compares++;
        int j = rand() % (i+1);
        swap(&numbers[i], &numbers[j]);
    }
	compares++;

	int lines;
		
	if (top%10==0)lines=top/10;
	else lines=top/10+1;
	compares++;
	for (j=0; j<lines;j++) {
		compares++;
		for (i=0;(i<10) && (10*j+i<top);i++) {
			compares++;compares ++;
			cout << setw(7)<< numbers[10*j+i];			
		}
		compares++;compares++;
		cout << endl;
	}
	compares++;

	cout <<endl<< "compares "<< compares<<endl;


	
	system ("PAUSE");

	return 0;
}

**stratovarios** · September 21st, 2015, 04:28 PM

I noticed a bottleneck at around 30k i was trying to reach 60k at the start which made me change my whole approach to the problem. I was under the impression that i had to generate random numbers to fill the array randomly, however that wasn't the case all i needed is to fill the array with 'top' amount of numbers where the numbers domain is [o, top]. avoiding the whole idea of generating randomized numbers just made it much easier. I will keep a note of your approach as i might need something like that in the future.

**superbonzo** · September 22nd, 2015, 02:08 AM

please, note that the three approaches you explored ( fill an array of random ints, randomly shuffle an array of ordered ints, randomly transpose an array of ordered ints ) are very far from being equivalent from a statistical pov. Running time aside, they're not exchangeable, unless your goal is merely to achieve a "random-looking" non repeating sequence ( in which case easier and faster alternatives probably exist ).

BTW, note that you're storing your <numbers> array in the stack, beware this will result in a stack overflow if <top> is set too high; you'd better use a dynamically sized array instead in this case ( like std::vector ).
Also, keep in mind OReuben's comment on std::rand; as you're using it, it can give badly biased results ...

**OReubens** · September 22nd, 2015, 07:52 AM

Rough guess from looking at your code and using rand() (which can't ever return anything > 32K)

1) only the items with low indexes will be properly shuffled, items with high indexes won't.
2) you probably have a significant amount of items that never moved at all.
3) you likely have one end of the array having mostly large numbers and the other half mostly small numbers. so it's not a statistically valid shuffling technique.
If you want a shuffle, there's decent and fast shuffling routines in the STL. See here http://en.cppreference.com/w/cpp/alg...random_shuffle

**Philip Nicoletti** · September 22nd, 2015, 09:29 AM

Originally Posted by OReubens

Rough guess from looking at your code and using rand() (which can't ever return anything > 32K)

Just a note: that is implementation dependent. Almost all systems (except the one the OP is probably using) have RAND_MAX
of at least 2^32.

The rand() that Visual C++ uses does have a 32K value for RAND_MAX (but a much longer period).

**stratovarios** · September 22nd, 2015, 05:10 PM

If you want a shuffle, there's decent and fast shuffling routines in the STL. See here http://en.cppreference.com/w/cpp/alg...random_shuffle

i did end up using the method you are suggesting. i posted a second program in my second post that had a completely different set up to shuffle the numbers. actually if you look at post #8 you will see the final program i ended up using.

Thread: [RESOLVED] efficiency issues

Thread Tools

Display

[RESOLVED] efficiency issues

Re: efficiency issues

Re: efficiency issues

Re: efficiency issues

Re: efficiency issues

Re: efficiency issues

Re: [RESOLVED] efficiency issues

Re: efficiency issues

Re: [RESOLVED] efficiency issues

Re: [RESOLVED] efficiency issues

Re: [RESOLVED] efficiency issues

Re: [RESOLVED] efficiency issues

Re: [RESOLVED] efficiency issues

Posting Permissions