[RESOLVED] set, sort and qsort

**grahamr (work)** · March 9th, 2007, 12:14 PM

I have some code. It comes in two flavours, C and C++. The C flavour used qsort. The C++ flavour used set, so as the items were inserted they were sorted into the correct position.

This is the longest running part of the code, so it is vital that this bit runs fast. I would prefer to use C++ as the C code looks a bit clunky and old. However the C version runs a lot faster than the C++ one. Only timing the complete algorithm, not the individual parts of it, the C code achieves a total of 7.41 seconds. The set function is beaten with a respectable score of 10.65.

The functions as the predicate that both call are very similar (slight difference in return as one is binary and the other is an int), so there is unlikely to be that much difference in the two. The difference has to be that the set is sorting as it goes and growing the set if it gets too large (set doesn't have a reserve() function).

So I decide to swap the set for a vector that reserve()s the correct amount of space, adds all the elements and then sorts - exactly what the C version with it's POD does.

So I ran the code expecting similar results to the qsort code but drastically different results - the std::sort was slower coming in at 12.66 seconds. I reran it and got very similar results. I checked and double checked, I took out the reserve and used push_back (slower still), switch iterators for simple variables but nothing made it faster.

Eventually I created another function, this time making it as close to the original C code but with just the std::sort being used. Again this came in at 12.64 seconds.

You may say that without seeing the code you can't say what is the problem, but data is a text file, being operated on in chunks to compress it, and 4 functions alone takes nearly 300 lines, the full code is over 1000 lines. The data coming in to all 4 functions are vectors of chars, and all 4 functions output exactly the same results (both in output data and time).

The predicates compare a range of data to see if they are the same. In two of the functions pointers are passed and in the other two the actual offsets are used. They are functionally the same:

Code:

// C qsort
inline int
range_compare( const unsigned int *i1,
				const unsigned int *i2 )
{
	unsigned int l1 = (unsigned int) ( length - *i1 );
	unsigned int l2 = (unsigned int) ( length - *i2 );
	int result = memcmp( buffer + *i1, buffer + *i2,
							 l1 < l2 ? l1 : l2 );
	if ( result == 0 )
		return l2 - l1;
	else
		return result;
};

// C++ std::sort
inline bool RangeCompare( const unsigned char* p1,
				  const unsigned char* p2 ) 
{
	unsigned int l1 = (unsigned int) (( buffer - p1 ) + length );
	unsigned int l2 = (unsigned int) (( buffer - p2 ) + length );
	int result = memcmp( p1, p2, min<unsigned int>( l1, l2 ) );
	if ( result < 0 )
		return 1;
	if ( result > 0 )
		return 0;
	return l1 > l2;
}

Now it is clear to me that the sensible thing would be to go with the c qsort routine. However I am shocked at the std::sort routine being slower than the . Not being the sort to give up easily I even passed qsort the vector directly which also worked fine and gave similar results to the original.

I know that I am using an old version of VC++ (VC6) but the sort routine has guaranteed complexity. If I run the routine on binary or text, of different sizes (from 17kb up to 66mb) I get a similar result of set beating sort and qsort beating them all.

I'm pretty sure my 6 different files aren't all pathological cases and if I change the size of the blocks they sort the ordering of speed stays the same.

Oh, well...something to think about over the weekend.

**Paul McKenzie** · March 9th, 2007, 12:47 PM

1) The std::sort returns a bool. A bool is either true or false, not 1 and 0 as your code is doing when it returns from the predicate function. Please change this. Returning 1 and 0 will make the compiler have to do an extra conversion to whatever is implemented as true or false.

2) Please understand that std::sort requires a strict weak ordering. This means that elements that are equal return false, and that a < b must be the case throughout the sorting operation, i.e. you can't have a case where b < a returns true if you have already specified during the sort at some time that a < b return true. I don't know if your return values guarantee this.

3) Did you time the release version of your code, and not the debug version?

4) Usually, persons would give us the data and test whether std::sort is actually slower than qsort or not. My bet is that it isn't slower, just that you're not doing something optimally or correctly.

Regards,

Paul McKenzie

**Yves M** · March 9th, 2007, 01:05 PM

1) How big is the data set being sorted? For very small sets, qsort does have an advantage since std::sort tends to have a bigger initial overhead
2) std::set being faster than std::sort could mean that copying objects is non-trivial. Is this the case?

**Philip Nicoletti** · March 9th, 2007, 01:13 PM

I admit to being very confused.

1) you say you have a vector<char> , but your predicate is:

Code:

inline bool RangeCompare( const unsigned char* p1,
		const unsigned char* p2 )

That should not even compile

2) you give the following for your predicate to qsort:

Code:

inline int
range_compare( const unsigned int *i1,
	const unsigned int *i2 )

But the signature for qsort() needs const void* arguments. Again, this
should not even compile.

3) Could you provide the variable declarations and the code used
to call qsort() and std::sort ??

4) I'll have to check to make sure, but I think that Scott Meyers has
written the function objects have a better chance at being inlined
than normal functions. So you might try making the std::sort
predicate a function object instead. (But truthfully, I don't think it
will make much difference).

**MikeAThon** · March 9th, 2007, 03:00 PM

Slightly off-topic, but how much data are you trying to sort? In my experience, a C-style qsort that takes 7.41 seconds is remarkably slow, unless you are trying to sort a remarkably large amount of data (i.e., tens of millions).

As one example, I have sorted character strings of around 30 characters length each, where one million such strings are sorted in around 1/2 a second on a very-ordinary machine.

Mike

**JohnyDog** · March 9th, 2007, 04:54 PM

It is also matter of the data itself - quicksort is very slow for unfavorable data sets (worst case scenario O(n^2) ). std::sort tends to use introsort (is this guaranteed or just in my implementation of STL?) which is quicksort initially but switches to heapsort on too deep recursion, yielding worst-case complexity O(n log n).

**MikeAThon** · March 9th, 2007, 05:04 PM

Yes, with respect to data-dependency, most implementations of quicksort (including the CRT's implementation) are miserably slow when sorting on one very-commonly encountered type of data: repeating (i.e., identical) data, such as gender, state-of-residence, country code, etc.

Quicksort works best when the sorted-on data is all different, i.e., unique data, like social-security number.

If you are sorting on repetitious data (like male/female), it's usually best to concatenate a piece of unique data to the sorted-on data. The simplest thing to do is to concatenate the current index of the display order, and then sort.

Mike

**grahamr (work)** · March 12th, 2007, 04:59 AM

Originally Posted by Paul McKenzie

1) The std::sort returns a bool. A bool is either true or false, not 1 and 0 as your code is doing when it returns from the predicate function. Please change this. Returning 1 and 0 will make the compiler have to do an extra conversion to whatever is implemented as true or false.

Okay. Changed it. Maybe shaved a tenth of a second off the std::set and about double that with the std::sort. This is informal testing using QueryPerformanceCounter.

Originally Posted by Paul McKenzie

2) Please understand that std::sort requires a strict weak ordering. This means that elements that are equal return false, and that a < b must be the case throughout the sorting operation, i.e. you can't have a case where b < a returns true if you have already specified during the sort at some time that a < b return true. I don't know if your return values guarantee this.

I'll check into this.

Originally Posted by Paul McKenzie

3) Did you time the release version of your code, and not the debug version?

Release. The debug version takes so long to run on the larger files it is ridiculous but there are a lot of asserts in there to check that everything is hunky-dory.

Originally Posted by Paul McKenzie

4) Usually, persons would give us the data and test whether std::sort is actually slower than qsort or not. My bet is that it isn't slower, just that you're not doing something optimally or correctly.

I can't really post the entire code: There's over a thousand lines (some of which admittedly are now defunct due to being rewritten and 4 functions are the various sorting methods I am testing.

The sort routine is part of the Burrows Wheeler Transform. The basic operation is this: Take a block - I am using 16kb chunks but you can specify on compress what to use via a template parameter (decompress uses the value that is written to the compressed data). You take each block and rotate it by one character to the left. Then when that is complete you sort them into order. This is a reversible transform (it takes a few attempts to understand) but doesn't actually compress the data - just makes it more susceptible to compression. For speed & space you just compare pointers.

Here are the various parts (comments removed for brevity):

Code:

//std::set
			int i;
			std::set< unsigned char *, RangedCompare > p;
			for ( i = 0 ; i <= length ; i++ ) {
				p.insert(buffer + i);
			}
//std::sort (similar to set)
			int i;
			std::vector<unsigned char *> p(l);
			std::vector<unsigned char *>::iterator pp;
			for ( i = 0, pp = p.begin(); i <= length ; ++i, ++pp ) {
				*pp = buffer + i;
			}
			std::sort(p.begin(), p.end(), RangeCompare);

//std::sort (similar to qsort)
			std::vector<unsigned int> indices(l);
			std::vector<unsigned int>::iterator ii = indices.begin();
			int i;
			for ( i = 0 ; i <= length ; i++ ) {
				*ii = i;
				++ii;
			}	
			std::sort(indices.begin(), indices.end(), range_compare2);

// qsort
			int i;
			for ( i = 0 ; i <= length ; i++ )
				indices[ i ] = i;
			qsort( indices,
				   (int)( length + 1 ),
				   sizeof( int ),
				   ( int (*)(const void *, const void *) ) range_compare );

Thanks, Graham Reeds.

**grahamr (work)** · March 12th, 2007, 05:03 AM

Originally Posted by Yves M

1) How big is the data set being sorted? For very small sets, qsort does have an advantage since std::sort tends to have a bigger initial overhead

The data are 16kb blocks, being sorted as discussed above. The times I am quoting are for a 3.5mb XML-esque file (it isn't actually xml but looks similar).

Originally Posted by Yves M

2) std::set being faster than std::sort could mean that copying objects is non-trivial. Is this the case?

The objects being copied are just pointers into the data. The actual comparison is non-trivial (comparing long ranges of data).

**grahamr (work)** · March 12th, 2007, 05:11 AM

Originally Posted by Philip Nicoletti

I admit to being very confused.

1) you say you have a vector<char> , but your predicate is:

Code:

inline bool RangeCompare( const unsigned char* p1,
		const unsigned char* p2 )

That should not even compile

I've provided the code calls above, and I assure you they do compile.

Originally Posted by Philip Nicoletti

2) you give the following for your predicate to qsort:

Code:

inline int
range_compare( const unsigned int *i1,
	const unsigned int *i2 )

But the signature for qsort() needs const void* arguments. Again, this
should not even compile.

It does. The code snippet I gave in response to Paul should show how.

Originally Posted by Philip Nicoletti

3) Could you provide the variable declarations and the code used
to call qsort() and std::sort ??

The code snippet I gave in response to Paul should give that.

Originally Posted by Philip Nicoletti

4) I'll have to check to make sure, but I think that Scott Meyers has
written the function objects have a better chance at being inlined
than normal functions. So you might try making the std::sort
predicate a function object instead. (But truthfully, I don't think it
will make much difference).

By function object you mean a class using operator()() as the sole function? That's what I am using for the std::set but I couldn't get std::sort to work, so I made it a normal function.

Thanks, Graham

**grahamr (work)** · March 12th, 2007, 05:17 AM

Originally Posted by MikeAThon

Slightly off-topic, but how much data are you trying to sort? In my experience, a C-style qsort that takes 7.41 seconds is remarkably slow, unless you are trying to sort a remarkably large amount of data (i.e., tens of millions).
Mike

I am sorting 3.5million characters in 16kb chunks as my time, but the range compare is across those 16kb or characters meaning a lot of time is in the comparison. However if the comparisons are equal, and I've implemented two different ways to compare std::sort to std::set and qsort and the std::sort is slower in both then the algorithms are at fault, which is what I am trying to find out. I will try and get on the build machine that is running VS8 to see if that levels out the time (however I am stuck to VS6 while the current version of our software runs to it's natural end).

**Philip Nicoletti** · March 12th, 2007, 06:53 AM

1) OK ... your original post said vector of chars, not vector of char* ...
that confused me a bit (I probably took that phrase too literally).

2) OK .... you are casting in the qsort call to the correct signature.

3) In this code ...

Code:

std::vector<unsigned int> indices(l);

a) What is "l" ? (I assume it equals length+1).
b) what does the code for range_compare2 look like ?

**grahamr (work)** · March 12th, 2007, 07:22 AM

Originally Posted by Philip Nicoletti

a) What is "l" ? (I assume it equals length+1).

You are correct. I forgot to mention what each variable means.

Originally Posted by Philip Nicoletti

b) what does the code for range_compare2 look like ?

Code:

inline bool
range_compare2( const unsigned int i1,
				const unsigned int i2 )
{
	unsigned int l1 = (unsigned int) ( length - i1 );
	unsigned int l2 = (unsigned int) ( length - i2 );
	int result = memcmp( buffer + i1, buffer + i2,
						min<unsigned int>( l1, l2 ) );
	if (result < 0)
		return true;
	else if (result > 0)
		return false;
	return l1 > l2;
};

**Yves M** · March 12th, 2007, 09:52 AM

Originally Posted by grahamr (work)

The objects being copied are just pointers into the data. The actual comparison is non-trivial (comparing long ranges of data).

Aha. This can be where the three-way comparison of qsort pays off. If there are a decent amount of equal objects then std::sort will be at a disadvantage. One interesting test would be to count the number of comparisons done in each case and the number of the different outcomes (i.e. smaller, equal, bigger). One thing you might try is std::stable_sort (which uses merge sort) but it can very well be that for this problem qsort is better than std::sort.

**grahamr (work)** · March 12th, 2007, 10:52 AM

Using the all new binary predicate (the details of which are below) I have managed to shave nearly a quarter of a second from the stl::sort(), though the same trick didn't work for the set() method (wonder why?)

The number of comparisons for a block size of 16,384 on a file of 3,548,708 bytes:
Per block (used first block for values):
Less Than: 101,212
Equal To: 1,684
Greater Than: 117,400

Per file:
Less Than: 37,834,878
Equal To: 720,030
Greater Than: 22,182,547

The per block values seem fairly representative. The EqualTo seemed to vary somewhat, sometimes being as low as 66 and as high as 4200, but generally fell between 1000-2000.

And finally: Wow. The stable_sort algorithm comes in at 5.38 (averaged over 5 runs, discarding the highest and lowest). I think at this point it would be foolish to try and eke more speed out of the routine. This is more than twice as fast as that I started off with. A little bit of spit and polish and the code should make the release.

The new RangeCompare predicate used for the std::sort algorithm:

Code:

	struct RangeCompare : public std::binary_function<const unsigned char*, const unsigned char*, bool>
	{
	public:
		bool operator()( const unsigned char* p1,
			const unsigned char* p2 ) const
		{
			unsigned int l1 = (unsigned int) (( buffer - p1 ) + length );
			unsigned int l2 = (unsigned int) (( buffer - p2 ) + length );
			int result = memcmp( p1, p2, min<unsigned int>( l1, l2 ) );
			if ( result < 0 ) {
				return true;
			}
			if ( result > 0 ) {
				return false;
			}
			return l1 > l2;
		}
	};

Thanks!

Thread: [RESOLVED] set, sort and qsort

Thread Tools

Display

[RESOLVED] set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Re: set, sort and qsort

Posting Permissions