Greetings to all. I am trying to make a quicksort that is faster than the qsort function from the library . So far, I've come to discover that if I choose the pivot at the middle of the sorting vector, my qsort beats the library qsort, but this depends on the size of the vector. At 100k elements, it's ok, but at 1kk elements, from 100 test cases, the library beast my version 100-0.
So the problem is at larger vectors, and in choosing the proper pivot. Can somebody please give me some hints? I would very much appreciate it.
I can provide the code I have so far if it's needed.
Greetings to all. I am trying to make a quicksort that is faster than the qsort function from the library.
To add to laserlights answer to you at the other link:
Color me skeptical, but if you're that confident you could code something better than a library implementor (probably PJ Plauger at DinkumWare), then I would think you wouldn't be asking questions here.
A lot of work goes into making the standard library routines as efficient as possible. These aren't high-school coders writing these libraries -- some of these authors are the cream of the crop C and C++ programmers. Unless you know all the ins and outs of sorting, algorithms, etc. then it is doubtful what you have will beat the library implementation for most use cases.
The performance of any quicksort implementation is highly dependent on the content of the data being sorted. You have discovered only one tip of this iceberg, as you have found that the selection of the pivot point from the middle makes a difference.
Try it with different data, and you will probably find that your implementation is exceedingly slow for some data. Some of your test cases should include:
1. Completely random data. This is probably the test case you are using now.
2. Already-sorted data, both ascending and descending.
3. Almost sorted data: as above except that a few items of data are aout-of-order (think about a user adding an arbitrary item to the end of a sorted list, and then requesting a re-sort)
4. Highly redundant data. Think about a M/F sort based on gender. Or a "by state" sort of addresses. There will only be a few different categories (there are only 50 states), and most quicksort implementations perform quite poorly on such data.
The performance of almost any quicksort algorithm can be improved simply by appending some random data to the sort item before the sort and removing it afterwards. For example, for a "by state" sort, append a random string of 5 character to the end of the 2-character state code, sort, and then delete the appended 5 characters. You would be amazed.
Last edited by MikeAThon; May 31st, 2011 at 02:23 PM.
Reason: Changed "pre-pend" to "append". "Pre-pend" is wrong
Thanks for the response.
I have tried to do a median of three version, but didn't got quite what I was looking for.
What I have in mind is that for the size of the sorting vector, the pivot should be choose in a specific way, for example 100k elements a pivot that is in the middle is ok, but for the rest at this moment I am still investigating.
As for the different data, I am generating them with random, and I test the same values for myqsort and the library version.