CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 6 of 6

Thread: Sort Huge files

  1. #1
    Join Date
    May 2012
    Location
    Earth!
    Posts
    9

    Sort Huge files

    I am trying to understand what techiques can be used to sort really huge files (larger than available memory). I did some googling and came across one technique.

    1. Are there any better ways to get this done?

    2. Is there some tweaking that can be done to make this itself better?

    large enough so that you get a lot of records, but small enough such that it will comfortably fit into memory
    3. How do you decide on this value? Consider, memory is 4 GB and currently about 2GB is consumed, and file to sort is 10GB in size. (Consumed memory could of course change dynamically during execution - consumed more/less by other apps.)

  2. #2
    Join Date
    Apr 1999
    Posts
    27,449

    Re: Sort Huge files

    Quote Originally Posted by ShaQ.Blogs View Post
    I am trying to understand what techiques can be used to sort really huge files (larger than available memory).
    The technique has been known for decades. It's called an external merge sort:

    http://en.wikipedia.org/wiki/External_sorting

    That site you found describes nothing more than what has been known and used for the past 40+ years.

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; May 30th, 2012 at 04:25 AM.

  3. #3
    GCDEF is offline Elite Member Power Poster
    Join Date
    Nov 2003
    Location
    Florida
    Posts
    12,635

    Re: Sort Huge files

    Sounds like it might be time to consider a database.

  4. #4
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: Sort Huge files

    I'd say the big question is: "Do you HAVE to use a comparison based sort?".

    Depending on the elements being sorted, I'm pretty sure a Bucket/Radix sort can beat the crap out of any sized input.
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  5. #5
    Join Date
    Jan 2009
    Posts
    1,689

    Re: Sort Huge files

    I agree that a radix sort would likely be the best bet here. You do not need to sort things in RAM, you can push large amounts of data off to the hard drive. A Radix sort would be useful here because you can put the buckets on the hard drive, then load small buckets into RAM and do a quicksort on those buckets, then just combine them when you are done iterating each bucket.

  6. #6
    Join Date
    May 2012
    Location
    Earth!
    Posts
    9

    Re: Sort Huge files

    Quote Originally Posted by Paul McKenzie View Post
    The technique has been known for decades. It's called an external merge sort:

    http://en.wikipedia.org/wiki/External_sorting

    That site you found describes nothing more than what has been known and used for the past 40+ years.
    Thanks for the link & info

    I'd say the big question is: "Do you HAVE to use a comparison based sort?".

    Depending on the elements being sorted, I'm pretty sure a Bucket/Radix sort can beat the crap out of any sized input.
    I agree that a radix sort would likely be the best bet here. You do not need to sort things in RAM, you can push large amounts of data off to the hard drive. A Radix sort would be useful here because you can put the buckets on the hard drive, then load small buckets into RAM and do a quicksort on those buckets, then just combine them when you are done iterating each bucket.
    I have some reading to do now...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured