Thread: More of a techniques question than a language specific one..

February 16th, 2009, 06:35 AM #1
cjard

Elite Member, Power Poster
Join Date: Oct 2003
Location: .NET2.0 / VS2005 Developer
Posts: 7,104
More of a techniques question than a language specific one..

Guys

I'm making a tool to find duplicate files because I can't find one that works on a folder-by-folder basis rather than file-by-file.

Right now I have a routine that does the check, and it works like this:

1. Build a dictionary of all the file sizes and discard files whose size is unique.
2. For each group of same-size files, CRC32 the first 16 KB of each file, count how many times each CRC32 is seen, and discard the unique ones.
3. CRC32 the whole file and discard the unique ones.
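Here is a minimal Python sketch of the three-pass filter above. Since the question is about technique rather than language, Python stands in for C#. It uses the standard library's `zlib.crc32` for the checksums. The helper names (`crc32_of`, `refine`, `find_duplicate_candidates`) and the 1 MB read size are illustrative assumptions, not the poster's code. The groups it returns are still only *candidates*, because two different files can share a CRC32.

```python
import os
import zlib
from collections import defaultdict

PREFIX_BYTES = 16 * 1024   # the 16 KB first pass from the post
CHUNK = 1024 * 1024        # read size for the full-file pass (assumed, tune as needed)

def crc32_of(path, limit=None):
    """CRC32 of the first `limit` bytes of a file, or of the whole file if limit is None."""
    crc = 0
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = CHUNK if remaining is None else min(CHUNK, remaining)
            block = f.read(size)
            if not block:          # end of file
                break
            crc = zlib.crc32(block, crc)
            if remaining is not None:
                remaining -= len(block)
    return crc

def refine(groups, key):
    """Split each group by key(path) and keep only sub-groups with 2+ members."""
    out = []
    for paths in groups:
        buckets = defaultdict(list)
        for p in paths:
            buckets[key(p)].append(p)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out

def find_duplicate_candidates(root):
    # Pass 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    groups = [paths for paths in by_size.values() if len(paths) > 1]
    # Pass 2: CRC32 of the first 16 KB, within each same-size group.
    groups = refine(groups, lambda p: crc32_of(p, PREFIX_BYTES))
    # Pass 3: CRC32 of the whole file.
    return refine(groups, crc32_of)
```

Each pass only touches files that survived the previous one, so most files are never read at all. The full-file pass is the expensive one, and that is the pass the question about replacing CRC32 is aimed at.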

It takes around 5 minutes to check 420 GB of files. Does anyone have ideas on whether this could be improved? I've considered replacing the last CRC32 pass, or indeed all of them, with a byte-by-byte comparison using big buffers. The theory is that if you have to read the whole file to CRC32 it anyway, you might as well run an N-way compare and discard candidates as you go. Ultimately you will read and process fewer bytes, and there is no risk of spurious duplicates from CRC collisions.
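The N-way compare idea could look roughly like this. It is again a Python sketch under several assumptions. The function name `nway_compare` and the chunk size are hypothetical. All candidates are assumed to be the same size, as they would be after the size pass. Every file in a group is held open at once, which could hit open-file limits for very large groups. The files are read in lockstep, and after each chunk the group is split by chunk contents, so a file stops being read as soon as no other file matches it.

```python
from collections import defaultdict

CHUNK = 1024 * 1024  # "big buffer" size; an assumption, tune for the disk

def nway_compare(paths, chunk=CHUNK):
    """Split same-size files into groups whose contents are byte-identical.

    All files are read in lockstep. After every chunk, files whose bytes
    differ are split into separate groups, and files left on their own are
    no longer read, because they cannot be duplicates.
    """
    handles = {p: open(p, "rb") for p in paths}
    try:
        groups = [list(paths)] if len(paths) > 1 else []
        done = []
        while groups:
            next_groups = []
            for group in groups:
                buckets = defaultdict(list)
                for p in group:
                    buckets[handles[p].read(chunk)].append(p)
                for data, members in buckets.items():
                    if len(members) < 2:
                        continue                 # unique so far: drop it
                    if data == b"":
                        done.append(members)     # hit EOF together: identical
                    else:
                        next_groups.append(members)
            groups = next_groups
        return done
    finally:
        for h in handles.values():
            h.close()
```

Because groups are split on the actual bytes, every group it returns is a true duplicate set, with no hash collisions. The trade-off is that interleaved reads from several files can cause more seeking on a spinning disk than hashing one file at a time, which ties into the thrashing question.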
If disk access could be streamlined to reduce thrashing, that could help too. However, I don't know whether it's possible in C# to work out the best order in which to read the files, or whether doing so would give a significant boost.
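One cheap heuristic for the read-order question is sketched below. It is an untested assumption, not a known fix. The idea is to sort the files by device and by inode/file-index number (the `st_dev` and `st_ino` fields from `os.stat`), on the guess that files allocated around the same time tend to sit near each other on disk. Nothing guarantees this, and on an SSD the seek cost it targets barely exists. Getting the real physical layout on NTFS would require the Win32 `DeviceIoControl` call with `FSCTL_GET_RETRIEVAL_POINTERS`, which this sketch does not attempt. `read_order` is a hypothetical name.

```python
import os

def read_order(paths):
    """Return paths sorted by (device, inode/file index) as a locality heuristic.

    Assumption: nearby inode numbers often mean nearby disk blocks. This is
    not guaranteed by any file system, so measure before relying on it.
    The path is the final sort key, to make ties deterministic.
    """
    def key(p):
        st = os.stat(p)
        return (st.st_dev, st.st_ino, p)
    return sorted(paths, key=key)
```

The returned list would be fed to a hashing pass that reads each file start to finish, one after another, in that order. Timing a run with and without the sort is the only real way to tell whether it helps.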

Any thoughts?
