Re: Finding Duplicate Files
I doubt your method will work for large files that cannot fit in memory.
Computing an MD5 hash sounds much more reasonable. Even if there's no pre-existing Java solution, you can find pseudocode.
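For what it's worth, the JDK does ship MD5 through java.security.MessageDigest, so something along these lines should work without any third-party code. This is only a rough sketch: the class name FileMd5, the method name md5Hex, and the 8 KB buffer size are my own choices for illustration, not anything from the original post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileMd5 {
    // Hypothetical helper: feeds the file to MD5 in 8 KB chunks, so memory
    // use stays constant no matter how large the file is.
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192]; // buffer size is an arbitrary assumption
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n); // hash only the bytes actually read
            }
        }
        // Render the 16-byte digest as 32 lowercase hex characters.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two files with different hashes are definitely different. Two files with the same hash are almost certainly duplicates, but you may still want a byte-by-byte check before deleting anything.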
Re: Finding Duplicate Files
I added the paragraph below my code the minute you posted, in case you didn't catch that. I'm willing to bet that what I've described is in the API. I tried using FileInputStream.read(), but it takes way too long.
Re: Finding Duplicate Files
Well, if you are worried about reading in large files, you should probably use a buffered input stream to read in (and compare) a line at a time. This way you won't run into the problem of trying to read EVERYTHING into memory all at once.
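Here is roughly what I mean, as a sketch only. I've swapped "a line at a time" for fixed-size byte chunks, since binary files don't have lines. The names FileCompare and sameContent and the 8 KB chunk size are made up for the example, and it assumes Java 9 or newer (for InputStream.readNBytes and the ranged Arrays.equals).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FileCompare {
    // Hypothetical helper: returns true if both files hold identical bytes.
    // Only two 8 KB buffers are held in memory at any time.
    public static boolean sameContent(Path a, Path b) throws IOException {
        // Cheap early exit: files of different sizes cannot be duplicates.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            byte[] bufA = new byte[8192];
            byte[] bufB = new byte[8192];
            while (true) {
                // readNBytes fills the buffer unless end of file comes first.
                int nA = inA.readNBytes(bufA, 0, bufA.length);
                int nB = inB.readNBytes(bufB, 0, bufB.length);
                if (nA != nB) {
                    return false; // one file ended before the other
                }
                if (nA == 0) {
                    return true;  // both streams exhausted with no mismatch
                }
                if (!Arrays.equals(bufA, 0, nA, bufB, 0, nB)) {
                    return false; // this chunk differs
                }
            }
        }
    }
}
```

It stops at the first chunk that differs, so files that are not duplicates usually get rejected quickly.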
Re: Finding Duplicate Files
Yeah, I did that with FileInputStream.read() and it took a lot longer. I mean, I exited the command prompt because I was tired of waiting.
Re: Finding Duplicate Files
This solves your problem: http://www.twmacinta.com/myjava/fast_md5.php
It becomes as easy as this:
Code:
byte[] hash = MD5.getHash(new File(filename));
On my cra**y office PC (1 CPU core), it took about two and a half minutes to calculate the hash of a 4.4 GB Linux ISO image.
Re: Finding Duplicate Files
If you are looking for a solution to duplicate files or images, you can use the following
utility (http://www.duplicatefilesdeleter.com/). It's a good solution for duplicate files.
Re: Finding Duplicate Files
Please don't resurrect old threads. I doubt if the OP is still interested in solving the problem after 2 years.
Re: Finding Duplicate Files
I think http://DuplicateFilesDeleter.com will help you better. I tried it and it did a great job.