-
December 31st, 2009, 08:59 AM
#1
Finding Duplicate Files
Is there an easier way to find duplicate files?
Code:
import java.io.*;

public class Test {
    public static void main(String[] args) throws IOException {
        System.out.print(duplicate(new File("C:\\Users\\Me\\Desktop\\4-01 Living the Dream.avi"),
                new File("C:\\Users\\Me\\Desktop\\4-01 Living the Dream - Copy.avi")));
    }

    // Reads both files fully into memory, then compares the byte arrays.
    public static boolean duplicate(File f1, File f2) throws IOException {
        if (f1.length() != f2.length()) return false; // different sizes can't match
        // Use File.length() for the sizes; available() is not guaranteed
        // to report the full file length.
        byte[] ba1 = new byte[(int) f1.length()];
        byte[] ba2 = new byte[(int) f2.length()];
        try (DataInputStream dis1 = new DataInputStream(new FileInputStream(f1));
             DataInputStream dis2 = new DataInputStream(new FileInputStream(f2))) {
            dis1.readFully(ba1); // readFully loops until the buffer is filled
            dis2.readFully(ba2);
        }
        for (int i = 0; i < ba1.length; i++) {
            if (ba1[i] != ba2[i]) return false;
        }
        return true;
    }
}
Is this seriously the quickest way? I notice some redundancy: I read the whole FileInputStream into a byte array, and then check each byte. Is there a way to read the files byte-by-byte and compare them on the fly? That would replace each byte array with a single byte variable.
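Something like this streaming version is what I have in mind. A sketch only (the class name is just for illustration, and it assumes Java 7+ for try-with-resources): wrapping each stream in a BufferedInputStream keeps the per-byte read() calls from hitting the disk every time.
Code:

```java
import java.io.*;

public class DupCheck {
    // Compare two files without loading either fully into memory:
    // first by length, then byte by byte through buffered streams.
    public static boolean duplicate(File f1, File f2) throws IOException {
        if (f1.length() != f2.length()) return false; // cheap early exit
        try (InputStream in1 = new BufferedInputStream(new FileInputStream(f1));
             InputStream in2 = new BufferedInputStream(new FileInputStream(f2))) {
            int b1, b2;
            do {
                b1 = in1.read(); // returns -1 at end of stream
                b2 = in2.read();
                if (b1 != b2) return false;
            } while (b1 != -1); // both streams end together (equal lengths)
            return true;
        }
    }
}
```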
Last edited by Nim; December 31st, 2009 at 09:45 AM.
-
December 31st, 2009, 09:31 AM
#2
Re: Finding Duplicate Files
I doubt your method will work for large files that cannot fit in memory.
Computing an MD5 hash sounds much more reasonable. Even if there were no pre-existing Java solution, you could find pseudocode.
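There is a pre-existing Java solution, actually: java.security.MessageDigest in the standard library supports MD5. A rough sketch (class and method names are my own), streaming the file through the digest in chunks so memory use stays constant:
Code:

```java
import java.io.*;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileHash {
    // Hash a file with MD5, feeding it to the digest in fixed-size
    // chunks so the whole file never sits in memory at once.
    public static byte[] md5(File f) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // only the n bytes actually read
            }
        }
        return md.digest();
    }
}
```

Two files with different hashes are definitely different; equal hashes mean almost certainly identical (a direct byte comparison can confirm if you want certainty).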
-
December 31st, 2009, 09:44 AM
#3
Re: Finding Duplicate Files
I added the paragraph below my code the minute you posted, in case you didn't catch that. I'm willing to bet that what I've described is in the API. I tried using FileInputStream.read(), but it takes way too long.
Last edited by Nim; December 31st, 2009 at 10:15 AM.
-
December 31st, 2009, 12:02 PM
#4
Re: Finding Duplicate Files
Well, if you are worried about reading in large files, you should probably use a buffered input stream to read in (and compare) a block at a time. This way you won't run into the problem of trying to read EVERYTHING into memory all at once.
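Roughly like this. A sketch only (class name and the 8 KB buffer size are arbitrary choices): two small buffers are all that is ever in memory, however large the files are.
Code:

```java
import java.io.*;

public class ChunkCompare {
    // Compare two files one fixed-size block at a time; only two
    // small buffers are held in memory regardless of file size.
    public static boolean sameContent(File f1, File f2) throws IOException {
        if (f1.length() != f2.length()) return false;
        try (InputStream in1 = new BufferedInputStream(new FileInputStream(f1));
             InputStream in2 = new BufferedInputStream(new FileInputStream(f2))) {
            byte[] buf1 = new byte[8192], buf2 = new byte[8192];
            int n1;
            while ((n1 = in1.read(buf1)) != -1) {
                int off = 0;
                while (off < n1) { // read() may return fewer bytes than asked
                    int n2 = in2.read(buf2, off, n1 - off);
                    if (n2 == -1) return false;
                    off += n2;
                }
                for (int i = 0; i < n1; i++) {
                    if (buf1[i] != buf2[i]) return false;
                }
            }
            return true; // equal lengths, so both streams ended together
        }
    }
}
```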
-
December 31st, 2009, 10:50 PM
#5
Re: Finding Duplicate Files
Yeah, I did that with FileInputStream.read() and it took a lot longer. I mean, I killed the command prompt because I was tired of waiting.
-
January 4th, 2010, 06:14 PM
#6
Re: Finding Duplicate Files
This solves your problem: http://www.twmacinta.com/myjava/fast_md5.php
It becomes as easy as this:
Code:
byte[] hash = MD5.getHash(new File(filename));
On my cra**y PC in the office (1 CPU core) it took about two and a half minutes to calculate the hash of a 4.4 GB Linux ISO image.
Last edited by Xeel; January 4th, 2010 at 06:30 PM.
-
January 13th, 2012, 03:58 AM
#7
Re: Finding Duplicate Files
If you are looking for a solution for duplicate files or images, you can use the following utility (http://www.duplicatefilesdeleter.com/). It's a good solution for duplicate files.
-
January 13th, 2012, 06:45 AM
#8
Re: Finding Duplicate Files
Please don't resurrect old threads. I doubt if the OP is still interested in solving the problem after 2 years.
-
January 31st, 2013, 11:44 AM
#10
Re: Finding Duplicate Files
I think http://DuplicateFilesDeleter.com will help you better; I tried it and it did a great job.