Finding Duplicate Files

Thread: Finding Duplicate Files

  1. #1
    Join Date
    Feb 2009
    Posts
    32

    Finding Duplicate Files

    Is there an easier way to find duplicate files?

    Code:
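    import java.io.*;

    public class Test {
    	public static void main(String[] args) throws IOException {
    		System.out.print(duplicate(new File("C:\\Users\\Me\\Desktop\\4-01 Living the Dream.avi"),
    			new File("C:\\Users\\Me\\Desktop\\4-01 Living the Dream - Copy.avi")));
    	}

    	// Reads both files completely into memory, then compares them byte by byte.
    	public static boolean duplicate(File f1, File f2) throws IOException {
    		// available() is not guaranteed to report the file size, so use length().
    		if (f1.length() != f2.length()) return false;

    		byte[] ba1 = readAll(f1), ba2 = readAll(f2);
    		for (int i = 0; i < ba1.length; i++) {
    			if (ba1[i] != ba2[i]) return false;
    		}
    		return true;
    	}

    	// A single read(byte[]) call may return before the array is full;
    	// readFully() keeps reading until it is. The int cast limits this to files under 2 GB.
    	private static byte[] readAll(File f) throws IOException {
    		DataInputStream in = new DataInputStream(new FileInputStream(f));
    		try {
    			byte[] buf = new byte[(int) f.length()];
    			in.readFully(buf);
    			return buf;
    		} finally {
    			in.close(); // always release the file handle
    		}
    	}
    }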
    Is this really the quickest way? There seems to be some redundancy in reading the whole FileInputStream into a byte array and then checking each byte. Is there a way to read the files byte by byte and compare them on the fly? That would replace each byte array with a single byte variable.
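A minimal sketch of the on-the-fly comparison, assuming Java 7 or later (for try-with-resources); the class name StreamCompare is hypothetical. Each FileInputStream is wrapped in a BufferedInputStream, so every read() call returns one byte but the stream actually fetches data from disk in blocks and serves most calls from memory.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class: compares two files without loading either into memory.
final class StreamCompare {
    static boolean sameContent(File f1, File f2) throws IOException {
        // Files of different sizes can never be duplicates, so skip reading entirely.
        if (f1.length() != f2.length()) return false;

        // try-with-resources closes both streams even if an exception is thrown.
        try (InputStream in1 = new BufferedInputStream(new FileInputStream(f1));
             InputStream in2 = new BufferedInputStream(new FileInputStream(f2))) {
            int b1, b2;
            do {
                // read() returns the next byte (0-255) or -1 at end of stream;
                // thanks to the buffer, most calls never touch the disk.
                b1 = in1.read();
                b2 = in2.read();
                if (b1 != b2) return false;
            } while (b1 != -1); // both streams hit -1 together: files are equal
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sameContent(new File(args[0]), new File(args[1])));
    }
}
```

Run it as `java StreamCompare first.avi second.avi`. Only one byte per file is held in the loop variables, plus the fixed-size internal buffers.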
    Last edited by Nim; December 31st, 2009 at 08:45 AM.

  2. #2
    Join Date
    Oct 2008
    Posts
    77

    Re: Finding Duplicate Files

    I doubt your method will work for large files that cannot fit in memory.

    Computing an MD5 hash sounds much more reasonable. Even if there's no pre-existing Java solution, you can find pseudocode.
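The standard library already provides MD5 through java.security.MessageDigest, which every Java platform is required to support. The sketch below, assuming Java 7 or later, streams a file through the digest in 8 KB chunks so memory use stays constant regardless of file size; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class: MD5-based file comparison.
final class Md5Compare {
    // Feeds the file through MD5 in 8 KB chunks and returns the 16-byte digest.
    static byte[] md5(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // only feed the bytes actually read
            }
        }
        return md.digest();
    }

    // Equal sizes and equal digests mean the files are almost certainly identical;
    // a byte-by-byte check can confirm it if a collision would matter.
    static boolean probablySame(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return Files.size(a) == Files.size(b) && Arrays.equals(md5(a), md5(b));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probablySame(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Note that hashing two files still reads both of them completely, so for a single pair it is no faster than a direct comparison. Hashes pay off when many files are compared against each other, because each file is read only once and the digests are compared in memory.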

  3. #3
    Join Date
    Feb 2009
    Posts
    32

    Re: Finding Duplicate Files

    I added the paragraph below my code the minute you posted, in case you didn't catch that. I'm willing to bet that what I've described is in the API. I tried using FileInputStream.read(), but it takes way too long.
    Last edited by Nim; December 31st, 2009 at 09:15 AM.

  4. #4
    Join Date
    Feb 2008
    Posts
    966

    Re: Finding Duplicate Files

    Well, if you are worried about reading in large files, you should probably use a BufferedInputStream to read in (and compare) a chunk at a time. This way you won't run into the problem of trying to read EVERYTHING into memory all at once.
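Binary files such as the .avi above have no meaningful "lines", so here is a minimal sketch that compares fixed-size blocks instead, assuming Java 7 or later. The 64 KB block size and the class name BlockCompare are arbitrary, hypothetical choices. Because a single read() may return fewer bytes than requested, a small helper keeps reading until each block is full or the file ends.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class: block-at-a-time file comparison.
final class BlockCompare {
    private static final int BLOCK = 64 * 1024; // bytes per block; an arbitrary choice

    // Reads until buf is full or the stream ends; returns the number of bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) break; // end of stream
            total += n;
        }
        return total;
    }

    static boolean sameContent(File f1, File f2) throws IOException {
        if (f1.length() != f2.length()) return false;
        byte[] buf1 = new byte[BLOCK], buf2 = new byte[BLOCK];
        try (InputStream in1 = new FileInputStream(f1);
             InputStream in2 = new FileInputStream(f2)) {
            while (true) {
                int n1 = fill(in1, buf1);
                int n2 = fill(in2, buf2);
                if (n1 != n2) return false; // one file ended early
                if (n1 == 0) return true;   // both files exhausted with no mismatch
                for (int i = 0; i < n1; i++) {
                    if (buf1[i] != buf2[i]) return false;
                }
            }
        }
    }
}
```

Memory use is two 64 KB buffers no matter how large the files are, and the comparison stops at the first differing block.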

  5. #5
    Join Date
    Feb 2009
    Posts
    32

    Re: Finding Duplicate Files

    Yeah, I did that with FileInputStream.read() and it took a lot longer. I mean, I exited the command prompt because I was tired of waiting.

  6. #6
    Join Date
    Jul 2005
    Location
    Currently in Mexico City
    Posts
    566

    Re: Finding Duplicate Files

    This solves your problem: http://www.twmacinta.com/myjava/fast_md5.php

    It becomes as easy as this:
    Code:
    byte[] hash = MD5.getHash(new File(filename));
    On my cra**y office PC (1 CPU core), it took about two and a half minutes to calculate the hash of a 4.4 GB Linux ISO image.
    Last edited by Xeel; January 4th, 2010 at 05:30 PM.
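To find duplicates across a whole folder rather than a single pair, a common approach is to group files by size first (cheap, no reading required) and hash only files that share a size with another file. The sketch below assumes Java 8 or later and uses the standard MessageDigest MD5 rather than the Fast MD5 library linked above; DuplicateFinder and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class: groups identical files under a directory.
final class DuplicateFinder {
    // Hex-encoded MD5 of a file, streamed in 8 KB chunks.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Returns groups of two or more files under root whose MD5 digests match.
    static List<List<Path>> findDuplicates(Path root) throws IOException, NoSuchAlgorithmException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        // Step 1: group by size; a file with a unique size cannot have a duplicate.
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path p : files) {
            bySize.computeIfAbsent(Files.size(p), k -> new ArrayList<>()).add(p);
        }
        // Step 2: hash only the files that share a size with at least one other file.
        List<List<Path>> result = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) continue;
            Map<String, List<Path>> byHash = new HashMap<>();
            for (Path p : sameSize) {
                byHash.computeIfAbsent(md5Hex(p), k -> new ArrayList<>()).add(p);
            }
            for (List<Path> group : byHash.values()) {
                if (group.size() > 1) result.add(group);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (List<Path> group : findDuplicates(Paths.get(args[0]))) {
            System.out.println(group);
        }
    }
}
```

Matching digests make identical content extremely likely but do not strictly prove it; if that matters, confirm each group with a byte-by-byte comparison before deleting anything.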

  7. #7
    Join Date
    Jan 2012
    Posts
    1

    Re: Finding Duplicate Files

    If you are looking for a solution for duplicate files or images, you can use the following utility (http://www.duplicatefilesdeleter.com/). It's a good solution for duplicate files.

  8. #8
    Join Date
    May 2006
    Location
    UK
    Posts
    4,474

    Re: Finding Duplicate Files

    Please don't resurrect old threads. I doubt the OP is still interested in solving the problem after two years.

  9. #9
    Join Date
    Feb 2012
    Posts
    2

    Re: Finding Duplicate Files

    I know of a free duplicate file finder; you can get it at http://www.ashisoft.com/

  10. #10
    Join Date
    Jan 2013
    Posts
    1

    Re: Finding Duplicate Files

    I think http://DuplicateFilesDeleter.com will help you more. I tried it and it did a great job.
