  1. #1
    Join Date
    Dec 2007
    Posts
    76

    C++ File Comparison

    I'm not sure if this is a language question or a concept question, but I feel it's kind of both. What do you think would be the best way to compare two separate files in C++? Would MD5-hashing both of them and then comparing the MD5 sums be best? What if there were 100 files that had to be compared; would that be an acceptable way to do it? Is there another easy way to compare files? These wouldn't necessarily be text files, so I can't think of a good way to do it. Thanks in advance.

  2. #2
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: C++ File Comparison

    Quote Originally Posted by HKothari
    I'm not sure if this is a language question or a concept question, but I feel it's kind of both. What do you think would be the best way to compare two separate files in C++? Would MD5-hashing both of them and then comparing the MD5 sums be best? What if there were 100 files that had to be compared; would that be an acceptable way to do it? Is there another easy way to compare files? These wouldn't necessarily be text files, so I can't think of a good way to do it. Thanks in advance.
    There is only one true way to verify if two files are the same or not. Compare them byte by byte.

    MD5 wouldn't really help you here as:
    - MD5 can give false positives (two different files can produce the same hash)
    - You'd have to read both files in full to compute the MD5s anyway

    The advantage of MD5 is that you can have something to compare to if you don't have the original file. In this case you have both. Using byte comparison is not only better, but actually faster and more efficient.

    What if you have 100 files? Well, I guess you can compare them all to the first file.
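
A minimal sketch of the byte-by-byte comparison described above, reading both files in fixed-size chunks and stopping at the first difference. The function name `files_equal` and the 4096-byte buffer size are assumptions chosen for illustration; nothing in the thread prescribes them.

```cpp
#include <algorithm>
#include <array>
#include <fstream>
#include <string>

// Hypothetical helper: true only if both files open and hold identical bytes.
bool files_equal(const std::string& path_a, const std::string& path_b)
{
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;   // treat unreadable files as "not equal"

    std::array<char, 4096> buf_a{};
    std::array<char, 4096> buf_b{};

    while (a && b) {
        a.read(buf_a.data(), static_cast<std::streamsize>(buf_a.size()));
        b.read(buf_b.data(), static_cast<std::streamsize>(buf_b.size()));
        const std::streamsize got_a = a.gcount();
        const std::streamsize got_b = b.gcount();

        if (got_a != got_b) return false;   // one file ended before the other
        if (!std::equal(buf_a.begin(), buf_a.begin() + got_a, buf_b.begin()))
            return false;                   // this chunk differs: stop early
    }
    // Equal only if both streams stopped because they reached end of file.
    return a.eof() && b.eof();
}
```

Because the loop returns at the first mismatching chunk, two files that differ early cost very little I/O, whereas a hash has to consume each file in full.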

  3. #3
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: C++ File Comparison

    No, it's not better to use byte-for-byte comparison, because then you cannot efficiently compare 100 files with each other.
    The idea of computing the MD5 for each file and then comparing those is better. The only thing to note is that indeed, the MD5 can be equal even though the files are not. So when the MD5 is equal you still have to do byte-for-byte comparisons for these two files.

    Without using MD5, for 100 files, you would have to do 4,950 byte-by-byte file comparisons. When each file is 1 MB, this is nearly 5 GB of data to read from the disk. If you use MD5, you'll only have to read 100 MB in practice.
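
The pair count comes from 100 × 99 / 2 = 4,950. A rough sketch of the hash-then-group idea follows. MD5 is not part of the C++ standard library, so this sketch swaps in 64-bit FNV-1a purely as a stand-in; `hash_file` and `group_by_hash` are hypothetical names. As noted above, files that land in the same bucket still need a byte-for-byte check.

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// 64-bit FNV-1a over the file contents (a stand-in for MD5, not MD5 itself).
// Assumption: an unreadable file simply hashes like an empty one.
std::uint64_t hash_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::uint64_t h = 0xcbf29ce484222325ULL;   // FNV offset basis
    char c;
    while (in.get(c)) {
        h ^= static_cast<unsigned char>(c);
        h *= 0x100000001b3ULL;                 // FNV prime
    }
    return h;
}

// Reads each file exactly once and buckets the paths by hash value.
// Only buckets holding two or more paths can contain duplicates.
std::map<std::uint64_t, std::vector<std::string>>
group_by_hash(const std::vector<std::string>& paths)
{
    std::map<std::uint64_t, std::vector<std::string>> groups;
    for (const auto& p : paths)
        groups[hash_file(p)].push_back(p);
    return groups;
}
```

With this layout every file is read once to build the buckets, and the expensive pairwise comparison is limited to files that share a bucket.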
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  4. #4
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: C++ File Comparison

    Quote Originally Posted by Yves M
    Without using MD5, for 100 files, you would have to do 4,950 byte-by-byte file comparisons. When each file is 1 MB, this is nearly 5 GB of data to read from the disk. If you use MD5, you'll only have to read 100 MB in practice.
    No, if fileA==fileB and fileA==fileC, then you don't have to check fileB==fileC. No need to check each file with each other.
    If fileC!=fileA, then you discard fileC.

    MD5 will read just as much data from the disk as byte-for-byte comparison. And then it'll do unnecessary calculations.
    Not to mention that if two files have different sizes, then there are 0 bytes read...

    There are plenty of good uses for MD5, but to answer the question "are my two files the same", using MD5 is not necessary. Even if there are hundreds of files.

    The question was "Are my hundred files the same". This only requires a simple Boolean answer. Yes or No.
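
For this yes-or-no reading of the question, comparing every file against the first one is enough, which means N - 1 comparisons rather than one per pair. A small sketch, assuming the files are small enough to hold in memory (the thread mentions sizes around a megabyte or a few); `read_all` and `all_identical` are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Loads a whole file into `out`; returns false if it cannot be opened.
bool read_all(const std::string& path, std::string& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// True if every file has the same bytes as the first one.
// Equality is transitive, so no other pairs need checking.
bool all_identical(const std::vector<std::string>& paths)
{
    if (paths.empty()) return true;

    std::string reference;
    if (!read_all(paths.front(), reference)) return false;

    for (std::size_t i = 1; i < paths.size(); ++i) {
        std::string other;
        if (!read_all(paths[i], other) || other != reference)
            return false;   // unreadable or different: answer is "No"
    }
    return true;
}
```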

    If the question was "I have 100 files, and I want to know which ones are equal" or anything else in that flavor, then I would agree 100%. MD5 or any other hash would be the way to go! (And then, byte-for-byte comparison would be a horrible choice.)

  5. #5
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: C++ File Comparison

    If the question was "I have 100 files, and I want to know which ones are equal" or anything else in that flavor, then I would agree 100%.
    That's exactly how I read his question.

  6. #6
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: C++ File Comparison

    Quote Originally Posted by Yves M
    That's exactly how I read his question.
    Well, then I agree with you. I guess we didn't answer the same question.

  7. #7
    Join Date
    Dec 2007
    Posts
    76

    Re: C++ File Comparison

    I'm sorry for the confusion; let me clarify. It won't be 1 file with around 100 duplicates. It would be more like 50 files with 1 duplicate each, or something along those lines. Thinking about it now, I don't think MD5 would be a viable option, because these files could very well be several megabytes each. Is there anywhere I should look for how to compare files byte for byte?

  8. #8
    Join Date
    May 2007
    Location
    Scotland
    Posts
    1,164

    Re: C++ File Comparison

    Check the file sizes and only compare files byte for byte when you have a match in file size.
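
One way this could look in code: check the sizes first and only walk the contents when they match. This is a sketch under assumptions, not a definitive implementation. The names `file_size_of` and `same_size_and_bytes` are made up for illustration, and the size is obtained by seeking to the end of a binary stream, which gives the byte count on common platforms.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Size in bytes, or -1 if the file cannot be opened.
std::streamoff file_size_of(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at end
    if (!in) return -1;
    return static_cast<std::streamoff>(in.tellg());
}

// Compares contents only when the sizes already match.
bool same_size_and_bytes(const std::string& path_a, const std::string& path_b)
{
    const std::streamoff size_a = file_size_of(path_a);
    if (size_a < 0 || size_a != file_size_of(path_b))
        return false;   // unreadable or different sizes: no content is read

    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;

    // Sizes match, so walking both streams in lockstep covers every byte.
    return std::equal(std::istreambuf_iterator<char>(a),
                      std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(b));
}
```

For the "50 files with one duplicate each" case, the size check alone is likely to rule out most pairs before any file content is read.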
