I am currently working on a program that will take an image and compare it with all other images in the current directory and sub-directories. I decided to use Qt for this as Qt has a QImage class with an overloaded == operator. This seems to work well if there are duplicates of the same size.
However, my goal is to find not only duplicate images of the same size, but also the same image at a different size. To do this, I shrink the larger image to the size of the smaller one using the QImage scaled function.
The concern I have is that the scaling may not be done the same way the smaller image was originally scaled. I tested this by shrinking a copy of a picture in Paint; that image does not get detected as a duplicate, so it is not deleted. I believe == looks for a bit-for-bit identical image, so when I do the shrinking (via that scaled function), even on the same picture I shrunk in Paint, the two methods of shrinking are not identical and the images compare as different.
Does anyone know of a better method I can try? Something that allows for slight deviations, such as an image being 99% "the same" as another. (I admit I will need to define some metric for what "the same" means.) In that case, the method would treat them as "identical" and get rid of one of them.
The main way I can think of doing this is to go through both images one pixel at a time. If the pixels at the same index in the two images are the same, I increment a same-pixel counter. If that count reaches 99% (or some other threshold) of the total pixel count, I can assume the images are the same.
I'm far from being an image processing expert, but I can imagine that there are quite a few pitfalls when it comes to comparing images. I don't know if you've seen this amazing video? http://www.youtube.com/watch?v=vIFCV2spKtg
Since the method is obviously good at preserving the image during a rescale, maybe it's also usable as a basis for a method of finding how similar two pictures are?
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
- Brian W. Kernighan
So, you want an algorithm to determine whether two images are identical except for, at most, a "scale" transformation.
Well, I'd say it depends on the ultimate goal of your comparison algorithm.
For example, should the comparison emulate what a human would do when asked to perform the same task? If yes, ideally speaking, you should analyze the psychophysical processes manifested by a reference population of humans and model your algorithm accordingly. In practice, I think this can be as easy as some heuristically defined image matching algorithm (just google image similarity...) essentially borrowed from statistical analysis, or it can be as difficult as a semantic-aware image recognition algorithm... (For example, two images showing the character 'a' have a better chance of being perceived as the 'same' image than, say, a pair of photographic images.)
Alternatively, should the comparison estimate whether a pair of images are 'rescaled' versions of each other, where the 'rescaling' is some unknown but existing scaling algorithm? Like, say, a forensic program used to prove that a 'suspect' published a rescaled/manipulated version of a copyrighted image? If yes, then given a list of known scaling algorithms (nearest neighbor, linear/polynomial interpolation, etc.), I'd perform a statistical comparison of each pair of images for each algorithm, computing the probability or likelihood that one is the transform of the other.
"Identical" is a really difficult thing to pin down for images. If you take an image, scale it up, then scale it back down to the original size, the result might not even be the same pixel set as the original. Every image manipulation program (Photoshop, GIMP, OpenGL) likely uses a slightly different algorithm to scale images. For linear transformations it should always be the same, but most programs will interpolate, and not only will each program do it differently, there are many different algorithms (bilinear, bicubic, and all of them can be affected by the amount of bleeding and the intensity of the bleeding falloff).
There are some really nice comparators out there, though. Google has one, so I'm sure you can find one in their repository. OpenCV also has one that I've used before.
It is always difficult to build the program you want out of just a few well-known algorithms. A commercial image comparison product probably uses a large collection of techniques to achieve that.
If your images are JPEGs (or in a similar format), scaling might not be the only problem: the lossy compression algorithm being used will likely have an even bigger impact. Try opening a PNG image in two different programs and (without doing any processing on it) saving it as JPEG in each program. I bet you the results will not be equal.
More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity. --W.A.Wulf
Premature optimization is the root of all evil --Donald E. Knuth