April 26th, 2011, 09:37 PM
[RESOLVED] Determining object identities using image recognition
I've written some image analysis software that can determine the basic shape, color, and dimensions (in cm) of what it considers to be the most dominant object in the image.
I've also created a database of objects for the algorithm to choose from:
Item | Shape | Colors | Width range | Height range
Box | rectangle | brown, black, white | 20-50 cm | 10-30 cm
Basketball | circle | orange | 20-25 cm | 20-25 cm
Backpack | rectangle | black | 40-50 cm | 20-30 cm
An example would be where the system detects a black rectangle that is 42 cm wide and 26 cm high. In this case, both 'box' and 'backpack' would qualify as correct answers. Is there a good way to make an educated guess as to which of the two items it is, such as a 75% chance it's a backpack and a 25% chance it's a box (possibly based on the fact that a box can be any of 3 colors and covers a wider range of sizes, whereas a backpack can only be black)?
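One hedged way to turn that intuition into numbers is to treat every database entry as equally likely beforehand and to spread each entry's probability evenly over its allowed colors and size ranges, so a narrower description scores higher when it matches. The sketch below does that with Bayes' rule. The `CATALOGUE` layout, the function names, and the uniform assumptions are all illustrative and not part of the existing software.

```python
# Hypothetical catalogue mirroring the object table (sizes in cm).
CATALOGUE = {
    "Box": {"shape": "rectangle", "colors": ["brown", "black", "white"],
            "width": (20, 50), "height": (10, 30)},
    "Basketball": {"shape": "circle", "colors": ["orange"],
                   "width": (20, 25), "height": (20, 25)},
    "Backpack": {"shape": "rectangle", "colors": ["black"],
                 "width": (40, 50), "height": (20, 30)},
}

def uniform_likelihood(item, shape, color, width, height):
    """P(observation | item), assuming colors and sizes are uniformly spread."""
    if shape != item["shape"] or color not in item["colors"]:
        return 0.0
    w_lo, w_hi = item["width"]
    h_lo, h_hi = item["height"]
    if not (w_lo <= width <= w_hi and h_lo <= height <= h_hi):
        return 0.0
    # Narrower ranges and fewer colors concentrate more probability per match.
    return (1.0 / len(item["colors"])) / ((w_hi - w_lo) * (h_hi - h_lo))

def posterior(shape, color, width, height):
    """Normalize the likelihoods over all items (equal priors assumed)."""
    scores = {name: uniform_likelihood(item, shape, color, width, height)
              for name, item in CATALOGUE.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items() if s > 0.0}

print(posterior("rectangle", "black", 42, 26))
```

Under these assumptions the black 42 x 26 cm rectangle comes out at roughly 95% backpack and 5% box (18/19 versus 1/19). The exact split depends entirely on the uniform assumption, so it is a starting point rather than a calibrated probability.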
Other advice is also welcome. I'm having to teach myself about image recognition, so if there are other things I should be trying to pull out of an image, or a different way that I should be going about the database, those comments would also be greatly appreciated!
May 2nd, 2011, 02:13 AM
Re: Determining object identities using image recognition
For anyone who comes across this via a Google search or while browsing the forum: I got a pretty good answer when I asked the same question on StackOverflow.
In addition to recording the range of acceptable sizes for boxes and backpacks, you need to define a probability distribution for each. Most likely you'd just go with a (2D) normal distribution, in which case you'd record the mean and variance instead of the range. Do the same for the shape, color, etc. variables, each with a suitable probability distribution.
Then generate two data sets with a few hundred data points each, like this:
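As a minimal sketch of what recording a mean and a spread could look like, the block below models width and height as independent normal distributions (a 2D normal with a diagonal covariance) and color as a simple categorical distribution. Every number in `MODELS` is an assumption made for illustration: the means are the midpoints of the ranges in the original table and each standard deviation is a quarter of the range width.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical per-object models: (mean, std) in cm, plus color probabilities.
MODELS = {
    "Box": {"shape": "rectangle",
            "color_probs": {"brown": 1 / 3, "black": 1 / 3, "white": 1 / 3},
            "width": (35.0, 7.5), "height": (20.0, 5.0)},
    "Basketball": {"shape": "circle",
                   "color_probs": {"orange": 1.0},
                   "width": (22.5, 1.25), "height": (22.5, 1.25)},
    "Backpack": {"shape": "rectangle",
                 "color_probs": {"black": 1.0},
                 "width": (45.0, 2.5), "height": (25.0, 2.5)},
}

def gaussian_posterior(shape, color, width, height):
    """Relative probability of each object, assuming equal priors."""
    scores = {}
    for name, model in MODELS.items():
        if shape != model["shape"]:
            continue  # shape is treated as a hard constraint in this sketch
        p_color = model["color_probs"].get(color, 0.0)
        score = (p_color
                 * normal_pdf(width, *model["width"])
                 * normal_pdf(height, *model["height"]))
        if score > 0.0:
            scores[name] = score
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

print(gaussian_posterior("rectangle", "black", 42, 26))
```

Unlike hard ranges, this version degrades gradually: an object slightly outside a typical size still gets a small, non-zero score instead of being ruled out.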
p_1 = (shape=rectangle, color=black, width=12, height=34)
p_2 = (shape=circle, color=red, width=34, height=11)
For one of the sets, manually label each point with the object that best matches its description. That set becomes your verification set.
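A rough sketch of producing two such sets is shown below. It departs from the description in one respect, which is an assumption of this example: each point is labelled with the object it was sampled from rather than by hand. The `SPECS` values and the function names are hypothetical.

```python
import random

# Hypothetical generators: allowed colors and (mean, std) sizes in cm.
SPECS = {
    "Box": {"shape": "rectangle", "colors": ["brown", "black", "white"],
            "width": (35.0, 7.5), "height": (20.0, 5.0)},
    "Basketball": {"shape": "circle", "colors": ["orange"],
                   "width": (22.5, 1.25), "height": (22.5, 1.25)},
    "Backpack": {"shape": "rectangle", "colors": ["black"],
                 "width": (45.0, 2.5), "height": (25.0, 2.5)},
}

def sample_point(rng):
    """Draw one (point, label) pair from a randomly chosen object."""
    label = rng.choice(sorted(SPECS))
    spec = SPECS[label]
    point = {
        "shape": spec["shape"],
        "color": rng.choice(spec["colors"]),
        "width": round(rng.gauss(*spec["width"]), 1),
        "height": round(rng.gauss(*spec["height"]), 1),
    }
    return point, label

def make_sets(n_each=300, seed=0):
    """Return (training, verification) lists of (point, label) pairs."""
    rng = random.Random(seed)  # seeded so the sets are reproducible
    training = [sample_point(rng) for _ in range(n_each)]
    verification = [sample_point(rng) for _ in range(n_each)]
    return training, verification

training, verification = make_sets()
print(training[0])
```

Keeping the verification set separate from the training set is what lets you check later whether the classifier generalizes instead of just memorizing its inputs.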
Take the other data set and use it to train a classification algorithm such as Fisher's linear discriminant (the method is supervised, so these points need class labels as well). This gives you a transformation `T` that maximizes the "distance" between the classes (groups of data points representing an object) and minimizes the "distance" between points belonging to the same class.
When your program detects a new object with the properties
o = (shape=rectangle, color=black, width=42, height=26)
you apply the transformation obtained from Fisher's LD and measure the correlation (scalar vector product) with the transformed data points you classified as each object, i.e. calculate `(T*o)*(T*p_backpack)'` and `(T*o)*(T*p_box)'`, which relate to the probability that the object `o` is actually a backpack or a box.
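The training and scoring steps can be sketched for the two-class case (backpack versus box) in plain Python. This is a simplified illustration under stated assumptions, not the answer's exact recipe: it uses only width and height, draws its training data from made-up normal distributions, and assigns the new object to the class whose projected mean is nearest instead of computing the scalar products described above. All function names are hypothetical.

```python
import random

def mean_vec(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, mean):
    """2x2 scatter matrix of one class around its own mean."""
    sxx = sum((p[0] - mean[0]) ** 2 for p in points)
    sxy = sum((p[0] - mean[0]) * (p[1] - mean[1]) for p in points)
    syy = sum((p[1] - mean[1]) ** 2 for p in points)
    return [[sxx, sxy], [sxy, syy]]

def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant: w = Sw^-1 (mean_a - mean_b)."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    d = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1]]
    return w, ma, mb

def project(w, point):
    return w[0] * point[0] + w[1] * point[1]

def classify(w, ma, mb, point, labels=("backpack", "box")):
    """Pick the class whose projected mean is closest to the projected point."""
    z = project(w, point)
    return labels[0] if abs(z - project(w, ma)) <= abs(z - project(w, mb)) else labels[1]

# Assumed training data: (width, height) in cm drawn from made-up normals.
rng = random.Random(0)
backpacks = [(rng.gauss(45, 2.5), rng.gauss(25, 2.5)) for _ in range(200)]
boxes = [(rng.gauss(35, 7.5), rng.gauss(20, 5.0)) for _ in range(200)]

w, m_backpack, m_box = fisher_direction(backpacks, boxes)
print(classify(w, m_backpack, m_box, (42, 26)))
```

Categorical attributes such as shape and color would have to be encoded numerically (for example one-hot) before they could enter the discriminant. They are left out here because both candidate objects share the same shape and color in the example.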
Last edited by blinksumgreen; May 2nd, 2011 at 02:17 AM.