-
April 12th, 2006, 11:39 PM
#1
Text In Image Recognition?
Hi There,
I'm looking to perform a text recognition function.
Here's what I need:
- I am given an image of, let's say, 100x75 pixels in size.
- The image may or may not contain text.
- The text would be a computer font and not handwritten.
- If it does contain text, it is always the same font, colour and text size.
- It may contain up to around 20 characters.
- If text is found, the routine would pass this back to the calling function as a string.
Does anyone know of a free (it's a hobby project!) library that would do this for me? Failing that, a tutorial on how I might get started doing this myself?
Many Thanks
BW
Regards,
Big Winston
-
April 13th, 2006, 03:58 AM
#2
Re: Text In Image Recognition?
I once did something similar to automatically get through the registration process on a website. In many cases it is trivial to do.
1. Ideally you would know in advance a certain characteristic of the text (or you may test several alternatives automatically). For instance, the text may be lighter or darker than the background, contain a higher proportion of green, and so on. Once you have decided on such a characteristic, transform the picture into a monochrome one based on that criterion (everything lighter than some threshold becomes white, the rest becomes black). You can always run a set of tests to ensure that your program makes this transformation correctly.
2. The next stage is character separation. You need to determine the groups of connected black pixels in your image and extract the bounding rectangle of each group into a small image which will contain only one character.
3. You need to precompute template sets of characters for all fonts and all font sizes (in my case there was only one font and one font size, and only numbers were used, which seriously simplified my task).
4. Finally, you need to compare the image of every character you extracted with every image in your database of template images. Basically, you compute the ratio of coinciding pixels (the number of black pixels that coincide, divided by the total number of black pixels in the tested image). The template with the highest ratio is likely to be the character drawn in the picture.
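Steps 1, 2 and 4 above can be sketched in plain Python roughly as follows. This is only a minimal illustration, not the code I actually used: the function names, the threshold of 128, the use of 8-connectivity, the top-left alignment of character and template, and the tiny "L"/"T" templates are all assumptions made for the example. The image is a list of rows of greyscale values (0 = black, 255 = white).

```python
from collections import deque

def binarize(gray, threshold=128):
    """Step 1: pixels darker than the threshold become ink (1), the rest 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def extract_characters(mono):
    """Step 2: find groups of connected ink pixels and crop each group to its
    bounding rectangle. Returns the crops ordered left to right."""
    if not mono or not mono[0]:
        return []
    height, width = len(mono), len(mono[0])
    seen = [[False] * width for _ in range(height)]
    found = []
    for y in range(height):
        for x in range(width):
            if mono[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over the 8 neighbours (an assumption).
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mono[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            top = min(p[0] for p in pixels)
            bottom = max(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            right = max(p[1] for p in pixels)
            crop = [[0] * (right - left + 1) for _ in range(bottom - top + 1)]
            for py, px in pixels:
                crop[py - top][px - left] = 1
            found.append((left, crop))
    found.sort(key=lambda item: item[0])
    return [crop for _, crop in found]

def match_ratio(char, template):
    """Step 4: coinciding ink pixels divided by all ink pixels in `char`.
    The two images are aligned at their top-left corners (an assumption)."""
    total = sum(sum(row) for row in char)
    if total == 0:
        return 0.0
    rows = min(len(char), len(template))
    cols = min(len(char[0]), len(template[0]))
    hits = sum(char[y][x] & template[y][x]
               for y in range(rows) for x in range(cols))
    return hits / total

def recognize(gray, templates, threshold=128):
    """Run steps 1, 2 and 4; returns '' when no ink is found."""
    text = ""
    for char in extract_characters(binarize(gray, threshold)):
        text += max(templates, key=lambda label: match_ratio(char, templates[label]))
    return text

# Hypothetical template set (step 3) and a small test image.
TEMPLATES = {
    "L": [[1, 0], [1, 0], [1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
B, K = 255, 0  # background and ink grey levels
DEMO = [
    [B, B, B, B, B, B, B, B],
    [B, K, B, B, K, K, K, B],
    [B, K, B, B, B, K, B, B],
    [B, K, K, B, B, K, B, B],
    [B, B, B, B, B, B, B, B],
]
```

Here `recognize(DEMO, TEMPLATES)` returns "LT", and an image with no dark pixels gives an empty string. Two caveats with this sketch: characters made of several disconnected parts (such as "i") would be split into separate pieces, and under this ratio a template with more ink than the character can score as high as the correct one, so the score is an obvious candidate for tuning.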
It is great fun to improve this algorithm and tune it to your particular problem, and to see it recognise a higher and higher proportion of images.
In my case I managed to increase the proportion of correctly recognised images from 5% to 95-97% in a couple of weeks.
-
April 13th, 2006, 04:51 AM
#3
Re: Text In Image Recognition?
Really nice explanation DragForce!
I have never worked on OCR algorithms, but I have a note on the final step you described (the 4th), the one that deals with character classification. Another feature, more complicated but maybe better than the ratio of coinciding pixels, would be boundary statistics.
Specifically, one could use a boundary tracking algorithm to estimate the boundary of a character. Afterwards, you could use a chain code to describe this boundary. Finally, a statistic (e.g. the 2nd central moment) could be used to describe that boundary. This statistic should then be compared to the statistics of the training characters to classify the corresponding character.
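As a rough illustration of that idea, here is a small Python sketch, not a tested OCR component. It assumes the character is a binary image (list of rows, 1 = ink), traces the outer boundary with Moore-neighbour tracing, encodes it as an 8-direction Freeman chain code, and takes the 2nd central moment (variance) of the code values. Applying the moment to the raw code values is only one possible reading of "a statistic of the boundary", and all function names are made up for the example.

```python
# The 8 neighbours as (dy, dx), clockwise on screen starting from west.
CLOCKWISE = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
             (0, 1), (1, 1), (1, 0), (1, -1)]

# Freeman codes: 0 = east, then counter-clockwise (y grows downwards).
FREEMAN = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
           (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def trace_boundary(mono):
    """Moore-neighbour tracing of the outer boundary of the first ink
    component met in raster order. Returns a list of (y, x) points."""
    height, width = len(mono), len(mono[0])

    def is_ink(p):
        return 0 <= p[0] < height and 0 <= p[1] < width and mono[p[0]][p[1]] == 1

    start = next(((y, x) for y in range(height) for x in range(width)
                  if mono[y][x] == 1), None)
    if start is None:
        return []
    boundary, current = [start], start
    back = (start[0], start[1] - 1)  # west of start is background by construction
    first_step = None
    while True:
        k = CLOCKWISE.index((back[0] - current[0], back[1] - current[1]))
        nxt = None
        for i in range(1, 8):  # sweep clockwise, starting just after `back`
            dy, dx = CLOCKWISE[(k + i) % 8]
            cand = (current[0] + dy, current[1] + dx)
            if is_ink(cand):
                nxt = cand
                break
            back = cand  # remember the last background pixel examined
        if nxt is None:  # isolated pixel, nothing to walk
            break
        if first_step is None:
            first_step = nxt
        elif current == start and nxt == first_step:
            boundary.pop()  # start was appended again when the loop closed
            break
        boundary.append(nxt)
        current = nxt
    return boundary

def chain_code(boundary):
    """Freeman chain code of a closed boundary (last point links to first)."""
    if len(boundary) < 2:
        return []
    closed = boundary + boundary[:1]
    return [FREEMAN[(y1 - y0, x1 - x0)]
            for (y0, x0), (y1, x1) in zip(closed, closed[1:])]

def second_central_moment(values):
    """Variance of the values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```

For a solid 2x2 square the boundary is its four pixels, the chain code is `[0, 6, 4, 2]` (east, south, west, north) and the 2nd central moment is 5.0. In practice one number like this is unlikely to separate all characters on its own, so it would probably be combined with other features.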
-
April 13th, 2006, 04:56 AM
#4
Re: Text In Image Recognition?
Something else: if you decide to use image (or boundary) statistics, it would be better to use scale-invariant moments or, even better, Hu invariant moments, which are invariant under both rotation and scaling. This would make the system less dependent on the font.
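To make this a bit more concrete, here is a minimal Python sketch of the first two Hu invariants, computed from normalised central moments of a binary character image (list of rows, 1 = ink). The function names are made up for the example, and only two of the seven invariants are shown.

```python
def hu_first_two(mono):
    """Return (phi1, phi2), the first two Hu invariant moments."""
    points = [(x, y) for y, row in enumerate(mono)
              for x, value in enumerate(row) if value]
    m00 = len(points)  # zeroth moment = number of ink pixels
    if m00 == 0:
        raise ValueError("image contains no ink pixels")
    # The centroid makes the moments translation invariant.
    cx = sum(x for x, _ in points) / m00
    cy = sum(y for _, y in points) / m00

    def eta(p, q):
        # Central moment mu_pq, normalised by m00^(1 + (p+q)/2) for scale.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in points)
        return mu / m00 ** (1 + (p + q) / 2)

    eta20, eta02, eta11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return phi1, phi2

def rotate90(mono):
    """Rotate a binary image by 90 degrees clockwise."""
    return [list(row) for row in zip(*mono[::-1])]
```

Calling `hu_first_two` on a shape and on `rotate90` of the same shape gives the same pair of values (up to floating-point rounding), and so does shifting the shape inside a larger image. On a pixel grid the scale invariance is only approximate, because resizing a small character changes its shape slightly.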
-
May 18th, 2009, 08:20 AM
#5
Re: Text In Image Recognition?
Hi,
I am new to this. I want to recognize Telugu text from a scanned image.
I want to develop this in C#.NET. Please guide me on how to accomplish this.
It's very urgent... Please help me.
Thanks in advance.