November 24th, 2009, 04:06 AM
Parsing out geographic names
I need help figuring out the right approach for parsing the names of geographic entities out of a web search query.
I've been reading various papers on the subject, and it seems that what I need is a dictionary/gazetteer approach.
Basically, researchers take the query and look for candidate dictionary entries that match any contiguous run of one or more words in the query.
Then they select the matching dictionary entry that contains the largest number of words.
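To make the question concrete, here is a naive sketch of that longest-match lookup in Python. It is only an illustration under my own assumptions, not the method from any particular paper: the function names `build_gazetteer` and `longest_match` are made up, the place names are sample data, and matching is done on lowercased, whitespace-separated words. It checks every window of words against a set, trying the longest windows first.

```python
def build_gazetteer(names):
    """Turn place names into a set of lowercased word tuples.

    Returns the set and the word count of the longest entry.
    (Hypothetical helper, for illustration only.)
    """
    entries = {tuple(name.lower().split()) for name in names}
    max_len = max((len(entry) for entry in entries), default=0)
    return entries, max_len


def longest_match(query, entries, max_len):
    """Return the entry with the most words that occurs as a contiguous
    run of words in the query, or None if no entry matches."""
    words = query.lower().split()
    # Try the largest window size first, so the first hit is the longest match.
    for size in range(min(max_len, len(words)), 0, -1):
        for start in range(len(words) - size + 1):
            candidate = tuple(words[start:start + size])
            if candidate in entries:
                return " ".join(candidate)
    return None


# Sample data (assumed, not a real gazetteer).
entries, max_len = build_gazetteer(["New York", "York", "New York City", "Paris"])
print(longest_match("cheap hotels in new york city center", entries, max_len))
```

With the sample data above, the query contains "york", "new york" and "new york city", and the lookup picks the three-word entry. Each window costs one set lookup, so the work grows with the query length times the longest entry length.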
So my question is basically: how do I implement the above parsing algorithm efficiently?
How do I find the dictionary entry that best matches any contiguous sequence of words in my original string?
Many thanks in advance for any suggestions or help.