July 16th, 2010, 10:21 AM
#1
Generating terms based on input rules / common misspellings
Hello,
I've been asked to find a way to generate new terms from a list of existing terms, applying input rules and common misspellings to produce the new list.
For example, say we have Procter and Gamble in the list. The new list would contain:
Procter and Gamble
Proctor and Gamble
Proctor and Gambel
Procter & Gamble
....
Can anyone think of a way to do this (something a single developer can do in a reasonable amount of time, given a project with a ~4 month development cycle)?
Also, I've been told that software which does this already exists, so if anyone knows of any examples, that would be great as well.
Thanks in advance for any help.
July 20th, 2010, 03:55 AM
#2
Re: Generating terms based on input rules / common misspellings
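It seems that the most straightforward approach is to keep a dictionary of all known words that maps each word to its acronyms, meaning its alternative forms such as symbols ("&" for "and") and common misspellings ("Proctor" for "Procter"). Each of these forms also has its own entry in the dictionary.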
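As a minimal sketch of that dictionary, assuming Python and a plain in-memory dict (the names `VARIANTS` and `variants_of` are hypothetical, and the entries are only the ones from the example in the question), each word maps to its alternative forms, and the lookup always returns the word itself first so the original spelling is kept:

```python
# Hypothetical variant dictionary: each known word (lowercased key) maps to
# its alternative forms. Entries are illustrative, taken from the question.
VARIANTS: dict[str, list[str]] = {
    "procter": ["Procter", "Proctor"],
    "and": ["and", "&"],
    "gamble": ["Gamble", "Gambel"],
}


def variants_of(word: str) -> list[str]:
    """Return the word followed by its known variants, without duplicates.

    Lookup is case-insensitive; unknown words map only to themselves.
    """
    result = [word]
    for v in VARIANTS.get(word.lower(), []):
        if v not in result:
            result.append(v)
    return result


print(variants_of("Procter"))  # ['Procter', 'Proctor']
print(variants_of("and"))      # ['and', '&']
print(variants_of("Inc."))     # ['Inc.']
```

Falling back to the word itself means terms containing unknown words are still generated, just without variants for those words.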
Generating all possible terms boils down to this relatively simple recursive algorithm:
Code:
Input:
    word_list_node listHead = the head node of a word_list [w0, w1, w2, ...] of words (in the order they appear in the term).
Output:
    a list of word lists (word_list_list), one word list per generated term.
Algorithm:
word_list_list GenerateAllTerms(word_list_node listHead)
    word_list_list LL' <-- empty word list list
    If listHead.next is a valid node (not null), Then:
        // The list has more than one word: generate all terms for the tail,
        // then prepend each acronym of the head word to each of them.
        word_list_list LL <-- GenerateAllTerms(listHead.next)
        w <-- listHead.word
        For each word_list L in LL, Do:
            For each acronym a of w, Do:   // dictionary access happens here
                word_list_node node <-- CreateWordListNode(a)
                node.next <-- L.head       // share L's nodes; L itself is not modified
                Append CreateWordList(node) to LL'
            End For each
        End For each
    Else If listHead is a valid node, Then:
        // Base case: the list has only one word, so generate a one-word list
        // for each of its acronyms.
        w <-- listHead.word
        For each acronym a of w, Do:
            word_list_node head <-- CreateWordListNode(a)
            Append CreateWordList(head) to LL'
        End For each
    End If
    // Note: the acronyms of every word must include the word itself,
    // otherwise the original spelling never appears in the output.
    return LL'
End
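For anyone who wants to try the recursion, here is a rough Python translation, assuming Python lists in place of the linked word lists and a small hypothetical `VARIANTS` table standing in for the dictionary (`variants_of` and `generate_all_terms` are illustrative names, not from any library). The tail's terms are generated first and each variant of the head word is prepended to every one of them:

```python
from itertools import product

# Hypothetical variant table standing in for the dictionary lookup.
VARIANTS: dict[str, list[str]] = {
    "Procter": ["Procter", "Proctor"],
    "and": ["and", "&"],
    "Gamble": ["Gamble", "Gambel"],
}


def variants_of(word: str) -> list[str]:
    # The word itself always comes first, so the original term is kept.
    return [word] + [v for v in VARIANTS.get(word, []) if v != word]


def generate_all_terms(words: list[str]) -> list[list[str]]:
    """Recursively build every combination of word variants."""
    if not words:
        return []
    head_variants = variants_of(words[0])
    if len(words) == 1:
        # Base case: one single-word term per variant.
        return [[a] for a in head_variants]
    tail_terms = generate_all_terms(words[1:])
    # Build a fresh list for each (tail, variant) pair; tails are only read.
    return [[a] + tail for tail in tail_terms for a in head_variants]


terms = [" ".join(t) for t in generate_all_terms("Procter and Gamble".split())]
print(len(terms))  # 8
print(terms[0])    # Procter and Gamble

# itertools.product yields the same set of combinations without recursion,
# though in a different order.
flat = {" ".join(p) for p in product(*(variants_of(w) for w in "Procter and Gamble".split()))}
print(flat == set(terms))  # True
```

The number of terms is the product of the variant counts of each word (2 × 2 × 2 = 8 here), so the output grows quickly for long terms with many variants per word.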
Regards,
Zachm