CodeGuru Forums
  1. #1
    Join Date: Jul 2010
    Posts: 1

    Generating terms based on input rules / common misspellings

    Hello,

    I've been asked to find a method of generating new terms from a list of existing terms, applying input rules and common misspellings to produce a new list.

    For example, say "Procter and Gamble" is in the list. The new list would then contain:

    Procter and Gamble
    Proctor and Gamble
    Proctor and Gambel
    Procter & Gamble
    ....
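    One way to read "input rules" is as small functions that take a single word and yield its variants. The sketch below is only an illustration under that assumption: `swap_adjacent`, `replace_letter` and `apply_rules` are hypothetical names, and the two rules were picked because they happen to reproduce "Gambel" (two adjacent letters swapped) and "Proctor" (an "e" replaced by an "o") from the example above.

```python
# Sketch only: two hypothetical "input rules" for common misspellings.
# A real project would supply its own, curated rule list.


def swap_adjacent(word):
    """Yield each variant made by swapping one pair of adjacent letters."""
    for i in range(len(word) - 1):
        yield word[:i] + word[i + 1] + word[i] + word[i + 2:]


def replace_letter(old, new):
    """Build a rule that replaces one occurrence of `old` with `new` at a time."""
    def rule(word):
        for i, ch in enumerate(word):
            if ch == old:
                yield word[:i] + new + word[i + 1:]
    return rule


def apply_rules(word, rules):
    """Return the word plus every distinct variant the rules produce, in a stable order."""
    seen = [word]
    for rule in rules:
        for variant in rule(word):
            if variant not in seen:
                seen.append(variant)
    return seen


rules = [swap_adjacent, replace_letter("e", "o")]
print("Gambel" in apply_rules("Gamble", rules))    # True
print("Proctor" in apply_rules("Procter", rules))  # True
```

    Blind rules like "swap any two adjacent letters" also produce many useless forms (e.g. "rPocter"), so in practice the rules would probably need to be restricted or their output reviewed. Whole-word substitutions such as "&" for "and" fit a lookup table better than a character rule.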

    Can anyone think of a way to do this (something a single developer can do in a reasonable amount of time, given a project with a ~4 month development cycle)?

    Also, I've been told that software which does this already exists, so if anyone knows of any examples, that would be great as well.

    Thanks in advance for any help.

  2. #2
    Join Date: Oct 2006
    Posts: 616

    Re: Generating terms based on input rules / common misspellings

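    It seems that the most straightforward thing to do is to keep a dictionary of all known words, mapping each word to its acronyms, i.e. its alternative forms such as "&" for "and" or "Gambel" for "Gamble" (each of which also has its own entry in the dictionary). A word's list of acronyms should include the word itself, so that the original term is generated as well.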
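    A minimal sketch of such a dictionary, assuming the variants are maintained as groups of equivalent forms. `GROUPS`, `VARIANTS` and `variants_of` are hypothetical names; every form in a group gets its own entry pointing at the whole group, and unknown words simply map to themselves.

```python
# Hypothetical groups of equivalent forms; each group includes the correct spelling.
GROUPS = [
    ["Procter", "Proctor"],
    ["and", "&"],
    ["Gamble", "Gambel"],
]

# Every form gets its own (lower-cased) entry that points at its whole group.
VARIANTS = {}
for group in GROUPS:
    for form in group:
        VARIANTS[form.lower()] = group


def variants_of(word):
    """Case-insensitive lookup; returns a copy so callers cannot alter the table."""
    return list(VARIANTS.get(word.lower(), [word]))


print(variants_of("Proctor"))  # ['Procter', 'Proctor']
print(variants_of("Inc."))     # ['Inc.']
```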
    Generating all possible terms boils down to this relatively simple recursive algorithm:
    Code:
    Input:
    word_list_node listHead = the head node of a word_list [w0, w1, w2, ...] of words (as they appear in the term).
    
    Output:
    a list of word lists (word_list_list), one word list per generated term
    
    Algorithm:
    word_list_list GenerateAllTerms(word_list_node listHead)
      word_list_list LL' <-- empty word list list.
      If listHead.next is a valid node (not null), Then:
            //list has more than one word - handle the tail, then prepend each of
            //the head word's acronyms to a copy of every generated tail list.
            LL <-- GenerateAllTerms(listHead.next)
            w <-- listHead.word
            For each word_list L in LL, Do:
                 For each acronym a of w, Do: //here you perform the dictionary access
                     word_list L' <-- CopyWordList(L) //copy, so each acronym gets its own list
                     word_list_node node <-- CreateWordListNode(a)
                     node.next <-- L'.head
                     L'.head <-- node
                     Append L' to the end of LL'
                 End For each
            End For each
      Else If listHead is a valid node, Then:
            //base case - list has only one word - generate one list per acronym.
            w <-- listHead.word
            For each acronym a of w, Do:
                     word_list_node head <-- CreateWordListNode(a)
                     word_list L' <-- CreateWordList(head)
                     Append L' to the end of LL'
            End For each
      End If
      return LL'
    End
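    The recursion above can be sketched as runnable Python, assuming plain Python lists in place of the linked `word_list` nodes and a small hypothetical `VARIANTS` table in place of the real dictionary access (`acronyms`, `generate_all_terms` and `generate_all_terms_iterative` are illustrative names). A non-recursive variant built on `itertools.product` is included for comparison; it yields the same set of terms, possibly in a different order.

```python
from itertools import product

# Hypothetical variant table standing in for the dictionary access.
# Each list includes the word itself so the original term is reproduced.
VARIANTS = {
    "Procter": ["Procter", "Proctor"],
    "and": ["and", "&"],
    "Gamble": ["Gamble", "Gambel"],
}


def acronyms(word):
    """Return every known form of `word`, or just the word if it is unknown."""
    return VARIANTS.get(word, [word])


def generate_all_terms(words):
    """Recursive GenerateAllTerms on a Python list of words.

    Returns a list of word lists, one per generated term.
    """
    if not words:
        return []
    head, tail = words[0], words[1:]
    if not tail:
        # Base case: one single-word list per form of the only word.
        return [[a] for a in acronyms(head)]
    result = []
    for tail_list in generate_all_terms(tail):
        for a in acronyms(head):
            # Build a fresh list for each form instead of mutating tail_list.
            result.append([a] + tail_list)
    return result


def generate_all_terms_iterative(words):
    """Same set of terms via a Cartesian product of each word's forms."""
    if not words:
        return []
    return [list(t) for t in product(*(acronyms(w) for w in words))]


terms = [" ".join(t) for t in generate_all_terms(["Procter", "and", "Gamble"])]
print(len(terms))  # 8
print(terms[0])    # Procter and Gamble
```

    Either way, the number of generated terms is the product of the number of forms of each word (2 x 2 x 2 = 8 here), so long terms whose words have many variants can produce very large lists.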
    Regards,
    Zachm
