CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 10 of 10
  1. #1
    Join Date
    Jun 2009
    Posts
    89

    find the occurrence of each small string in the larger one

    Given that you have one string of length N and M small strings of length L . How do you efficiently find the occurrence of each small string in the larger one?

    Solution suggested:

    You can just sort the M strings in O(l*Mlog(M)).An additional l figures because comparison function of strings of length l is of complexity O(l).
    Once these M strings are sorted, we can simply do a binary search on them for each of the N-l+1 continuous substrings of big string. The complexity of this search for each such substring is O(l*logM).
    So the complexity of this procedure is O(l*MlogM)+O((N-l+1)*(l*logM)).
    For N>>l this reduces to O(l*MlogM)+O(N*l*log(M).
    This can be reduced to O((M+N)*l*log(M)).


    I believe there is always better solutions... if you have one - just drop and explain it please.


    Thank you

  2. #2
    Join Date
    May 2009
    Posts
    23

    Re: find the occurrence of each small string in the larger one

    Check out Knuth-morris-Pratt scan algorithm. I think it does O(n+L) in worst case scenario.

  3. #3
    Join Date
    Jun 2009
    Posts
    89

    Re: find the occurrence of each small string in the larger one

    I will check that.

    Thank you for the hint :-)

  4. #4
    Join Date
    Jun 2009
    Posts
    19

    Re: find the occurrence of each small string in the larger one

    KMP by itself would not work on this... would be O((N+L)M)...
    A trie (prefix tree) would do for this (O(NL)).

  5. #5
    Join Date
    May 2009
    Posts
    23

    Re: find the occurrence of each small string in the larger one

    Why KMP wont work?
    BTW, Considering the best case in KNP and worst case in trie. Assume M=ABCDEFGH, and we want to search ABCDEFGH, KNP is faster the trie?

  6. #6
    Join Date
    Jun 2009
    Posts
    89

    Re: find the occurrence of each small string in the larger one

    How will you use Trie here??

  7. #7
    Join Date
    May 2009
    Posts
    23

    Re: find the occurrence of each small string in the larger one

    It is based on binary tree as well, u can find the information and algo in wikipidia (seach prefix tree).

  8. #8
    Join Date
    Jun 2009
    Posts
    19

    Re: find the occurrence of each small string in the larger one

    The point is... there are M given short strings!!

  9. #9
    Join Date
    May 2009
    Posts
    23

    Re: find the occurrence of each small string in the larger one

    R u saying m cant be 1 ? How do you calculate O(n+L)m ? I think it depends on how far the distance between the 1st and 2nd occurance in KNP. We dont search from beginning again after find the 1st occurance, but we continue till the end. If the 2nd is next to 1st it still beat trie. If however the next occurance is far, trie might beat KNP.
    Last edited by binyo66; June 30th, 2009 at 08:20 AM.

  10. #10
    Join Date
    Jun 2009
    Posts
    19

    Re: find the occurrence of each small string in the larger one

    There are M small strings. KMP is to find ONE string within another! "M small strings" and NOT "M, a small string".

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured