|
-
June 17th, 2009, 09:20 AM
#1
find the occurrence of each small string in the larger one
Given that you have one string of length N and M small strings of length L . How do you efficiently find the occurrence of each small string in the larger one?
Solution suggested:
You can just sort the M strings in O(l*Mlog(M)).An additional l figures because comparison function of strings of length l is of complexity O(l).
Once these M strings are sorted, we can simply do a binary search on them for each of the N-l+1 continuous substrings of big string. The complexity of this search for each such substring is O(l*logM).
So the complexity of this procedure is O(l*MlogM)+O((N-l+1)*(l*logM)).
For N>>l this reduces to O(l*MlogM)+O(N*l*log(M).
This can be reduced to O((M+N)*l*log(M)).
I believe there is always better solutions... if you have one - just drop and explain it please.
Thank you
-
June 29th, 2009, 04:03 PM
#2
Re: find the occurrence of each small string in the larger one
Check out Knuth-morris-Pratt scan algorithm. I think it does O(n+L) in worst case scenario.
-
June 29th, 2009, 04:16 PM
#3
Re: find the occurrence of each small string in the larger one
I will check that.
Thank you for the hint :-)
-
June 30th, 2009, 12:57 AM
#4
Re: find the occurrence of each small string in the larger one
KMP by itself would not work on this... would be O((N+L)M)...
A trie (prefix tree) would do for this (O(NL)).
-
June 30th, 2009, 03:47 AM
#5
Re: find the occurrence of each small string in the larger one
Why KMP wont work?
BTW, Considering the best case in KNP and worst case in trie. Assume M=ABCDEFGH, and we want to search ABCDEFGH, KNP is faster the trie?
-
June 30th, 2009, 04:17 AM
#6
Re: find the occurrence of each small string in the larger one
How will you use Trie here??
-
June 30th, 2009, 05:06 AM
#7
Re: find the occurrence of each small string in the larger one
It is based on binary tree as well, u can find the information and algo in wikipidia (seach prefix tree).
-
June 30th, 2009, 05:14 AM
#8
Re: find the occurrence of each small string in the larger one
The point is... there are M given short strings!!
-
June 30th, 2009, 08:07 AM
#9
Re: find the occurrence of each small string in the larger one
R u saying m cant be 1 ? How do you calculate O(n+L)m ? I think it depends on how far the distance between the 1st and 2nd occurance in KNP. We dont search from beginning again after find the 1st occurance, but we continue till the end. If the 2nd is next to 1st it still beat trie. If however the next occurance is far, trie might beat KNP.
Last edited by binyo66; June 30th, 2009 at 08:20 AM.
-
July 1st, 2009, 06:13 AM
#10
Re: find the occurrence of each small string in the larger one
There are M small strings. KMP is to find ONE string within another! "M small strings" and NOT "M, a small string".
Tags for this Thread
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|