find the occurrence of each small string in the larger one
Given that you have one string of length N and M small strings of length L . How do you efficiently find the occurrence of each small string in the larger one?
Solution suggested:
You can just sort the M strings in O(l*Mlog(M)).An additional l figures because comparison function of strings of length l is of complexity O(l).
Once these M strings are sorted, we can simply do a binary search on them for each of the N-l+1 continuous substrings of big string. The complexity of this search for each such substring is O(l*logM).
So the complexity of this procedure is O(l*MlogM)+O((N-l+1)*(l*logM)).
For N>>l this reduces to O(l*MlogM)+O(N*l*log(M).
This can be reduced to O((M+N)*l*log(M)).
I believe there is always better solutions... if you have one - just drop and explain it please.
Thank you
Re: find the occurrence of each small string in the larger one
Check out Knuth-morris-Pratt scan algorithm. I think it does O(n+L) in worst case scenario.
Re: find the occurrence of each small string in the larger one
I will check that.
Thank you for the hint :-)
Re: find the occurrence of each small string in the larger one
KMP by itself would not work on this... would be O((N+L)M)...
A trie (prefix tree) would do for this (O(NL)).
Re: find the occurrence of each small string in the larger one
Why KMP wont work?
BTW, Considering the best case in KNP and worst case in trie. Assume M=ABCDEFGH, and we want to search ABCDEFGH, KNP is faster the trie?
Re: find the occurrence of each small string in the larger one
How will you use Trie here??
Re: find the occurrence of each small string in the larger one
It is based on binary tree as well, u can find the information and algo in wikipidia (seach prefix tree).
Re: find the occurrence of each small string in the larger one
The point is... there are M given short strings!!
Re: find the occurrence of each small string in the larger one
R u saying m cant be 1 ? How do you calculate O(n+L)m ? I think it depends on how far the distance between the 1st and 2nd occurance in KNP. We dont search from beginning again after find the 1st occurance, but we continue till the end. If the 2nd is next to 1st it still beat trie. If however the next occurance is far, trie might beat KNP.
Re: find the occurrence of each small string in the larger one
There are M small strings. KMP is to find ONE string within another! "M small strings" and NOT "M, a small string".