|
-
February 1st, 2009, 12:58 PM
#1
String matching algorithm
Hi,
The problem I am trying to solve is to find the common sequences in a string. Say I have two strings
aaabbbcccdddeee
aaabbcccdddeeff
I would like to devise a measure for how similar these two strings are. So the strings aaa, ccc, ddd would match and might get scores of 1 each. bb and ee match slightly with bbb and eee so these might get scores of .5 each. ff doesn't match anything. Then we get a total score of 4 for how similar the two strings are. The thing to note is that we're allowed to offset the values so you can see that the ccc and ddd values match even though they are offset by a bit.
The overall context of the problem is for analyzing car drive data. If I have two car drives along streets [A, A, A, B, B, C] (values are repeated because for each lat/lon coordinate in the drive I have a street segment that this point corresponds to) and another one along [A, A, A, D, D, B, B, C], I'd like to say whether these two drives are the same. In this case they're about the same, except we deviate off to drive [D, D] for a bit. I'd like to be able to measure this.
Thanks in advance.
-
February 1st, 2009, 05:25 PM
#2
Re: String matching algorithm
Try searching on fuzzy string matching... Wikipedia gives you this:
http://en.wikipedia.org/wiki/Bitap_algorithm
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|