String matching algorithm

**calmze** · February 1st, 2009, 12:58 PM

Hi,

The problem I am trying to solve is to find the common sequences in a string. Say I have two strings

aaabbbcccdddeee
aaabbcccdddeeff

I would like to devise a measure for how similar these two strings are. So the strings aaa, ccc, ddd would match and might get scores of 1 each. bb and ee match slightly with bbb and eee so these might get scores of .5 each. ff doesn't match anything. Then we get a total score of 4 for how similar the two strings are. The thing to note is that we're allowed to offset the values so you can see that the ccc and ddd values match even though they are offset by a bit.

The overall context of the problem is for analyzing car drive data. If I have two car drives along streets [A, A, A, B, B, C] (values are repeated because for each lat/lon coordinate in the drive I have a street segment that this point corresponds to) and another one along [A, A, A, D, D, B, B, C], I'd like to say whether these two drives are the same. In this case they're about the same, except we deviate off to drive [D, D] for a bit. I'd like to be able to measure this.

Thanks in advance.

**j0nas** · February 1st, 2009, 05:25 PM

Try searching on fuzzy string matching... Wikipedia gives you this:
http://en.wikipedia.org/wiki/Bitap_algorithm

Thread: String matching algorithm

Thread Tools

Display

String matching algorithm

Re: String matching algorithm

Posting Permissions