We have a circular sequance, where the letters 1,...n repeat twice (and letters can be permuted). I am trying to find a efficient algorithm to return the minimal representation of a sequence.
Example:
Say we have a sequence 1 1 2 4 4 3 3 2
I dont distinguish this sequence with the sequence 4 4 3 3 2 1 1 2 (circularity, i cut the first one before 4)
I also don't distinguish the first sequence with 4 4 3 1 1 2 2 3 (permutation 1 -> 4, 2 -> 3, 4 -> 1, 3 -> 2)
I would like an algorithm that converts a sequence to the minimal possible sequence of its kind (minimal when reading left to right,
for example 1 1 2 3 2 3 < 1 2 3 1 2 3, since the 2nd letter is 1 in the first seq. but 2 in the 2nd).
Of course it is possible to do it in quadratic time, since we can just loop through all starting positions and renumerate so the first occurence of a letter is 1, the second is 2, ...
Since I have millions of such sequences (all about 20 - 30 letters long) I would like a more efficient one (say running in linear or n ln(n) time).
Don't need the code, just the idea.
I may be missing something but finding the "minimal" sequence seems trivial. It's just to order the items. With your example it would be,
1 1 2 2 3 3 4 4
What cuber64 actually means is that he/she wants an algorithm to find the canonical form for a given sequence, under the equivalence relation whereby permutations and circular re-orderings are equivalent.
@cuber64: I think the best you could get for this is O(n), as any algorithm will likely have to look at all the letters of the sequence to determine the canonical form. Or at least all except the last one or two letters, as these can be determined by what is left over after looking at the rest.
Not much help, I know, but sometimes having a bound on what you can likely achieve will help you know when to stop looking for a better approach
What cuber64 actually means is that he/she wants an algorithm to find the canonical form for a given sequence, under the equivalence relation whereby permutations and circular re-orderings are equivalent.
To me it looks like he's defined the "canonical form" to be the ordered sequence. It can be reached for any sequence by sorting it. It involves a series of permutations and can be achieved in O(N*lnN) time.
No, it's what Peter_B said. I want to find a cannonical form of a given sequence, which is not just sorting it.
e.g. 1212 != 1122
Anyways, I would be satisfied in any canonical form that would be quick to find, not necessarily the minimal in the sense that it is bigger in the sence of integers, 1212 > 1122.
For the meanwhile, I'll just use the O(n^2) approach, since I can't find a better approach.
Bookmarks