Search technique on DNA sequence
Given a set of DNA string. Find minimum length string S so that, given DNA strings are subsequence of S.
For example, if given set of string is: {CAGT, ATGC, CGTT, ACGT, AATT} then answer is: 8. Because, ACAGTGCT contains all given DNA as subsequence.
Given n such DNA string (n <= 8), each of length atmost 5. Find out the least length.
Sample:
5
AATT
CGTT
CAGT
ACGT
ATGC
Please solve this
regards..........Amran
Output:
8
Re: Search technique on DNA sequence
So what progress have you made so far? As this looks like a homework assignment, we can't give you the solution
(see http://forums.codeguru.com/showthrea...ork-assignment)
Without coding, how would you solve this problem? How would you do it using pen/paper? First you need to design the program and only when you are happy with that then code the design.
If you post code you have so far, we'll provide guidance.
Re: Search technique on DNA sequence
Quote:
Originally Posted by
amran08
Find out the least length.
There's another valid sequence that also has 8 letters: ACAGTTGC.
I found it using a simple heuristic strategy. I cannot guarantee it's optimal thought, but maybe it doesn't matter because it looks more like you have a programming task rather than an optimization task.
My strategy is to simply select the most frequent first letter of all input sequences and add it to the end of the output sequence. Then the selected letter is removed from all input sequences that starts with it. Everything is repeated until all input sequences have been used up, like,
AATT
CGTT
CAGT
ACGT
ATGC
A is selected because it's the most frequent of all first letters. Add A to output sequence: A. Removal of all first A:s gives,
ATT
CGTT
CAGT
CGT
TGC
C is selected. Output sequence: AC. Removal of C gives,
ATT
GTT
AGT
GT
TGC
A is selected (but G could equally well have been selected because it's as frequent). Output sequence: ACA. Removal of A gives,
TT
GTT
GT
GT
TGC
G is selected. Output sequence: ACAG. Removal of G gives,
TT
TT
T
T
TGC
T is selected. Output sequence: ACAGT. Removal of T gives,
T
T
GC
T is selected. Output sequence: ACAGTT. Removal of T gives,
GC
Final case. Output sequence: ACAGTTGC. Number of letters: 8.