CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com

# Thread: Search technique on DNA sequence

1. Junior Member
Join Date
May 2013
Posts
3

## Search technique on DNA sequence

Given a set of DNA string. Find minimum length string S so that, given DNA strings are subsequence of S.

For example, if given set of string is: {CAGT, ATGC, CGTT, ACGT, AATT} then answer is: 8. Because, ACAGTGCT contains all given DNA as subsequence.

Given n such DNA string (n <= 8), each of length atmost 5. Find out the least length.

Sample:
5
AATT
CGTT
CAGT
ACGT
ATGC

regards..........Amran

Output:
8

2. ## Re: Search technique on DNA sequence

So what progress have you made so far? As this looks like a homework assignment, we can't give you the solution
(see http://forums.codeguru.com/showthrea...ork-assignment)

Without coding, how would you solve this problem? How would you do it using pen/paper? First you need to design the program and only when you are happy with that then code the design.

If you post code you have so far, we'll provide guidance.

3. Elite Member
Join Date
May 2009
Posts
2,413

## Re: Search technique on DNA sequence

Originally Posted by amran08
Find out the least length.
There's another valid sequence that also has 8 letters: ACAGTTGC.

I found it using a simple heuristic strategy. I cannot guarantee it's optimal thought, but maybe it doesn't matter because it looks more like you have a programming task rather than an optimization task.

My strategy is to simply select the most frequent first letter of all input sequences and add it to the end of the output sequence. Then the selected letter is removed from all input sequences that starts with it. Everything is repeated until all input sequences have been used up, like,

AATT
CGTT
CAGT
ACGT
ATGC

A is selected because it's the most frequent of all first letters. Add A to output sequence: A. Removal of all first A:s gives,

ATT
CGTT
CAGT
CGT
TGC

C is selected. Output sequence: AC. Removal of C gives,

ATT
GTT
AGT
GT
TGC

A is selected (but G could equally well have been selected because it's as frequent). Output sequence: ACA. Removal of A gives,

TT
GTT
GT
GT
TGC

G is selected. Output sequence: ACAG. Removal of G gives,

TT
TT
T
T
TGC

T is selected. Output sequence: ACAGT. Removal of T gives,

T
T
GC

T is selected. Output sequence: ACAGTT. Removal of T gives,

GC

Final case. Output sequence: ACAGTTGC. Number of letters: 8.
Last edited by nuzzle; May 10th, 2013 at 06:45 AM.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•

×