-
May 10th, 2013, 03:27 AM
#1
Search technique on DNA sequence
Given a set of DNA strings, find the minimum-length string S such that every given DNA string is a subsequence of S.
For example, if the given set of strings is {CAGT, ATGC, CGTT, ACGT, AATT}, then the answer is 8, because ACAGTGCT contains all the given strings as subsequences.
Given n such DNA strings (n <= 8), each of length at most 5, find the least length.
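The subsequence condition in the example can be checked mechanically. The following is only a minimal sketch in Python; the function names `is_subsequence` and `is_common_supersequence` are made up for illustration and are not part of the task.

```python
def is_subsequence(short, long):
    """Return True if `short` can be obtained from `long` by deleting letters."""
    remaining = iter(long)
    # `letter in remaining` advances the iterator until it finds a match,
    # so the letters of `short` must appear in `long` in the same order.
    return all(letter in remaining for letter in short)


def is_common_supersequence(candidate, sequences):
    """Return True if every string in `sequences` is a subsequence of `candidate`."""
    return all(is_subsequence(s, candidate) for s in sequences)


dna = ["CAGT", "ATGC", "CGTT", "ACGT", "AATT"]
print(is_common_supersequence("ACAGTGCT", dna))  # True
```

This only confirms that a candidate is valid. It says nothing about whether a shorter candidate exists.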
Sample:
5
AATT
CGTT
CAGT
ACGT
ATGC
Output:
8
Please solve this.
Regards, Amran
-
May 10th, 2013, 05:40 AM
#2
Re: Search technique on DNA sequence
So what progress have you made so far? As this looks like a homework assignment, we can't give you the solution
(see http://forums.codeguru.com/showthrea...ork-assignment)
Without coding, how would you solve this problem? How would you do it using pen and paper? First design the program, and only when you are happy with the design should you code it.
If you post code you have so far, we'll provide guidance.
-
May 10th, 2013, 06:20 AM
#3
Re: Search technique on DNA sequence
Originally Posted by amran08
Find out the least length.
There's another valid sequence that also has 8 letters: ACAGTTGC.
I found it using a simple heuristic strategy. I cannot guarantee it's optimal, but that may not matter because this looks more like a programming task than an optimization task.
My strategy is to select the most frequent first letter among all input sequences and append it to the output sequence. The selected letter is then removed from the front of every input sequence that starts with it. This is repeated until all input sequences have been used up. Starting from,
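Since a heuristic comes with no optimality guarantee, it may help to know that the stated limits (n <= 8, each string at most 5 letters) are small enough for an exhaustive check. What follows is a rough sketch of one possible approach, a breadth-first search over how far each input string has been matched. It is not the method used in this post. The function name and the `alphabet` parameter are assumptions made for illustration, and the input is assumed to contain only the letters A, C, G and T.

```python
from collections import deque


def shortest_supersequence_length(sequences, alphabet="ACGT"):
    """Length of a shortest string containing every input as a subsequence.

    A state is a tuple of positions, one per input string, recording how many
    of its letters are already matched. Appending a letter advances every
    string whose next unmatched letter equals it.
    Assumes all inputs use only letters from `alphabet`.
    """
    start = tuple(0 for _ in sequences)
    goal = tuple(len(s) for s in sequences)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, length = queue.popleft()
        for letter in alphabet:
            nxt = tuple(
                p + 1 if p < len(s) and s[p] == letter else p
                for p, s in zip(state, sequences)
            )
            if nxt == state or nxt in seen:
                continue  # letter matched nothing, or state already reached
            if nxt == goal:
                return length + 1
            seen.add(nxt)
            queue.append((nxt, length + 1))
    return None  # only reachable if an input has a letter outside `alphabet`


print(shortest_supersequence_length(["AATT", "CGTT", "CAGT", "ACGT", "ATGC"]))
```

With at most 8 strings of at most 5 letters there are no more than 6^8 (about 1.7 million) states, so the search stays manageable. Comparing its result with the length produced by a heuristic shows whether the heuristic happened to be optimal for a given input.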
AATT
CGTT
CAGT
ACGT
ATGC
A is selected because it's the most frequent of all first letters. Add A to output sequence: A. Removal of every leading A gives,
ATT
CGTT
CAGT
CGT
TGC
C is selected. Output sequence: AC. Removal of C gives,
ATT
GTT
AGT
GT
TGC
A is selected (but G could equally well have been selected because it's as frequent). Output sequence: ACA. Removal of A gives,
TT
GTT
GT
GT
TGC
G is selected. Output sequence: ACAG. Removal of G gives,
TT
TT
T
T
TGC
T is selected. Output sequence: ACAGT. Removal of T gives (the two sequences that are now empty are dropped),
T
T
GC
T is selected. Output sequence: ACAGTT. Removal of T gives,
GC
Final case: only GC remains, so G and then C are selected. Output sequence: ACAGTTGC. Number of letters: 8.
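The walkthrough above could be coded roughly as follows. This is a minimal Python sketch, not a definitive implementation. The function name `greedy_supersequence` is made up for illustration, and the alphabetical tie-break is an assumption: the strategy as described does not say which letter to take when two are equally frequent, and alphabetical order happens to pick A over G in the third step, as in the walkthrough.

```python
from collections import Counter


def greedy_supersequence(sequences):
    """Build a common supersequence with the most-frequent-first-letter heuristic.

    The result is valid but not guaranteed to be the shortest possible.
    """
    remaining = [s for s in sequences if s]
    output = []
    while remaining:
        # Count how often each letter appears as a first letter.
        counts = Counter(s[0] for s in remaining)
        # Pick the most frequent; ties are broken alphabetically (an assumption).
        letter = min(counts, key=lambda c: (-counts[c], c))
        output.append(letter)
        # Strip the chosen letter from every sequence that starts with it,
        # then drop the sequences that have been used up.
        remaining = [s[1:] if s[0] == letter else s for s in remaining]
        remaining = [s for s in remaining if s]
    return "".join(output)


result = greedy_supersequence(["AATT", "CGTT", "CAGT", "ACGT", "ATGC"])
print(result, len(result))  # ACAGTTGC 8
```

A different tie-break rule may produce a different output, possibly of a different length.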
Last edited by nuzzle; May 10th, 2013 at 06:45 AM.