Search technique on DNA sequence
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 3 of 3

Thread: Search technique on DNA sequence

  1. #1
    Join Date
    May 2013
    Posts
    3

    Search technique on DNA sequence

    Given a set of DNA string. Find minimum length string S so that, given DNA strings are subsequence of S.

    For example, if given set of string is: {CAGT, ATGC, CGTT, ACGT, AATT} then answer is: 8. Because, ACAGTGCT contains all given DNA as subsequence.

    Given n such DNA string (n <= 8), each of length atmost 5. Find out the least length.

    Sample:
    5
    AATT
    CGTT
    CAGT
    ACGT
    ATGC

    Please solve this
    regards..........Amran

    Output:
    8

  2. #2
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,265

    Re: Search technique on DNA sequence

    So what progress have you made so far? As this looks like a homework assignment, we can't give you the solution
    (see http://forums.codeguru.com/showthrea...ork-assignment)

    Without coding, how would you solve this problem? How would you do it using pen/paper? First you need to design the program and only when you are happy with that then code the design.

    If you post code you have so far, we'll provide guidance.
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  3. #3
    Join Date
    May 2009
    Posts
    2,413

    Re: Search technique on DNA sequence

    Quote Originally Posted by amran08 View Post
    Find out the least length.
    There's another valid sequence that also has 8 letters: ACAGTTGC.

    I found it using a simple heuristic strategy. I cannot guarantee it's optimal thought, but maybe it doesn't matter because it looks more like you have a programming task rather than an optimization task.

    My strategy is to simply select the most frequent first letter of all input sequences and add it to the end of the output sequence. Then the selected letter is removed from all input sequences that starts with it. Everything is repeated until all input sequences have been used up, like,

    AATT
    CGTT
    CAGT
    ACGT
    ATGC

    A is selected because it's the most frequent of all first letters. Add A to output sequence: A. Removal of all first A:s gives,

    ATT
    CGTT
    CAGT
    CGT
    TGC

    C is selected. Output sequence: AC. Removal of C gives,

    ATT
    GTT
    AGT
    GT
    TGC

    A is selected (but G could equally well have been selected because it's as frequent). Output sequence: ACA. Removal of A gives,

    TT
    GTT
    GT
    GT
    TGC

    G is selected. Output sequence: ACAG. Removal of G gives,

    TT
    TT
    T
    T
    TGC

    T is selected. Output sequence: ACAGT. Removal of T gives,

    T
    T
    GC

    T is selected. Output sequence: ACAGTT. Removal of T gives,

    GC

    Final case. Output sequence: ACAGTTGC. Number of letters: 8.
    Last edited by nuzzle; May 10th, 2013 at 06:45 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Azure Activities Information Page

Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This is a CodeGuru survey question.


Featured


HTML5 Development Center