CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 3 of 3
  1. #1
    Join Date
    May 2013
    Posts
    3

    Search technique on DNA sequence

    Given a set of DNA string. Find minimum length string S so that, given DNA strings are subsequence of S.

    For example, if given set of string is: {CAGT, ATGC, CGTT, ACGT, AATT} then answer is: 8. Because, ACAGTGCT contains all given DNA as subsequence.

    Given n such DNA string (n <= 8), each of length atmost 5. Find out the least length.

    Sample:
    5
    AATT
    CGTT
    CAGT
    ACGT
    ATGC

    Please solve this
    regards..........Amran

    Output:
    8

  2. #2
    2kaud's Avatar
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,824

    Re: Search technique on DNA sequence

    So what progress have you made so far? As this looks like a homework assignment, we can't give you the solution
    (see http://forums.codeguru.com/showthrea...ork-assignment)

    Without coding, how would you solve this problem? How would you do it using pen/paper? First you need to design the program and only when you are happy with that then code the design.

    If you post code you have so far, we'll provide guidance.
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  3. #3
    Join Date
    May 2009
    Posts
    2,413

    Re: Search technique on DNA sequence

    Quote Originally Posted by amran08 View Post
    Find out the least length.
    There's another valid sequence that also has 8 letters: ACAGTTGC.

    I found it using a simple heuristic strategy. I cannot guarantee it's optimal thought, but maybe it doesn't matter because it looks more like you have a programming task rather than an optimization task.

    My strategy is to simply select the most frequent first letter of all input sequences and add it to the end of the output sequence. Then the selected letter is removed from all input sequences that starts with it. Everything is repeated until all input sequences have been used up, like,

    AATT
    CGTT
    CAGT
    ACGT
    ATGC

    A is selected because it's the most frequent of all first letters. Add A to output sequence: A. Removal of all first A:s gives,

    ATT
    CGTT
    CAGT
    CGT
    TGC

    C is selected. Output sequence: AC. Removal of C gives,

    ATT
    GTT
    AGT
    GT
    TGC

    A is selected (but G could equally well have been selected because it's as frequent). Output sequence: ACA. Removal of A gives,

    TT
    GTT
    GT
    GT
    TGC

    G is selected. Output sequence: ACAG. Removal of G gives,

    TT
    TT
    T
    T
    TGC

    T is selected. Output sequence: ACAGT. Removal of T gives,

    T
    T
    GC

    T is selected. Output sequence: ACAGTT. Removal of T gives,

    GC

    Final case. Output sequence: ACAGTTGC. Number of letters: 8.
    Last edited by nuzzle; May 10th, 2013 at 06:45 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured