Remove Duplicate Characters from a String without creating a new one ?
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 1 of 2 12 LastLast
Results 1 to 15 of 22

Thread: Remove Duplicate Characters from a String without creating a new one ?

  1. #1
    Join Date
    Nov 2002
    Posts
    9

    Remove Duplicate Characters from a String without creating a new one ?

    Hello,

    I have an interview with microsoft and they asked this question to like 3 other people so far.

    How do you remove similar characters from a string without creating a new string and also making sure that it runs fast ? (not using a loop inside a loop and checking a million times)

    What I mean is that how can you just remove duplicate characters and some how rearrange the string ?

    "HELLO" should like like "HELO"
    but you are allowed to create only one string or not declare another


    Thanks,
    JIgar

    if someone can think of something then please reply or email me asap.

    The problem is the same if u were to think of removing duplicates from an array without really creating another and without checking it with the unique array elements everytime.

  2. #2
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Is it only for duplicates that are consecutive or can they appear anywhere in the string ?
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  3. #3
    Join Date
    Nov 2002
    Posts
    9
    They repeated characters can appear anywhere in the string




    Also, what I am looking for is more for an algorithm for removing duplicates from an array which contains(anything...may be characters, integers).


    Thanks for Quick Reply....

    Ya so the duplicate int, char can appear again anywhere in the array


    Only condition is that no declaration of another array and that it has to run fast...I guess O(n) cuz another friend was rejected cuz his algo wasn't optimized

    Jigar

  4. #4
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    If you are allowed to change the order of the elements then it's easy. You do a quicksort so that all duplicates will appear one after the other. Quicksort doesn't create a temporary array so you are not violing any condition. Its execution time is actually O(n log n), so you might run into trouble for that. And removing duplicates from a sorted container is pretty easy.
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  5. #5
    Join Date
    Sep 2002
    Posts
    1,747

    ????

    Can the string you are working with ever be lengthened? I am not exactly sure about your question... Are you saying this:
    • You are given a string (mystring).
    • You must process that string to leave only the first instance of a character in.
    • You cannot create any temporary duplicates, not even of characters, in any of your routines.
    • You can not make any other structures in memory (like a list of delete points or even one int to work with as a comparison counter).
    • And it must be optimized fully (with proof?).

    This seems impossible if you are not allowed to lengthen the string, but I will have to attempt a proof...

    Or am I assuming too much restriction? The line
    but you are allowed to create only one string or not declare another
    has me a little confused...
    */*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/

    "It's hard to believe in something you don't understand." -- the sidhi X-files episode

    galathaea: prankster, fablist, magician, liar

  6. #6
    Join Date
    Nov 2002
    Posts
    9
    Well I am looking for an algorithm where you dont create another temporary String for storing the unique characters so far.

    Like this is not acceptable


    string remove(string mystring)
    {
    string temp;


    //insert first element from mystring to temp

    // compare second element of mystring with every element of temp and if not in temp add it to temp
    // repeat for next element of mystring

    // return temp

    }


    Using pointers is there a way so that you can get a linked list or something of the valid unique characters and then in the end assign that linked list to our original string ?

    Or even if you can give me an optimized algorithm to remove duplicates from an array without declaring another array then thats great too...


    Thanks,
    JIgar


    I am sorry I didn't really explain the question well...I just got this question from 2 students who had their interview and so I personally was never asked the question, but from my earlier posts it appears that I need a memory efficient and optimized algo for removing duplicates from a data structure...

  7. #7
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    What is wrong with my idea then galathea ? It does not create any temporary characters nor temporary strings neither lengthens the original string.

    I guess that that is actually a decent solution to the problem, since it can be argued that it will be useless to have the result string ordered in the same way as the first unique characters in the original string.
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  8. #8
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    I guess this is one of these interview questions to weed out the
    1. really unsuitable applicants -> can't get an algorithm for this problem
    2. unsuitable -> get an algorithm but it's slow and inefficient
    3. suitable -> get a "standard good" algorithm. This shows they know good basics
    4. highly desireable -> get a "non-standard good" or "optimized standard" algorithm.

    According to my own list I would fall into category 3 right now, but there is surely an algorithm for category 4. I'm thinking along the lines of a in-place sorting algorithm that eliminates duplicates as a side-effect.

    But by the way, you'll never be able to achieve O(n) this much is clear. It seems to me the problem is another of these O(n log n) ones. Too bad I don't have my algorithms books here, I'd have checked
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  9. #9
    Join Date
    Sep 2002
    Posts
    1,747

    re: Yves

    Nothing wrong with your analysis on my end. I'd choose it. But the recursion stores temporary values...
    */*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/

    "It's hard to believe in something you don't understand." -- the sidhi X-files episode

    galathaea: prankster, fablist, magician, liar

  10. #10
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Yes, well, but the point would be that you don't store temporary entries of the string. Here it's a string, but in real life it could be big objects that are slow to copy.
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  11. #11
    Join Date
    Sep 2002
    Posts
    1,747

    Absolutely

    I took this as a challenge question, but I think your analysis is right. It looks like they may have asked an easier question than I made it out. I don't mind a temporary pivot index here or there for the speed. I actually started thinking back to some of the parallel sorting algorithms using many threads... but now I think I understand. Maybe they are just seeing how well one can answer a particular problem effectively without getting sidetracked in the minutiae of interpretations. I'd fail that one...
    */*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/

    "It's hard to believe in something you don't understand." -- the sidhi X-files episode

    galathaea: prankster, fablist, magician, liar

  12. #12
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Actually, I just checked and there is an implementation without recursion. It uses heapsort as a basis so it is also O(n log n). Unfortunately in practice heapsort is slower than quicksort so the quick and dirty might actually be better for most applications.

    [edit : arg !! I didn't check whether it could be done in-place :/]

    I took this as a challenge question
    Yes well, but they wouldn't make it an impossible now would they ?
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  13. #13
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Yep, I got my heapsort without duplicates working. Its run-time is actually O(n log n), but since it's a heapsort it's a bit slow. It uses neither recursion nor a temporary array. Here is the code:
    Code:
    class Heap
    {
    public:
      Heap(int n = 15)
      {
        m_Array = new int[n];
        m_max = n;
        m_nSwaps = 0;
        m_nCompares = 0;
        m_nUnique = 0;
      }
    
      ~Heap()
      {
        delete [] m_Array;
      }
    
      void Swap(int i, int k)
      {
        int tmp;
    
        m_nSwaps++;
        tmp = m_Array[i];
        m_Array[i] = m_Array[k];
        m_Array[k] = tmp;
      }
    
      int Compare(int i, int k)
      {
        m_nCompares++;
        return (m_Array[i] - m_Array[k]);
      }
    
      void EnforceHeapProperty(int curr, int max)
      {
        int next, i;
    
        i = curr;
        while (i < (max / 2)) {
          next = i * 2 + 1;
          if (next < (max - 1)) { // two children
            if (Compare(next, next + 1) < 0) { // right is bigger
              if (Compare(next + 1, i) <= 0) { // current is bigger
                return;
              } else { // right is bigger than current
                Swap(next + 1, i);
                i = next + 1;
              }
            } else { // left is bigger
              if (Compare(next, i) <= 0) { // current is bigger
                return;
              } else { // left is bigger than current
                Swap(next, i);
                i = next;
              }
            }
          } else { // one child
            if (Compare(next, i) > 0) { // left is bigger than current
              Swap(next, i);
            }
            return;
          }
        }
      }
    
      void ConstructHeap()
      {
        int i;
    
        for (i = (m_max / 2) - 1; i >= 0; i--) {
          EnforceHeapProperty(i, m_max);
        }
      }
    
      void HeapSortUnique()
      {
        int i, last;
    
        ConstructHeap();
        // Put the first element in the Unique range
        Swap(0, m_max - 1);
        EnforceHeapProperty(0, m_max - 1);
        last = m_max - 1;
        m_nUnique = 1;
        for (i = m_max - 2; i > 0; i--) {
          // Check whether the currently biggest is equal to the last one
          // in the unique range
          if (Compare(0, last) != 0) {
            last--;
            m_Array[last] = m_Array[0];
            m_nUnique++;
          }
          Swap(0, i);
          EnforceHeapProperty(0, i);
        }
        if (Compare(0, last) != 0) {
          last--;
          m_Array[last] = m_Array[0];
          m_nUnique++;
        }
      }
    
      void Randomize()
      {
        int i;
    
        for (i = 0; i < m_max; i++) {
          m_Array[i] = (rand() * 10) / RAND_MAX;
        }
      }
    
      int m_nUnique;
      int *m_Array;
      int m_max;
      int m_nSwaps;
      int m_nCompares;
    };
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  14. #14
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    I just realized that STL comes with heap functions. So here is a simpler program:
    Code:
    #include <algorithm>
    
    using namespace std;
    
    int main(void)
    {
      int array&#091;10&#093; = {6,5,4,9,2,6,5,4,8,10};
      int i, last;
    
      make_heap(array, array + 10);
      last = 9;
      for (i = 10; i > 0; i--) {
        pop_heap(array, array + i);
        if (array&#091;i - 1&#093; != array&#091;last&#093;) {
          last--;
          array&#091;last&#093; = array&#091;i - 1&#093;;
        }
      }
      for (i = last; i < 10; i++) {
        printf("%4d", array&#091;i&#093;);
      }
      return 0;
    }
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  15. #15
    Join Date
    May 2002
    Location
    Texas
    Posts
    222
    Do you think this example applies to the question?

    Please mind that I'm just a beginner and this took me some time to do.

    It will remove any duplicate data that is exclusive of zero and -1. You should be able to use int in place of char.

    The first pointer (a) advances through the list twice. Once to find duplicates. The second time to remark the NULL terminator. The second pointer (b) first changes duplicates to -1. Then pointer (b) will fill in the "holes" and move legal data forward.
    Code:
    // NoDups.cpp
    #include <iostream>
    using namespace std;
    
    int main()
    {
        char str[] = "The quick brown Thqukwn 0123456789... 6543";
        char *a;
        char *b;
    
        a = str;
    
        cout << str << endl;
        while(*a)
        {
            if(*a == -1)
            {
                b = a;
                b++;
                while(*b)
                {
                    if(*b == -1) 
                        b++;
                    else 
                    {
                        *a = *b;
                        break;
                    }
                }
            }
            b = a;
            b++;
            while(*b)
            {
                if(*b == *a)
                    *b = -1;
                b++;
            }
            a++;
        }
        
        a = str;
    
        while((*a != -1) && (*a != 0))
            a++;
        *a = 0; 
        cout << str << endl;
        return 0;
    }

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Azure Activities Information Page

Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This is a CodeGuru survey question.


Featured


HTML5 Development Center