Remove Duplicate Characters from a String without creating a new one ?

**Jigar111** · November 7th, 2002, 06:28 PM

Hello,

I have an interview with microsoft and they asked this question to like 3 other people so far.

How do you remove similar characters from a string without creating a new string and also making sure that it runs fast ? (not using a loop inside a loop and checking a million times)

What I mean is that how can you just remove duplicate characters and some how rearrange the string ?

"HELLO" should like like "HELO"
but you are allowed to create only one string or not declare another

Thanks,
JIgar

if someone can think of something then please reply or email me asap.

The problem is the same if u were to think of removing duplicates from an array without really creating another and without checking it with the unique array elements everytime.

**Yves M** · November 7th, 2002, 06:31 PM

Is it only for duplicates that are consecutive or can they appear anywhere in the string ?

**Jigar111** · November 7th, 2002, 06:44 PM

They repeated characters can appear anywhere in the string

Also, what I am looking for is more for an algorithm for removing duplicates from an array which contains(anything...may be characters, integers).

Thanks for Quick Reply....

Ya so the duplicate int, char can appear again anywhere in the array

Only condition is that no declaration of another array and that it has to run fast...I guess O(n) cuz another friend was rejected cuz his algo wasn't optimized

Jigar

**Yves M** · November 7th, 2002, 06:49 PM

If you are allowed to change the order of the elements then it's easy. You do a quicksort so that all duplicates will appear one after the other. Quicksort doesn't create a temporary array so you are not violing any condition. Its execution time is actually O(n log n), so you might run into trouble for that. And removing duplicates from a sorted container is pretty easy.

**galathaea** · November 7th, 2002, 07:02 PM

Can the string you are working with ever be lengthened? I am not exactly sure about your question... Are you saying this:

You are given a string (mystring).
You must process that string to leave only the first instance of a character in.
You cannot create any temporary duplicates, not even of characters, in any of your routines.
You can not make any other structures in memory (like a list of delete points or even one int to work with as a comparison counter).
And it must be optimized fully (with proof?).

This seems impossible if you are not allowed to lengthen the string, but I will have to attempt a proof...

Or am I assuming too much restriction? The line

but you are allowed to create only one string or not declare another

has me a little confused...

**Jigar111** · November 7th, 2002, 07:13 PM

Well I am looking for an algorithm where you dont create another temporary String for storing the unique characters so far.

Like this is not acceptable

string remove(string mystring)
{
string temp;

//insert first element from mystring to temp

// compare second element of mystring with every element of temp and if not in temp add it to temp
// repeat for next element of mystring

// return temp

}

Using pointers is there a way so that you can get a linked list or something of the valid unique characters and then in the end assign that linked list to our original string ?

Or even if you can give me an optimized algorithm to remove duplicates from an array without declaring another array then thats great too...

Thanks,
JIgar

I am sorry I didn't really explain the question well...I just got this question from 2 students who had their interview and so I personally was never asked the question, but from my earlier posts it appears that I need a memory efficient and optimized algo for removing duplicates from a data structure...

**Yves M** · November 7th, 2002, 07:16 PM

What is wrong with my idea then galathea ? It does not create any temporary characters nor temporary strings neither lengthens the original string.

I guess that that is actually a decent solution to the problem, since it can be argued that it will be useless to have the result string ordered in the same way as the first unique characters in the original string.

**Yves M** · November 7th, 2002, 07:22 PM

I guess this is one of these interview questions to weed out the

really unsuitable applicants -> can't get an algorithm for this problem
unsuitable -> get an algorithm but it's slow and inefficient
suitable -> get a "standard good" algorithm. This shows they know good basics
highly desireable -> get a "non-standard good" or "optimized standard" algorithm.

According to my own list I would fall into category 3 right now, but there is surely an algorithm for category 4. I'm thinking along the lines of a in-place sorting algorithm that eliminates duplicates as a side-effect.

But by the way, you'll never be able to achieve O(n) this much is clear. It seems to me the problem is another of these O(n log n) ones. Too bad I don't have my algorithms books here, I'd have checked

**galathaea** · November 7th, 2002, 07:26 PM

Nothing wrong with your analysis on my end. I'd choose it. But the recursion stores temporary values...

**Yves M** · November 7th, 2002, 07:28 PM

Yes, well, but the point would be that you don't store temporary entries of the string. Here it's a string, but in real life it could be big objects that are slow to copy.

**galathaea** · November 7th, 2002, 07:50 PM

I took this as a challenge question, but I think your analysis is right. It looks like they may have asked an easier question than I made it out. I don't mind a temporary pivot index here or there for the speed. I actually started thinking back to some of the parallel sorting algorithms using many threads... but now I think I understand. Maybe they are just seeing how well one can answer a particular problem effectively without getting sidetracked in the minutiae of interpretations. I'd fail that one...

**Yves M** · November 7th, 2002, 07:59 PM

Actually, I just checked and there is an implementation without recursion. It uses heapsort as a basis so it is also O(n log n). Unfortunately in practice heapsort is slower than quicksort so the quick and dirty might actually be better for most applications.

[edit : arg !! I didn't check whether it could be done in-place :/]

I took this as a challenge question

Yes well, but they wouldn't make it an impossible now would they ?

**Yves M** · November 8th, 2002, 08:02 PM

Yep, I got my heapsort without duplicates working. Its run-time is actually O(n log n), but since it's a heapsort it's a bit slow. It uses neither recursion nor a temporary array. Here is the code:

Code:

class Heap
{
public:
  Heap(int n = 15)
  {
    m_Array = new int[n];
    m_max = n;
    m_nSwaps = 0;
    m_nCompares = 0;
    m_nUnique = 0;
  }

  ~Heap()
  {
    delete [] m_Array;
  }

  void Swap(int i, int k)
  {
    int tmp;

    m_nSwaps++;
    tmp = m_Array[i];
    m_Array[i] = m_Array[k];
    m_Array[k] = tmp;
  }

  int Compare(int i, int k)
  {
    m_nCompares++;
    return (m_Array[i] - m_Array[k]);
  }

  void EnforceHeapProperty(int curr, int max)
  {
    int next, i;

    i = curr;
    while (i < (max / 2)) {
      next = i * 2 + 1;
      if (next < (max - 1)) { // two children
        if (Compare(next, next + 1) < 0) { // right is bigger
          if (Compare(next + 1, i) <= 0) { // current is bigger
            return;
          } else { // right is bigger than current
            Swap(next + 1, i);
            i = next + 1;
          }
        } else { // left is bigger
          if (Compare(next, i) <= 0) { // current is bigger
            return;
          } else { // left is bigger than current
            Swap(next, i);
            i = next;
          }
        }
      } else { // one child
        if (Compare(next, i) > 0) { // left is bigger than current
          Swap(next, i);
        }
        return;
      }
    }
  }

  void ConstructHeap()
  {
    int i;

    for (i = (m_max / 2) - 1; i >= 0; i--) {
      EnforceHeapProperty(i, m_max);
    }
  }

  void HeapSortUnique()
  {
    int i, last;

    ConstructHeap();
    // Put the first element in the Unique range
    Swap(0, m_max - 1);
    EnforceHeapProperty(0, m_max - 1);
    last = m_max - 1;
    m_nUnique = 1;
    for (i = m_max - 2; i > 0; i--) {
      // Check whether the currently biggest is equal to the last one
      // in the unique range
      if (Compare(0, last) != 0) {
        last--;
        m_Array[last] = m_Array[0];
        m_nUnique++;
      }
      Swap(0, i);
      EnforceHeapProperty(0, i);
    }
    if (Compare(0, last) != 0) {
      last--;
      m_Array[last] = m_Array[0];
      m_nUnique++;
    }
  }

  void Randomize()
  {
    int i;

    for (i = 0; i < m_max; i++) {
      m_Array[i] = (rand() * 10) / RAND_MAX;
    }
  }

  int m_nUnique;
  int *m_Array;
  int m_max;
  int m_nSwaps;
  int m_nCompares;
};

**Yves M** · November 19th, 2002, 12:51 PM

I just realized that STL comes with heap functions. So here is a simpler program:

Code:

#include <algorithm>

using namespace std;

int main(void)
{
  int array&#091;10&#093; = {6,5,4,9,2,6,5,4,8,10};
  int i, last;

  make_heap(array, array + 10);
  last = 9;
  for (i = 10; i > 0; i--) {
    pop_heap(array, array + i);
    if (array&#091;i - 1&#093; != array&#091;last&#093;) {
      last--;
      array&#091;last&#093; = array&#091;i - 1&#093;;
    }
  }
  for (i = last; i < 10; i++) {
    printf("%4d", array&#091;i&#093;);
  }
  return 0;
}

**Kdr Kane** · November 20th, 2002, 12:10 AM

Do you think this example applies to the question?

Please mind that I'm just a beginner and this took me some time to do.

It will remove any duplicate data that is exclusive of zero and -1. You should be able to use int in place of char.

The first pointer (a) advances through the list twice. Once to find duplicates. The second time to remark the NULL terminator. The second pointer (b) first changes duplicates to -1. Then pointer (b) will fill in the "holes" and move legal data forward.

Code:

// NoDups.cpp
#include <iostream>
using namespace std;

int main()
{
    char str[] = "The quick brown Thqukwn 0123456789... 6543";
    char *a;
    char *b;

    a = str;

    cout << str << endl;
    while(*a)
    {
        if(*a == -1)
        {
            b = a;
            b++;
            while(*b)
            {
                if(*b == -1) 
                    b++;
                else 
                {
                    *a = *b;
                    break;
                }
            }
        }
        b = a;
        b++;
        while(*b)
        {
            if(*b == *a)
                *b = -1;
            b++;
        }
        a++;
    }
    
    a = str;

    while((*a != -1) && (*a != 0))
        a++;
    *a = 0; 
    cout << str << endl;
    return 0;
}

Thread: Remove Duplicate Characters from a String without creating a new one ?

Thread Tools

Display

Remove Duplicate Characters from a String without creating a new one ?

????

re: Yves

Absolutely

Posting Permissions