# Finding deleted items in an ordered list?

• January 24th, 2013, 01:01 AM
ottom
Finding deleted items in an ordered list?
Hello,

is there a good algorithm for finding deleted items in an ordered list?
The ids are autoincremented such that a deleted id will never be used (un-deleted) again.

Example:

t=1
source list: 1,2,3,4,5,6,7,8,9
target list: 1,2,3,4,5,6,7,8,9
first sync: source is equal target

t=2
source list: 1,2,3,5,6,8,9 (4 and 7 were deleted)
target list: 1,2,3,4,5,6,7,8,9
challenge: find the deleted items in the source and delete them in the target

Current algorithm is very simple but slow:

1. save the start time of the iteration
2. iterate through all items of the source list and update the last check timestamp in the target
3. consider all items with last check timestamp older than the start time from 1 as deleted
4. double check that the items do not exist and delete them in the target

In the real application the length of the list is round about 10^6 items. Iterating through all items takes some time and produces much traffic.

Is there an algorithm that could do it faster?
I am thinking about some recursion with divide and conquer approach...
• January 24th, 2013, 01:37 AM
BioPhysEngr
Re: Finding deleted items in an ordered list?
Nothing seems fundamentally wrong with your algorithm. Execution time is O(n). I don't think you can improve that by divide and conquer.

I think the timestamping is a little unusual though. Instead, I think something like this will work:

Supposing two lists of numbers (src and tgt) and two indexes into those lists (srcIndex, tgtIndex):

Code:

```srcIndex = 0; tgtIndex = 0; loop:     if target[tgtIndex] < src[srcIndex]:         delete target[tgtIndex]         tgtIndex = tgtIndex + 1     else if srcIndex[srcIndex] < tgt[tgtIndex]:         srcIndex++     else: //Equal         tgtIndex++         srcIndex++```
Pretty much you step through the lists. On each iteration, if the value you're looking at in the source list (src[srcIndex]) is bigger than the value you're looking at in the target list tgt[tgtIndex], you know you've already "passed" where it should be in the source list, and you can delete it from target.

On the other hand if the source list value is smaller, just increment the value you're looking at until you either (a) exactly match the target value or (b) pass the target value. On the next iteration: if (a): don't delete, because we found the target value in the source list; if (b) do delete, because we passed it.

I think there is no logic bug there. You might just try to implement it and check. Nuzzle, look OK to you?