is there a good algorithm for finding deleted items in an ordered list?
The ids are autoincremented such that a deleted id will never be used (un-deleted) again.
Example:
t=1
source list: 1,2,3,4,5,6,7,8,9
target list: 1,2,3,4,5,6,7,8,9
first sync: source is equal target
t=2
source list: 1,2,3,5,6,8,9 (4 and 7 were deleted)
target list: 1,2,3,4,5,6,7,8,9
challenge: find the deleted items in the source and delete them in the target
Current algorithm is very simple but slow:
1. save the start time of the iteration
2. iterate through all items of the source list and update the last check timestamp in the target
3. consider all items with last check timestamp older than the start time from 1 as deleted
4. double check that the items do not exist and delete them in the target
In the real application the length of the list is round about 10^6 items. Iterating through all items takes some time and produces much traffic.
Is there an algorithm that could do it faster?
I am thinking about some recursion with divide and conquer approach...
Pretty much you step through the lists. On each iteration, if the value you're looking at in the source list (src[srcIndex]) is bigger than the value you're looking at in the target list tgt[tgtIndex], you know you've already "passed" where it should be in the source list, and you can delete it from target.
On the other hand if the source list value is smaller, just increment the value you're looking at until you either (a) exactly match the target value or (b) pass the target value. On the next iteration: if (a): don't delete, because we found the target value in the source list; if (b) do delete, because we passed it.
I think there is no logic bug there. You might just try to implement it and check. Nuzzle, look OK to you?
Best Regards,
BioPhysEngr http://blog.biophysengr.net
--
All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.
Bookmarks