
November 29th, 2013, 08:07 AM
#1
A better algorithm
Hi,
Sorry if this question has been asked before but I need some help on this as I am stuck on this for more than one year. The problem is as follows 
I have three arrays A,B,C of same size let say n and I need to find out all possible (i,j,k) such that A[i]+B[j]=C[k]. And we
need to find this in O(n^2) complexity for which I found one solution like
Step 1:
Add all A[i]+B[j] as the key and (i,j) as values in a hash table.This gives n^2 no of keys and values.
Step 2:
Traverse for all C[k] and probe the hash table.If mach found (i,j,k) is one pair and so on.
This looks like an O(n^2) solution.But can it be done better or at least O(n^2) in some different approach ?
Regards,
Arka

November 30th, 2013, 02:04 AM
#2
Re: A better algorithm
Originally Posted by arka.sharma
This looks like an O(n^2) solution.
Say you instead insert the C values into the hash table. Then you generate the A+B sums in a pair of nested loops and check which sums are in the table. That's also O(n^2) but with a much smaller hash table.
And it has the additional advantage that you can now eat away on the O(n^2) complexity somewhat by executing the outermost loop in parallel. With n cores the complexity would reduce down to O(n).
Maybe sorting of the arrays could lower complexity but I'm not certain. At least it's a lead.
Last edited by razzle; November 30th, 2013 at 05:45 PM.

December 1st, 2013, 11:41 PM
#3
Re: A better algorithm
Thanks for your reply and a nice suggestion to insert C values in hash table as this will require smaller hash table. But by better I mean less than O(n^2) let say O(n*logn) as your solution is also O(n^2)
Regards,
Arka

December 4th, 2013, 07:13 AM
#4
Re: A better algorithm
If you are allowed to sort A, B and C then yes, you can maybe find this in fewer steps by eliminating impossible values in A, B and C.
Technically possible without sorting as well but the elimination phase will be more complex.
Additionally, by having tables sorted, you can early out on each possible step.
This'll be more complex, so it will only pay off for sufficiently large values of n.

December 5th, 2013, 09:32 PM
#5
Re: A better algorithm
Originally Posted by arka.sharma
as your solution is also O(n^2)
I know that and I said so. Still my solution gives a much smaller hash table and is much easier to parallelize, and that's not too bad.
I've given this another thought and considered sorting of the arrays as a way to reduce complexity but unfortunately I couldn't come up with anything. Sorting may improve speed but how much will be data dependent. It will dependend on the values stored in A, B and C so it's not a complexity reduction.
In principle sorting can accomplish that all A+B values that lie outside the range of the C values can be discarded in advance. This takes O(n * log n) to set up. But if all A+B values fall within the C range nothing is gained.
So the algorithm still is O(n*n) and to me it seems very much like that's the best that can be accomplished. I'll have to leave it at that.
Last edited by razzle; December 5th, 2013 at 09:36 PM.

December 9th, 2013, 06:34 AM
#6
Re: A better algorithm
One more advantage of inserting C[k] in hash table is also there will be no collision as none of the array contains duplicate on the other hand a[i] + b[j] for different (i,j) pair might produce same value and result into collision while inserting in hash table.So nice improvement. Thanks razzle.

December 9th, 2013, 07:32 AM
#7
Re: A better algorithm
Last edited by razzle; December 9th, 2013 at 07:45 AM.

December 9th, 2013, 09:46 AM
#8
Re: A better algorithm
regarding the optimal algorithm complexity, clearly you can always find a worst case input where the number of (i,j,k) such as ck=ai+bj scales as n^2, so, their mere enumeration implies an algorithm no better then O(n^2); regarding the avarage case instead, you should specify better the probability distribution of the input in order to give an answer.
anyway, assuming A and B sorted, non negative and without duplicates, an algorithm that doesn't require a hash table consists in traversing, for every k and starting from the biggest A[i]<=C[k], the strip of A[i]+B[j] closest to C[k]; this can be done in O(N) hence giving an overall complexity again of O(N^2).
BTW, if you're just interested in knowing, for every k, how many the (i,j) such as ck=ai+bj are, and you're willing to trade space for speed, then this can be computed via a convolution that, in turn, can be computed in O(nlogn) with the help of the FFT. This information could also be used to speed up the full O(N^2) algorithm when N becomes very big ...

December 9th, 2013, 04:42 PM
#9
Re: A better algorithm
Originally Posted by superbonzo
regarding the avarage case instead, you should specify better the probability distribution of the input in order to give an answer.
That's not how you establish the average complexity of an algorithm. You do that by assuming average input.

December 10th, 2013, 02:03 AM
#10
Re: A better algorithm
Originally Posted by razzle
That's not how you establish the average complexity of an algorithm. You do that by assuming average input.
... and how do you define the "avarage input" ? whether it's done explicitly or not, it always boils down to the choice of a family of prior probability distributions.

December 10th, 2013, 08:11 AM
#11
Re: A better algorithm
You can reduce O(n²) to O((nm)²) (with m being [0..n]) with a setup stage. With a large value of n, even a small value of m can make a huge difference in runtime.
the problem can't be further complexity reduced, but you can increase performance to less than the flat squareroot of n (or nm)
Technically, since you can't guarantee a reduction is even possible the technical difficulty of this problem remains the original O(n²).
Just because a problem is O(anything) doesn't mean it's runtime is necessarily linear to that 'anything'. It gives you a rough guide/estimate to worst case execution time.

December 12th, 2013, 05:35 AM
#12
Re: A better algorithm
Originally Posted by superbonzo
... and how do you define the "avarage input" ? whether it's done explicitly or not, it always boils down to the choice of a family of prior probability distributions.
Your approach will measure the average performance of the algorithm for a certain input. It will not determine the averagecase complexity of the algorithm in general.
Last edited by razzle; December 14th, 2013 at 03:06 AM.

December 12th, 2013, 07:47 AM
#13
Re: A better algorithm
Originally Posted by razzle
You approach will measure the average performance of the algorithm for a certain input. It will not determine the averagecase complexity of the algorithm in general.
No no no no no....
O(n) complexity means that you can calculate/guess a WORST CASE performance for ninput. (this isn't the same as the complexity for a worst case dataset !).
the Ocomplexity typically gives a "average case complexity" (=average datasets) but for many algorithms you get a best case/average case/worst case complexity notation.
Take quicksort.
THis has a O(n²) worst case complexity, and thus, assuming the worst possible datasets:
if you can sort 10 items in 100 (10²) seconds, you can "guestimate" 100 items to sort in no more than 10000 (100²)seconds. it's entirely possible that some internal optimisation or shortcut makes it run considerably faster, but it shouldn't however take significantly more than 10000.
The "average case" scenario for quicksort is O(n log(n))
again, this means that if we have an "average dataset" we can guestimate a worst case runtime for an ninput set.
Optimsations that can "early out"/"shortcut out" of specific data conditions don't/can't influence the complexity formula because they're data dependant, but can make code run in considerably less time than the Ocomplexity would indicate.

December 14th, 2013, 05:32 AM
#14
Re: A better algorithm
Originally Posted by OReubens
No no no no no....
You seem to agree with me in principle so would you please be more specific about what you are opposed to.
In short my point is that there are two kinds of complexity averages. One is in response to one specific dataset (say a certain probability distribution) and the other is in response to any possible dataset (any length, any distribution, any everything). They may coinside but one cannot assume that.
In my view it's the second kind that's the proper averagecase complexity measure. It's because it measures the inherent rather than the relative complexity of an algorithm.

December 14th, 2013, 08:01 AM
#15
Re: A better algorithm
Originally Posted by razzle
One is in response to one specific dataset (say a certain probability distribution) and the other is in response to any possible dataset (any length, any distribution, any everything).
an avarage "with respect to everything" is either nonsense or it can be reduced to the choice of a ( possibly degenerate and/or a family of ) probability space. Yes, one can define probability theory in terms of avarages only ( this can be even very useful, both technically and conceptually, C* algebras used in quantum mechanics being notable examples ) but, conceptually, probabilities and avarages are intrinsically bound concepts ( probabilities are avarages of "counting" observables, avarages are functionals on random variables ( aka, measureable functions between probability spaces ) ). Of course, in order to define an O() asymptotics one needs to tell apart an indipendent variable "n" somewhere with respect to which probabilities are parametrized, but there are many ways of doing such a thing, and certainly none of them qualifies as "the inherent avarage complexity" of an algorithm ( unless we're speaking of a randomized algorithm, but this is a different story (*) ) ...
(*) BTW, speaking of quicksort and randomized algorithms, in some academic level algorithm books, authors even refuse to compute a loosely defined avarage complexity ( stressing its poor usefulness in real world scenarios ), they directly compute the complexity of its randomized version where the input is randomly permuted before the nonrandomized algorithm is invoked, hence allowing to compute an avarage for each input size 'n'. But, this is not the "inherent" avarage complexity, it's just a way of presenting the result with respect to ( a still restricted ) family of input distributions in disguise.
Tags for this Thread
Posting Permissions
 You may not post new threads
 You may not post replies
 You may not post attachments
 You may not edit your posts

Forum Rules

Click Here to Expand Forum to Full Width
