Click to See Complete Forum and Search --> : grouping subsets to minimize variance


lorenz123
June 29th, 2009, 08:30 PM
Hi,

Hopefully someone can help me with this problem... it's driving me crazy:

I have n sets of values, each containing some arbitrary number of integers.
I need to create a new set of groups, by adding elements within each original set, so that the variance of the new groups is minimized.

Example
S_1 = {1, 2}
S_2 = {3}
S_3 = {1, 3, 4}
Variance here is calculated over {1, 2, 3, 1, 3, 4}

New groups that minimize variance:
G_1 = {3}
G_2 = {3}
G_3 = {4, 4}
Variance here is calculated over {3, 3, 4, 4}


Any idea how to approach this problem? Note that n can be quite large, and the number of elements within each set can be quite large as well.

Thank you!


Lz

prime1999
June 30th, 2009, 12:53 AM
Is this supposed to be solvable in polynomial time?

lorenz123
June 30th, 2009, 01:02 AM
Not sure. I've been trying to find a polynomial solution, but failed so far.

If you can think of a good approximation method, that can be useful as well.

prime1999
June 30th, 2009, 01:05 AM
There are only M-N+1 possible averages and their corresponding minimums (I think it is not monotonic -- so cannot binary search)... so possibly solve the problem in one set given an average...? The latter I still do not have a polynomial time solution... would need to think a bit more.

lorenz123
June 30th, 2009, 01:15 AM
I was thinking of a similar approach -- and I believe I could try all possible averages.

The problem, as you point out, is finding the solution within each set in a reasonable amount of time. For a set of 50 elements, for example, trying all possible partitions yields 1.8572e+047 subsets... (and I don't have that much time!)

prime1999
June 30th, 2009, 01:15 AM
Oops... forgot I also brought in the condition that the final number of numbers is determined... dynamic programming? Then we would have to solve the case where the number of numbers in each set is determined...

lorenz123
June 30th, 2009, 01:25 AM
Dynamic programming would be great, but i am not sure how to do it for this case.

This sounds like a multiple knapsack problem... except that the objective becomes filling the knapsacks to get as close as possible to the capacity...

prime1999
June 30th, 2009, 01:33 AM
I expanded the equation for one set (given a fixed average A and the final number of numbers in the set)... and it seems that we only need to minimize the following:
(given that the numbers are divided like so | a_1+a_2+a_3 | a_4+a_5 | ... (an example))
sum_pairs(first partition)+sum_pairs(second partition)+...
( a_1 a_2 + a_2 a_3 + a_3 a_1 ) + ( a_4 a_5 ) + ...

... which seems a bit complicated (maybe it's the wrong track).

prime1999
June 30th, 2009, 01:35 AM
it seems independent on the average!?

prime1999
June 30th, 2009, 01:36 AM
Sorry: *independent of
Are we allowed to edit our posts??

lorenz123
June 30th, 2009, 01:50 AM
what do you mean by 'independent of the average'? i am not following what you are saying...

prime1999
June 30th, 2009, 01:59 AM
That means whatever the average is, as long as the n numbers in a set are going to become n' numbers, an optimal way is to partition it in a certain fixed way. (Unless I did my calculations wrongly?).

lorenz123
June 30th, 2009, 02:42 AM
you don't know n' in advance...