keeping the top N outcomes (best way to merge vectors, I think)
 CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com

# Thread: keeping the top N outcomes (best way to merge vectors, I think)

1. ## keeping the top N outcomes (best way to merge vectors, I think)

Hello, sorry for having multiple thread open at the same time but this is different enough that I thought it should be on it's own.

I have some incremental results. In practice these end up as a vector of float. As the code progresses, I am keeping the top N largest values overall.

For example where N=3, after the first iteration, my top 3 vector contains,
Code:
```5.5
4.4
2.2```
The next set begin evaluated to see if it has any element that qualify for the top 3 is,
Code:
```1.1
4.7
0.1```
Since 4.7 is larger than any element on my current top 3, I would replace 2.2 with 4.7 to end up with a new top 3 list of,
Code:
```5.5
4.7
4.4```
My top_N_set is already sorted large to small. I have the largest value in the new set in scope, so what I am doing is first asking if the largest new value is larger than the smallest value in the current top_N_set. If it isn't, there is nothing to do. If it is, I combine the vectors, resort, and resize.

Code:
```unsigned int N = 3;
float largest_new_value;
vector<float> top_N_set, new_set;

// the top_N_set is already sorted large to small
sort( top_N_set.begin(), top_N_set.end(), greater<float>() );

// check to see if the largest new value is > the smallest value in top_N_set
if( largest new value > top_N_set.back() ) {

// combine top_N_set[] with new_set[]

// preallocate memory for combined vector
top_N_set.reserve( top_N_set.size() + new_set.size() );

// insert new_set at end of top_IHB_product_sums
top_N_set.insert( top_N_set.end(), new_set.begin(), new_set.end() );

// resort top_N_set large to small
sort( top_N_set.begin(), top_N_set.end(), greater<float>() );

// trim top_N_set back to N
top_N_set.resize(N);

}```
There are allot of ways to combine vectors. I also looked at,
Code:
```// a more old fashioned way assigning by []
unsigned int N = 3;
unsigned int start_position, resize_to;

float largest_new_value;
vector<float> top_N_set, new_set;

// create new size for top_N_set vector and resize
start_position = top_N_set.size();
resize_to = start_position + new_set.size();
top_N_set.resize(resize_to);

// add all elements of new_setto end of top_N_set
for(i=0; i<new_set.size(); i++) { top_N_set[start_position+i] = new_set[i]; }

//-------------------------------------------------------------------------------------//
// just using push_back
// push all elements of new_set onto to end of top_N_set
for(i=0; i<new_set.size(); i++) { top_N_set.push_back( new_set[i] ); }```
With some testing, it seems as if the reserve() and insert() method gives the best performance.

The top_N_set vector will never be very large (probably < 30), but the new_set vector will range from 1 to many thousands. I am not sure that there is an approach that will work equally well for all set sizes so I am tying to settle on something that will be reasonable.

There are other approaches like sorting new_set and then searching for the first element that is not > top_N_set.back(). An iterator set to this position would define the range of new_set that is relevant and that might help in situations where new_set is large and the number of values that might go into top_N_set is small. Every additional step adds processing that is probably not helpful in situations where new_set is small of where most of new_set is > top_N_set.back().

Are there any suggestions on this as far as how I am combining vectors or on the overall approach? I am using gcc3.3/g++3.3/g77 in both 32 and 64-bit.

LMHmedchem

2. ## Re: keeping the top N outcomes (best way to merge vectors, I think)

If I understand this, you want the revised top_N_set vector to contain the largest N values of the current top_N_set vector and of the new_set vector? This could require multiple values of the top_N_set vector being replaced from the new_set vector? Also that new_set_vector is not sorted? From how new_set vector is created, I take it that new_set vector can't be created in numerical order?
Last edited by 2kaud; February 25th, 2017 at 04:17 PM.

3. ## Re: keeping the top N outcomes (best way to merge vectors, I think)

Originally Posted by 2kaud
If I understand this, you want the revised top_N_set vector to contain the largest N values of the current c vector and of the new_set vector? This could require multiple values of the top_N_set vector being replaced from the new_set vector? Also that new_set_vector is not sorted? From how new_set vector is created, I take it that new_set vector can't be created in numerical order?
That is correct. All of top_N_set could be replaced or none. As the algorithm progresses through sets, it will be less and less likely that anything will get replaced so that is why I test the size of largest_new_value against top_N_set.back().

There would be tendency for new_set to be in order from low to high, but that cannot be counted on. This is why my first try was to just combine the vectors and re-sort. For small versions of new_set, it probably doesn't matter that much how this is done. For larger versions of new_set it could be helpful to sort new_set and only work with part of it, but if we are going to sort a large version of new_set with 5000 members, adding the 20-30 values from top_N_set and then sorting probably doesn't cost any more and were done.

I was thinking that it might make more sense to add top_N_set to the end of new_set since either top_N_set will be smaller, or both will be small. That would mean less copying when new_set is large.

LMHmedchem

4. Senior Member
Join Date
Oct 2008
Posts
1,456

## Re: keeping the top N outcomes (best way to merge vectors, I think)

Originally Posted by LMHmedchem
The top_N_set vector will never be very large (probably < 30), but the new_set vector will range from 1 to many thousands[...]As the algorithm progresses through sets, it will be less and less likely that anything will get replaced so that is why I test the size of largest_new_value against top_N_set.back().
under these assumptions, an optimization could be to remove ( via std::remove_if ) all elements in new_set smaller than top_N_set.back() before merging new_set into top_N_set. Alternatively, if you need to keep new_set unaltered element wise, take a look at std:: partition instead, or pushback filtered elements directly ... also, take a look at std:: partial_sort for the final sorting ...
Last edited by superbonzo; February 26th, 2017 at 03:45 AM.

5. ## Re: keeping the top N outcomes (best way to merge vectors, I think)

Consider also another potential way of doing this. rather than have top_N_set as a vector, have it as a set. This means that it is always ordered properly and doesn't need to be sorted. This way, only one pass of new_set is needed and if the value is greater than the last (smallest) value of top_N_set and the value doesn't already exist then delete the last value and insert the new larger value. Once top_N_set is correctly populated, then if it is required as a vector rather than a set, then a vector can be easily created from it. As top_N_set is small, the overhead of keeping it in sorted order should be quite small and more than compensated by just the one pass over new_set. Consider some test code
Code:
```typedef set<float, greater<float> > SetF;

//Inserts from newset into topN greatest values
size_t insert(size_t N, float largest_new_value, SetF& topN, const vector<float>& newset)
{
vector<float>::const_iterator vci;
SetF::iterator si = topN.end();

if (topN.size() >= N) {
if (largest_new_value > *(--si)) {
for (vci = newset.begin(); vci != newset.end(); ++vci) {
si = --topN.end();
if ((*vci > *si) && (topN.count(*vci) == 0)) {
topN.erase(si);
topN.insert(*vci);
}
}
}
} else {
for (vci = newset.begin(); vci != newset.end(); ++vci) {
if (topN.size() >= N) {
si = --topN.end();
if ((*vci > *si) && (topN.count(*vci) == 0)) {
topN.erase(si);
topN.insert(*vci);
}
} else {
topN.insert(*vci);
}
}
}
}

//For test, set vector to random values
float vinit(size_t n, vector<float>& vf)
{
float max = 0.0f;

for (size_t t = 0; t < n; ++t) {
vf[t] = rand() / 100.0f;
if (vf[t] > max) {
max = vf[t];
}
}

return max;
}

//Display container
template<typename T>
void display(const T& sf)
{
T::const_iterator ci;

cout << endl;
for (ci = sf.begin(); ci != sf.end(); ++ci) {
cout << *ci << endl;
}

cout << endl;
}

int main()
{
const size_t N = 30;
const size_t Nset = 10000;
const size_t iter = 10;

srand((unsigned int)time(NULL));	//Only for test vinit()

SetF top_N_set;
vector<float> new_set(Nset);

for (int i = 0; i < iter; ++i) {
float largest_new_value = vinit(Nset, new_set);
cout << "Iter: " << i + 1<< " Added: " << insert(N, largest_new_value, top_N_set, new_set) << endl;
}

display(top_N_set);

cout << "max: " << top_N_set.size() << endl;

//If need top N as a vector rather than a set create vector from set
vector<float> topN(top_N_set.begin(), top_N_set.end());

//display(topN);
}```
Note that set doesn't have back(), hence the decrement trick with the iterator.
Last edited by 2kaud; February 26th, 2017 at 07:36 AM.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•