keeping the top N outcomes (best way to merge vectors, I think)
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 5 of 5

Thread: keeping the top N outcomes (best way to merge vectors, I think)

  1. #1
    Join Date
    May 2009
    Location
    Boston
    Posts
    320

    keeping the top N outcomes (best way to merge vectors, I think)

    Hello, sorry for having multiple thread open at the same time but this is different enough that I thought it should be on it's own.

    I have some incremental results. In practice these end up as a vector of float. As the code progresses, I am keeping the top N largest values overall.

    For example where N=3, after the first iteration, my top 3 vector contains,
    Code:
    5.5
    4.4
    2.2
    The next set begin evaluated to see if it has any element that qualify for the top 3 is,
    Code:
    1.1
    4.7
    0.1
    Since 4.7 is larger than any element on my current top 3, I would replace 2.2 with 4.7 to end up with a new top 3 list of,
    Code:
    5.5
    4.7
    4.4
    My top_N_set is already sorted large to small. I have the largest value in the new set in scope, so what I am doing is first asking if the largest new value is larger than the smallest value in the current top_N_set. If it isn't, there is nothing to do. If it is, I combine the vectors, resort, and resize.

    Code:
    unsigned int N = 3;
    float largest_new_value;
    vector<float> top_N_set, new_set;
    
    // the top_N_set is already sorted large to small
    sort( top_N_set.begin(), top_N_set.end(), greater<float>() );
    
    // check to see if the largest new value is > the smallest value in top_N_set
    if( largest new value > top_N_set.back() ) {
    
       // combine top_N_set[] with new_set[]
    
       // preallocate memory for combined vector
       top_N_set.reserve( top_N_set.size() + new_set.size() );
    
       // insert new_set at end of top_IHB_product_sums
       top_N_set.insert( top_N_set.end(), new_set.begin(), new_set.end() );
    
       // resort top_N_set large to small 
       sort( top_N_set.begin(), top_N_set.end(), greater<float>() );
    
       // trim top_N_set back to N
       top_N_set.resize(N);
    
    }
    There are allot of ways to combine vectors. I also looked at,
    Code:
    // a more old fashioned way assigning by []
    unsigned int N = 3;
    unsigned int start_position, resize_to;
    
    float largest_new_value;
    vector<float> top_N_set, new_set;
    
    // create new size for top_N_set vector and resize
    start_position = top_N_set.size();
    resize_to = start_position + new_set.size();
    top_N_set.resize(resize_to);
    
    // add all elements of new_setto end of top_N_set
    for(i=0; i<new_set.size(); i++) { top_N_set[start_position+i] = new_set[i]; }
    
    
    //-------------------------------------------------------------------------------------//
    // just using push_back
    // push all elements of new_set onto to end of top_N_set
    for(i=0; i<new_set.size(); i++) { top_N_set.push_back( new_set[i] ); }
    With some testing, it seems as if the reserve() and insert() method gives the best performance.

    The top_N_set vector will never be very large (probably < 30), but the new_set vector will range from 1 to many thousands. I am not sure that there is an approach that will work equally well for all set sizes so I am tying to settle on something that will be reasonable.

    There are other approaches like sorting new_set and then searching for the first element that is not > top_N_set.back(). An iterator set to this position would define the range of new_set that is relevant and that might help in situations where new_set is large and the number of values that might go into top_N_set is small. Every additional step adds processing that is probably not helpful in situations where new_set is small of where most of new_set is > top_N_set.back().

    Are there any suggestions on this as far as how I am combining vectors or on the overall approach? I am using gcc3.3/g++3.3/g77 in both 32 and 64-bit.

    LMHmedchem

  2. #2
    2kaud's Avatar
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    5,835

    Re: keeping the top N outcomes (best way to merge vectors, I think)

    If I understand this, you want the revised top_N_set vector to contain the largest N values of the current top_N_set vector and of the new_set vector? This could require multiple values of the top_N_set vector being replaced from the new_set vector? Also that new_set_vector is not sorted? From how new_set vector is created, I take it that new_set vector can't be created in numerical order?
    Last edited by 2kaud; February 25th, 2017 at 05:17 PM.
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++17 Compiler: Microsoft VS2017 (15.4.4)

  3. #3
    Join Date
    May 2009
    Location
    Boston
    Posts
    320

    Re: keeping the top N outcomes (best way to merge vectors, I think)

    Quote Originally Posted by 2kaud View Post
    If I understand this, you want the revised top_N_set vector to contain the largest N values of the current c vector and of the new_set vector? This could require multiple values of the top_N_set vector being replaced from the new_set vector? Also that new_set_vector is not sorted? From how new_set vector is created, I take it that new_set vector can't be created in numerical order?
    That is correct. All of top_N_set could be replaced or none. As the algorithm progresses through sets, it will be less and less likely that anything will get replaced so that is why I test the size of largest_new_value against top_N_set.back().

    There would be tendency for new_set to be in order from low to high, but that cannot be counted on. This is why my first try was to just combine the vectors and re-sort. For small versions of new_set, it probably doesn't matter that much how this is done. For larger versions of new_set it could be helpful to sort new_set and only work with part of it, but if we are going to sort a large version of new_set with 5000 members, adding the 20-30 values from top_N_set and then sorting probably doesn't cost any more and were done.

    I was thinking that it might make more sense to add top_N_set to the end of new_set since either top_N_set will be smaller, or both will be small. That would mean less copying when new_set is large.

    LMHmedchem

  4. #4
    Join Date
    Oct 2008
    Posts
    1,456

    Re: keeping the top N outcomes (best way to merge vectors, I think)

    Quote Originally Posted by LMHmedchem View Post
    The top_N_set vector will never be very large (probably < 30), but the new_set vector will range from 1 to many thousands[...]As the algorithm progresses through sets, it will be less and less likely that anything will get replaced so that is why I test the size of largest_new_value against top_N_set.back().
    under these assumptions, an optimization could be to remove ( via std::remove_if ) all elements in new_set smaller than top_N_set.back() before merging new_set into top_N_set. Alternatively, if you need to keep new_set unaltered element wise, take a look at std:: partition instead, or pushback filtered elements directly ... also, take a look at std:: partial_sort for the final sorting ...
    Last edited by superbonzo; February 26th, 2017 at 04:45 AM.

  5. #5
    2kaud's Avatar
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    5,835

    Re: keeping the top N outcomes (best way to merge vectors, I think)

    Consider also another potential way of doing this. rather than have top_N_set as a vector, have it as a set. This means that it is always ordered properly and doesn't need to be sorted. This way, only one pass of new_set is needed and if the value is greater than the last (smallest) value of top_N_set and the value doesn't already exist then delete the last value and insert the new larger value. Once top_N_set is correctly populated, then if it is required as a vector rather than a set, then a vector can be easily created from it. As top_N_set is small, the overhead of keeping it in sorted order should be quite small and more than compensated by just the one pass over new_set. Consider some test code
    Code:
    typedef set<float, greater<float> > SetF;
    
    //Inserts from newset into topN greatest values
    size_t insert(size_t N, float largest_new_value, SetF& topN, const vector<float>& newset)
    {
    	size_t added = 0;
    	vector<float>::const_iterator vci;
    	SetF::iterator si = topN.end();
    
    	if (topN.size() >= N) {
    		if (largest_new_value > *(--si)) {
    			for (vci = newset.begin(); vci != newset.end(); ++vci) {
    				si = --topN.end();
    				if ((*vci > *si) && (topN.count(*vci) == 0)) {
    					topN.erase(si);
    					topN.insert(*vci);
    					++added;
    				}
    			}
    		}
    	} else {
    		for (vci = newset.begin(); vci != newset.end(); ++vci) {
    			if (topN.size() >= N) {
    				si = --topN.end();
    				if ((*vci > *si) && (topN.count(*vci) == 0)) {
    					topN.erase(si);
    					topN.insert(*vci);
    					++added;
    				}
    			} else {
    				topN.insert(*vci);
    				++added;
    			}
    		}
    	}
    	return added;
    }
    
    //For test, set vector to random values
    float vinit(size_t n, vector<float>& vf)
    {
    	float max = 0.0f;
    
    	for (size_t t = 0; t < n; ++t) {
    		vf[t] = rand() / 100.0f;
    		if (vf[t] > max) {
    			max = vf[t];
    		}
    	}
    
    	return max;
    }
    
    //Display container
    template<typename T>
    void display(const T& sf)
    {
    	T::const_iterator ci;
    
    	cout << endl;
    	for (ci = sf.begin(); ci != sf.end(); ++ci) {
    		cout << *ci << endl;
    	}
    
    	cout << endl;
    }
    
    int main()
    {
    	const size_t N = 30;
    	const size_t Nset = 10000;
    	const size_t iter = 10;
    
    	srand((unsigned int)time(NULL));	//Only for test vinit()
    
    	SetF top_N_set;
    	vector<float> new_set(Nset);
    
    	for (int i = 0; i < iter; ++i) {
    		float largest_new_value = vinit(Nset, new_set);
    		cout << "Iter: " << i + 1<< " Added: " << insert(N, largest_new_value, top_N_set, new_set) << endl;
    	}
    
    	display(top_N_set);
    
    	cout << "max: " << top_N_set.size() << endl;
    
    	//If need top N as a vector rather than a set create vector from set
    	vector<float> topN(top_N_set.begin(), top_N_set.end());
    
    	//display(topN);
    }
    Note that set doesn't have back(), hence the decrement trick with the iterator.
    Last edited by 2kaud; February 26th, 2017 at 08:36 AM.
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++17 Compiler: Microsoft VS2017 (15.4.4)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This a Codeguru.com survey!


On-Demand Webinars (sponsored)