keeping the top N outcomes (best way to merge vectors, I think)
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 5 of 5

Thread: keeping the top N outcomes (best way to merge vectors, I think)

  1. #1
    Join Date
    May 2009
    Location
    Boston
    Posts
    312

    keeping the top N outcomes (best way to merge vectors, I think)

    Hello, sorry for having multiple thread open at the same time but this is different enough that I thought it should be on it's own.

    I have some incremental results. In practice these end up as a vector of float. As the code progresses, I am keeping the top N largest values overall.

    For example where N=3, after the first iteration, my top 3 vector contains,
    Code:
    5.5
    4.4
    2.2
    The next set begin evaluated to see if it has any element that qualify for the top 3 is,
    Code:
    1.1
    4.7
    0.1
    Since 4.7 is larger than any element on my current top 3, I would replace 2.2 with 4.7 to end up with a new top 3 list of,
    Code:
    5.5
    4.7
    4.4
    My top_N_set is already sorted large to small. I have the largest value in the new set in scope, so what I am doing is first asking if the largest new value is larger than the smallest value in the current top_N_set. If it isn't, there is nothing to do. If it is, I combine the vectors, resort, and resize.

    Code:
    unsigned int N = 3;
    float largest_new_value;
    vector<float> top_N_set, new_set;
    
    // the top_N_set is already sorted large to small
    sort( top_N_set.begin(), top_N_set.end(), greater<float>() );
    
    // check to see if the largest new value is > the smallest value in top_N_set
    if( largest new value > top_N_set.back() ) {
    
       // combine top_N_set[] with new_set[]
    
       // preallocate memory for combined vector
       top_N_set.reserve( top_N_set.size() + new_set.size() );
    
       // insert new_set at end of top_IHB_product_sums
       top_N_set.insert( top_N_set.end(), new_set.begin(), new_set.end() );
    
       // resort top_N_set large to small 
       sort( top_N_set.begin(), top_N_set.end(), greater<float>() );
    
       // trim top_N_set back to N
       top_N_set.resize(N);
    
    }
    There are allot of ways to combine vectors. I also looked at,
    Code:
    // a more old fashioned way assigning by []
    unsigned int N = 3;
    unsigned int start_position, resize_to;
    
    float largest_new_value;
    vector<float> top_N_set, new_set;
    
    // create new size for top_N_set vector and resize
    start_position = top_N_set.size();
    resize_to = start_position + new_set.size();
    top_N_set.resize(resize_to);
    
    // add all elements of new_setto end of top_N_set
    for(i=0; i<new_set.size(); i++) { top_N_set[start_position+i] = new_set[i]; }
    
    
    //-------------------------------------------------------------------------------------//
    // just using push_back
    // push all elements of new_set onto to end of top_N_set
    for(i=0; i<new_set.size(); i++) { top_N_set.push_back( new_set[i] ); }
    With some testing, it seems as if the reserve() and insert() method gives the best performance.

    The top_N_set vector will never be very large (probably < 30), but the new_set vector will range from 1 to many thousands. I am not sure that there is an approach that will work equally well for all set sizes so I am tying to settle on something that will be reasonable.

    There are other approaches like sorting new_set and then searching for the first element that is not > top_N_set.back(). An iterator set to this position would define the range of new_set that is relevant and that might help in situations where new_set is large and the number of values that might go into top_N_set is small. Every additional step adds processing that is probably not helpful in situations where new_set is small of where most of new_set is > top_N_set.back().

    Are there any suggestions on this as far as how I am combining vectors or on the overall approach? I am using gcc3.3/g++3.3/g77 in both 32 and 64-bit.

    LMHmedchem

  2. #2
    2kaud's Avatar
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    5,587

    Re: keeping the top N outcomes (best way to merge vectors, I think)

    If I understand this, you want the revised top_N_set vector to contain the largest N values of the current top_N_set vector and of the new_set vector? This could require multiple values of the top_N_set vector being replaced from the new_set vector? Also that new_set_vector is not sorted? From how new_set vector is created, I take it that new_set vector can't be created in numerical order?
    Last edited by 2kaud; February 25th, 2017 at 04:17 PM.
    All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/

    C, C++ Compiler: Microsoft VS2017.2

  3. #3
    Join Date
    May 2009
    Location
    Boston
    Posts
    312

    Re: keeping the top N outcomes (best way to merge vectors, I think)

    Quote Originally Posted by 2kaud View Post
    If I understand this, you want the revised top_N_set vector to contain the largest N values of the current c vector and of the new_set vector? This could require multiple values of the top_N_set vector being replaced from the new_set vector? Also that new_set_vector is not sorted? From how new_set vector is created, I take it that new_set vector can't be created in numerical order?
    That is correct. All of top_N_set could be replaced or none. As the algorithm progresses through sets, it will be less and less likely that anything will get replaced so that is why I test the size of largest_new_value against top_N_set.back().

    There would be tendency for new_set to be in order from low to high, but that cannot be counted on. This is why my first try was to just combine the vectors and re-sort. For small versions of new_set, it probably doesn't matter that much how this is done. For larger versions of new_set it could be helpful to sort new_set and only work with part of it, but if we are going to sort a large version of new_set with 5000 members, adding the 20-30 values from top_N_set and then sorting probably doesn't cost any more and were done.

    I was thinking that it might make more sense to add top_N_set to the end of new_set since either top_N_set will be smaller, or both will be small. That would mean less copying when new_set is large.

    LMHmedchem

  4. #4
    Join Date
    Oct 2008
    Posts
    1,435

    Re: keeping the top N outcomes (best way to merge vectors, I think)

    Quote Originally Posted by LMHmedchem View Post
    The top_N_set vector will never be very large (probably < 30), but the new_set vector will range from 1 to many thousands[...]As the algorithm progresses through sets, it will be less and less likely that anything will get replaced so that is why I test the size of largest_new_value against top_N_set.back().
    under these assumptions, an optimization could be to remove ( via std::remove_if ) all elements in new_set smaller than top_N_set.back() before merging new_set into top_N_set. Alternatively, if you need to keep new_set unaltered element wise, take a look at std:: partition instead, or pushback filtered elements directly ... also, take a look at std:: partial_sort for the final sorting ...
    Last edited by superbonzo; February 26th, 2017 at 03:45 AM.

  5. #5
    2kaud's Avatar
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    5,587

    Re: keeping the top N outcomes (best way to merge vectors, I think)

    Consider also another potential way of doing this. rather than have top_N_set as a vector, have it as a set. This means that it is always ordered properly and doesn't need to be sorted. This way, only one pass of new_set is needed and if the value is greater than the last (smallest) value of top_N_set and the value doesn't already exist then delete the last value and insert the new larger value. Once top_N_set is correctly populated, then if it is required as a vector rather than a set, then a vector can be easily created from it. As top_N_set is small, the overhead of keeping it in sorted order should be quite small and more than compensated by just the one pass over new_set. Consider some test code
    Code:
    typedef set<float, greater<float> > SetF;
    
    //Inserts from newset into topN greatest values
    size_t insert(size_t N, float largest_new_value, SetF& topN, const vector<float>& newset)
    {
    	size_t added = 0;
    	vector<float>::const_iterator vci;
    	SetF::iterator si = topN.end();
    
    	if (topN.size() >= N) {
    		if (largest_new_value > *(--si)) {
    			for (vci = newset.begin(); vci != newset.end(); ++vci) {
    				si = --topN.end();
    				if ((*vci > *si) && (topN.count(*vci) == 0)) {
    					topN.erase(si);
    					topN.insert(*vci);
    					++added;
    				}
    			}
    		}
    	} else {
    		for (vci = newset.begin(); vci != newset.end(); ++vci) {
    			if (topN.size() >= N) {
    				si = --topN.end();
    				if ((*vci > *si) && (topN.count(*vci) == 0)) {
    					topN.erase(si);
    					topN.insert(*vci);
    					++added;
    				}
    			} else {
    				topN.insert(*vci);
    				++added;
    			}
    		}
    	}
    	return added;
    }
    
    //For test, set vector to random values
    float vinit(size_t n, vector<float>& vf)
    {
    	float max = 0.0f;
    
    	for (size_t t = 0; t < n; ++t) {
    		vf[t] = rand() / 100.0f;
    		if (vf[t] > max) {
    			max = vf[t];
    		}
    	}
    
    	return max;
    }
    
    //Display container
    template<typename T>
    void display(const T& sf)
    {
    	T::const_iterator ci;
    
    	cout << endl;
    	for (ci = sf.begin(); ci != sf.end(); ++ci) {
    		cout << *ci << endl;
    	}
    
    	cout << endl;
    }
    
    int main()
    {
    	const size_t N = 30;
    	const size_t Nset = 10000;
    	const size_t iter = 10;
    
    	srand((unsigned int)time(NULL));	//Only for test vinit()
    
    	SetF top_N_set;
    	vector<float> new_set(Nset);
    
    	for (int i = 0; i < iter; ++i) {
    		float largest_new_value = vinit(Nset, new_set);
    		cout << "Iter: " << i + 1<< " Added: " << insert(N, largest_new_value, top_N_set, new_set) << endl;
    	}
    
    	display(top_N_set);
    
    	cout << "max: " << top_N_set.size() << endl;
    
    	//If need top N as a vector rather than a set create vector from set
    	vector<float> topN(top_N_set.begin(), top_N_set.end());
    
    	//display(topN);
    }
    Note that set doesn't have back(), hence the decrement trick with the iterator.
    Last edited by 2kaud; February 26th, 2017 at 07:36 AM.
    All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/

    C, C++ Compiler: Microsoft VS2017.2

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This a Codeguru.com survey!


On-Demand Webinars (sponsored)