a variation on all possible subsets - Page 3
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 3 of 3 FirstFirst 123
Results 31 to 36 of 36

Thread: a variation on all possible subsets

  1. #31
    Join Date
    May 2009
    Location
    Boston
    Posts
    318

    Re: a variation on all possible subsets

    Quote Originally Posted by 2kaud View Post
    That's great!
    Thank you for the encouragement. I am pleased that my concept of how to do this is not without merit but I am still not convinced that it is the optimal solution.

    The method outlined in my last post is certainly not the most efficient way to creating all possible subsets in lexical order. The potential benefit of the method arises when incompatible pairs are removed. For instance if 1 is incompatible with 0.

    Code:
    // creation of branch level 1 (pairs)
    0,1 // fails because 1 is not compatible with 0
    0,2
    0,3
    0,4
    
    // 4 total checks, 3 extended sets created
    extended_sets[]
    0,2
    0,3
    0,4
    
    last_added_elements[]
    {2,3,4}
    Both extended_sets[] and last_added_elements[] are smaller than when the full set is traversed because the 0,1 pair was tested but not created. For the next branch level.


    Code:
    // creation of branch level 2 (triples)
    // no subsets starting with 0,1 need are examined
    0,2 add 2 // fails 2 is already a member of 0,2
    0,2 add 3
    0,2 add 4
    
    0,3 add 2 // fails 2 is smaller than 3
    0,3 add 3 // fails 3 is already a member of 0,3
    0,3 add 4
    
    0,4 add 2 // fails 2 is smaller than 4
    0,4 add 3 // fails 3 is smaller than 4
    0,4 add 4 // fails 4 is already a member of 0,4
    
    // 9 total checks, 3 extended sets created
    extended_sets[]
    0,2,3
    0,2,4
    0,3,4
    
    last_added_elements[]
    {3,4}
    Code:
    // creation of branch level 3 (quads)
    0,2,3 add 3 // fails 3 is already a member of 0,2,3
    0,2,3 add 4
    
    0,2,4 add 3 // fails 3 is smaller than 4
    0,2,4 add 4 // fails 4 is already a member of 0,2,4
    
    0,3,4 add 3 // fails 3 is smaller than 4
    0,3,4 add 4 // fails 4 is already a member of 0,2,4
    
    // 6 total checks, 1 extended set created
    extended_sets[]
    0,2,3,4
    
    last_added_elements[]
    {4}
    Code:
    // creation of branch level 4 (quints)
    0,2,3,4 add 4 // fails 4 is already a member of 0,2,3,4
    
    // 1 total check, 0 extended sets created
    extended_sets[]
    
    last_added_elements[]
    {}
    

    The algorithm quits here because last_added_elements[] is empty so no larger subsets can be created. The final list of legal substes when 0 and 1 are incompatible is,
    Code:
    0
    0,2
    0,3
    0,4
    0,2,3
    0,2,4
    0,3,4
    0,2,3,4
    

    Only 7 of the theoretical 16 subsets are created. I believe that a brief analysis of the example above shows that problems still remain with this method. For small sets such as this with only one incompatible pair, you can see that 20 checks are required. There are only 16 theoretical pairs, so in this case it would have been faster to create all the pairs and check them. For problems like N=40 with many incompatible pairs, you see that only 3,708,590 of the theoretical 1,099,511,627,776 combinations need to be examined to produce the surviving 178,211 legal subsets. This works much better for very large problems than for small ones.

    There are many checks being made that are probably not necessary. For instance, after the pairs are created, all the checks fail when trying to extend the last set in root_elements[]
    . The pair 0,4 is never extended, the triple 0,3,4 is never extended, etc. Also, the first element in elements_to_add[] is never added to any root_elements[] set. In all cases, it is already in the set or smaller then the last number in the set.

    I thought I should be able to come up with a more clever looping scheme here that would address that. this can be done when all subsets are created because the creation pattern is known in advance. There is a problem when pairs are removed because of incompatibility in that there is no way to know how many elements will be in the elements_to_add[] list for the next iteration.

    This algorithm could be additionally enhanced if the number of checks could be reduced but I couldn't see how to do that right away so I just wrote the code in a way I knew would work. I have attached the code which was just built with,
    Code:
    c++ -c  -Wall -O2 APS_3.cpp; c++ -o APS_3.exe APS_3.o
    The code is set up to process the N=40 problem 10 times and on this system, that takes about 3.5 seconds. The output is minimal. A full list of the subsets found can be printed by changing the print bool to true, but that makes the runtime substantially longer.

    I am interested in improving that algorithm, but also at looking at others that have been mentioned that process the same problem. It would still be nice to have it faster, but I also don't think it is very safe the way it is written.

    Suggestions are very much appreciated,

    LMHmedchem
    Attached Files Attached Files

  2. #32
    Join Date
    Feb 2017
    Posts
    157

    Re: a variation on all possible subsets

    Continued from my post #30.

    Using bit-sets has several advantages I think. It's a dense representation. For example a 64-bit integer (a long long int) can hold sets with up to 63 members (the sign bit is reserved for bit manipulations). Also many set operations on bit-sets are very fast since C++ supports bit-wise operations on integers. That includes checking for set membership, set union and intersection, and many other useful "bit-fiddling" operations (see for example "Bitwise Tricks and Techniques" (Knuth vol.4A)).

    My algorithm builds on the natural order of integers and specifically the reverse order. If N is the number of set members this would be from (2^N)-1 down to 1 (0 is the empty set and is excluded). I used the reversed order because then integers with many left 1's will appear early in the sequence. This corresponds to the requirement that the lower the set member numbers, the higher the priority of the set and the sooner it should be considered for exclusion.

    Input to the algorithm is a list of so called skip-sets. Each skip-set has exactly two members that are considered incompatible. The purpose of the algorithm is to exclude all sets which share members with any skip-set (when set AND skip-set EQUALS skip-set the set is excluded).

    The algorithm proceeds in the following way: At any bit-set it is determined whether it should be excluded. If so the algorithm skips to the closest bit-set that should be included. Otherwise all sub-sets are stored up to the closest bit-set that should be excluded. The algorithm will start with the first bit-set and alternatingly include/exclude whole sections of bit-sets until the last bit-set is reached.

    The main property of this algorithm is that not all bit-sets must be investigated for exclusion. Most bit-sets are simply just skipped over or stored without further ado. The algorithm is very simple in principle and the challenge is the convoluted nature of the bit-wise operations. On the other hand these are what makes this algorithm potentially very fast.

    Another advantage is that the algorithm can be run on any sub-sequence of the full sequence. It means the full sequence could be split up and processed in parallel. An obvious optimization that is implemented is that skip-sets that no longer apply are removed from the skip-set list making the algorithm faster as it proceeds.

    Finally I again would like to point out an important observation. This algorithm essentially splits a sequence of natural numbers into sub-sequences of natural numbers. These sub-sequences are fully determined by the skip-sets even before any algorithm has even run. Furthermore, the sub-sequences are fully determined by their first and last numbers (they are in fact even determined by the first number alone since the last number can be calculated from the first). So instead of storing the full sub-sequences it's enough to store just the first number (and possibly the last for convenience) which would be an enormous saving in storage needs.

    Have a look at the example in post #30 for example. Here the full sequence is [31 .. 1]. When it is subjected to the skip-sets {0,1} and {2,3} it splits into: [21 .. 16], [14 .. 8] and [5 ..1]. My algorithm simply jumps between those sub-sequences and stores the numbers of each. But why store all those numbers when it would be enough to store just these pairs of numbers: (21,16), (14,8) and (5,1)? And why store anything at all when one can generate the numbers on the fly almost as quickly as reading them from a vector?

    The code is posted in a third post.
    Last edited by wolle; March 2nd, 2017 at 04:12 AM.

  3. #33
    Join Date
    Feb 2017
    Posts
    157

    Re: a variation on all possible subsets

    Continued from my post #32.

    Here's now finally the code. It's tested but not up to production standards of course. But it does serve as "proof of concept" which was the purpose and this kind of low-level programming is plain fun .

    Code:
    #include <iostream>
    #include <string>
    #include <vector>
    #include <algorithm>
    
    // definitions
    
    using Bitset = long long int; // bitset must be SIGNED integer type
    constexpr Bitset ZERO = 0;
    constexpr Bitset ONE = 1;
    using Vecset = std::vector<int>; // vector set
    
    // helper functions
    
    Vecset vecset(Bitset b, int N) { // conversion - bitset to vector set
    	Vecset v={};
    	for (int i=0; i<N; ++i) {
    		if ((b & (ONE<<(N-1-i))) != 0) v.push_back(i);
    	}
    	return v;
    }
    
    Bitset bitset(const Vecset& v, int N) { // conversion - vector set to bitset
    	Bitset b=ZERO;
    	for (int i : v) {
    		assert(i>=0 && i<N);
    		b |= ONE<<(N-1-i); 
    	}
    	return b;
    }
    
    std::string string(Bitset b, int N) { // bitset as string
    	std::string s = "[";
    	for (int i=0; i<N; ++i) {
    		s += ((b & (ONE<<(N-1-i))) != 0) ? "1" : "0";
    	}
    	return s + "]";
    }
    
    std::string string(const Vecset& v) { // vector set as string
    	std::string s = "{";
    	const int N = static_cast<int>(v.size());
    	for (int i=0; i<N; ++i) {
    		s += std::to_string(v[i]) + ((i<N-1) ? "," : ""); 
    	}
    	return s + "}";
    }
    
    
    std::string allstring(Bitset set, int N) { // all set representations as string (bitset, vector set and integer)
    	return string(set,N) + " - " + string(vecset(set,N)) + " - (" + std::to_string(set) + ")";
    }
    
    
    // SkipStruct holds one skipset (bitset with 2 members that are considered incompatible)
    // (the bit-wise manipulations are from Knuth vol. 4A p. 140 - working with rightmost bits)
    
    struct SkipStruct { 
    
    	SkipStruct(Bitset s) : set(s) {}
    
    	Bitset skipset () const { // actual skipset
    		return set;
    	}
    	bool noLeftMember(Bitset s) const { // s is not in conflict with left member of this skipset?
    		return (LEFTBIT & s) == 0;
    	}
    	bool noRightMember(Bitset s) const { // s is not in conflict with right member of this skipset?
    		return (RIGHTBIT & s) == 0;
    	}
    	Bitset nextIncompatibleLeft(Bitset s) const { // assuming  s is not in conflict with left member of this skipset - calculate next incompatible bitset
    		return (s - LEFTBIT) | NOT_LEFTSMEARD_LEFTBIT;
    	}
    	Bitset nextIncompatibleRight(Bitset s) const {  // assuming s is not in conflict with right member of this skipset - calculate next incompatible bitset
    		return ((s & NOT_LEFTBIT) - RIGHTBIT) | ADD_ONES;
    	}
    	Bitset skipIncompatible(Bitset s) const { // s is in conflict with this skipset - calculate next not conflicting bitset
    		return (s & LEFTSMEARD_RIGHTBIT) - ONE;
    	}
    
    private:
    	Bitset set; // the skipset
    	Bitset RIGHTBIT = set & -set; // extract right member of skipset
    	Bitset LEFTBIT = set & (set-1); // ectract left member of skipset
    	Bitset LEFTSMEARD_RIGHTBIT = RIGHTBIT | -RIGHTBIT; // smear right member bit to the left
    	Bitset LEFTSMEARD_LEFTBIT = LEFTBIT | -LEFTBIT; // smear left member bit to the left
    	Bitset NOT_LEFTBIT = ~LEFTBIT; // invert all bits
    	Bitset NOT_LEFTSMEARD_LEFTBIT = ~LEFTSMEARD_LEFTBIT; // invert all bits
    	Bitset ADD_ONES = ~LEFTSMEARD_RIGHTBIT | LEFTBIT; // add on selected bits
    }; // Skipset
    
    
    
    
    // actual algorithm - traverses all bitsets from (2^N)-1 down to 1 excluding those that match skipsets held in a list
    
    
    void powerset6_reverse() {
    	const int N=5; // number of set members (members must be numbered from 0 to N-1)
    
    	std::vector<SkipStruct> skipstructvec = { // vector of skipset structs -  exactly 2 incompatible set members per skipset
    		bitset({0,1},N),
    		bitset({2,3},N),
    //		bitset({2,4},N), 
    //		bitset({1,4},N), 
    	};
    	std::sort(skipstructvec.begin(), skipstructvec.end(), // skipstructs are held in descending skipset order (largest first)
    		[](const SkipStruct& left, const SkipStruct& right) {return left.skipset() > right.skipset();}); 
    
    	std::cout << "*** skipsets *** "  << std::endl;
    	for (SkipStruct skipstruct : skipstructvec) {
    		std::cout << allstring(skipstruct.skipset(),N) << std::endl;
    	}
    
    	std::vector<Bitset> result; // vector of not conflicting bitsets
    
    	std::cout << "*** trace *** "  << std::endl;
    	auto p_start = skipstructvec.begin();
    	Bitset i = (ONE<<N) - ONE; // start with last set (2^N)-1 (total number of sets in the powerset of an n-set minus one)
    	while (i > ZERO) { // proceed down to 1
    		Bitset j = i;
    
    		i = ZERO; // next bitset
    		for (auto p = p_start; p != skipstructvec.end(); ++p) { // compare bitset j wih all skipsets
    			Bitset k;
    			if (p->noLeftMember(j)) { // no left member
    				k = p->nextIncompatibleLeft(j); // next incompatible set
    			} else if (p->noRightMember(j)) { // no right member
    				k = p->nextIncompatibleRight(j); // next incompatible set
    			} else { // left and right members in conflict
    				i = p->skipIncompatible(j);  //  skip to next compatible set
    				while (j > i) std::cout << "skip: " << allstring(j--,N) << std::endl;
    				j = i; // in case the while loop is removed
    				break;
    			}
    			if (k>i) i=k; // jump to closest incompatible set
    			else if (k<ZERO) ++p_start; // upon any underflow, drop largest skipset
    		}
    
    		while (j > i) { // store nonconflicting bitsets
    			std::cout << "      " << allstring(j,N) << std::endl;
    			result.push_back(j--);
    		}
    	}
    
    	std::cout << "*** result *** " << std::endl;	
    	for (auto bitset : result) std::cout << allstring(bitset,N) << std::endl;
    
    } // powerset6_reverse
    Last edited by wolle; March 1st, 2017 at 04:28 AM.

  4. #34
    2kaud's Avatar
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    5,786

    Re: a variation on all possible subsets

    The code is set up to process the N=40 problem 10 times and on this system, that takes about 3.5 seconds.
    It would still be nice to have it faster, but I also don't think it is very safe the way it is written
    Just a note of practicality. What's the perceived ROI (return on investment) in having a slightly faster/optimised program against the time invested?
    All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/

    C++17 Compiler: Microsoft VS2017 (15.4.0)

  5. #35
    Join Date
    May 2009
    Location
    Boston
    Posts
    318

    Re: a variation on all possible subsets

    Quote Originally Posted by wolle View Post
    Here's now finally the code. It's tested but not up to production standards of course. But it does serve as "proof of concept" which was the purpose and this kind of low-level programming is plain fun .
    Thank you for your input so far, I have been trying to compile to code you posted and can't seem to get beyond the first line.

    For the line,
    Code:
    using Bitset = long long int; // bitset must be SIGNED integer type
    I get the errors,
    Code:
    APS_3.cpp:9: error: expected nested-name-specifier before "Bitset"
    APS_3.cpp:9: error: `Bitset' has not been declared
    APS_3.cpp:9: error: expected `;' before '=' token
    APS_3.cpp:9: error: expected unqualified-id before '=' token
    APS_3.cpp:9: error: expected `,' or `;' before '=' token
    It seems to be acting like it is expecting a template and template parameter or something like that. I tried adding #include <bitset>, but that doesn't help. I also tried declaring "Bitset"

    I am using a fairly old compiler (g++3.3), so that may be an issue. I am guessing that this line is a c++11 type alias or something like that??? I can work with a more recent compiler to test this code, but the final code may not be compatible with the very old app I am working on.

    Also, do I need to build this in 64-bit?

    If I try to compile with c++ 5.3, with,
    Code:
    c++ -std=c++11 -c APS_3.cpp
    I get a different set of errors,
    Code:
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1069:11: error: ‘::acoshl’ has not been declared
       using ::acoshl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1073:11: error: ‘::asinhl’ has not been declared
       using ::asinhl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1077:11: error: ‘::atanhl’ has not been declared
       using ::atanhl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1081:11: error: ‘::cbrtl’ has not been declared
       using ::cbrtl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1085:11: error: ‘::copysign ’ has not been declared
       using ::copysignl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1089:11: error: ‘::erfl’ has not been declared
       using ::erfl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1093:11: error: ‘::erfcl’ has not been declared
       using ::erfcl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1097:11: error: ‘::exp2l’ has not been declared
       using ::exp2l;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1101:11: error: ‘::expm1l’ has not been declared
       using ::expm1l;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1105:11: error: ‘::fdiml’ has not been declared
       using ::fdiml;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1109:11: error: ‘::fmal’ has not been declared
       using ::fmal;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1113:11: error: ‘::fmaxl’ has not been declared
       using ::fmaxl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1117:11: error: ‘::fminl’ has not been declared
       using ::fminl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1125:11: error: ‘::ilogbl’ has not been declared
       using ::ilogbl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1129:11: error: ‘::lgammal’ has not been declared
       using ::lgammal;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1137:11: error: ‘::llroundl’ has not been declared
       using ::llroundl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1141:11: error: ‘::log1pl’ has not been declared
       using ::log1pl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1145:11: error: ‘::log2l’ has not been declared
       using ::log2l;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1149:11: error: ‘::logbl’ has not been declared
       using ::logbl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1157:11: error: ‘::lroundl’ has not been declared
       using ::lroundl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1161:11: error: ‘::nanl’ has not been declared
       using ::nanl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1165:11: error: ‘::nearbyintl’ has not been declared
       using ::nearbyintl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1169:11: error: ‘::nextafterl’ has not been declared
       using ::nextafterl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1171:11: error: ‘::nexttoward’ has not been declared
       using ::nexttoward;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1172:11: error: ‘::nexttowardf’ has not been declared
       using ::nexttowardf;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1173:11: error: ‘::nexttowardl’ has not been declared
       using ::nexttowardl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1177:11: error: ‘::remainderl’ has not been declared
       using ::remainderl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1181:11: error: ‘::remquol’ has not been declared
       using ::remquol;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1189:11: error: ‘::roundl’ has not been declared
       using ::roundl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1193:11: error: ‘::scalblnl’ has not been declared
       using ::scalblnl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1197:11: error: ‘::scalbnl’ has not been declared
       using ::scalbnl;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1201:11: error: ‘::tgammal’ has not been declared
       using ::tgammal;
               ^
    /usr/lib/gcc/i686-pc-cygwin/5.3.0/include/c++/cmath:1205:11: error: ‘::truncl’ has not been declared
       using ::truncl;
    This looks like some kind of namespace issue one way or the other.

    LMHmedchem
    Last edited by LMHmedchem; March 1st, 2017 at 07:09 PM.

  6. #36
    Join Date
    Feb 2017
    Posts
    157

    Re: a variation on all possible subsets

    Quote Originally Posted by LMHmedchem View Post

    Code:
    using Bitset = long long int; // bitset must be SIGNED integer type
    I get the errors,

    Also, do I need to build this in 64-bit?

    This looks like some kind of namespace issue one way or the other.
    For the initial definitions you could do this instead

    Code:
    typedef long long int Bitset; // bitset must be SIGNED integer type
    static const Bitset ZERO = 0;
    static const Bitset ONE = 1;
    typedef std::vector<int> Vecset; // vector set
    If long long int isn't supported maybe the compiler supports some other 64-bit int type like int64_t or so instead?

    If the compiler doesn't support 64-bit integers at all then you could use an int or maybe long int which ever is 32 bit. But then only sets with up to 31 members can be handled and since you have up to 40 members 64-bit is preferred.

    You don't need to compile to 64-bit code. I used 32-bit actually.

    The other problems seem to be related to cmath. Is it properly included?

    Please feel free to post any further problems, I'll rewrite to older C++ when necessary.

    And note that I've added a paragraph at the end of post #32.
    Last edited by wolle; March 2nd, 2017 at 08:25 AM.

Page 3 of 3 FirstFirst 123

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This a Codeguru.com survey!


On-Demand Webinars (sponsored)