CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com

# Thread: a variation on all possible subsets

1. ## a variation on all possible subsets

I apologize in advance for the incompleteness of this post. I won't be back at my desk until later today and i will make a more complete post then.

I am working on an algorithm where I need to generate all possible combinations of a series of integers. These are "combinations" as opposed to "permutations" meaning that the order is not important. The set 1,2,3 is the same as the set 3,2,1 or 2,1,3, etc. Most algorithms for this kind of thing are create all the combinations of size k from the set of size n. I am creating all the combinations of all sizes k.

For example, if my set is, 0,1,2,3,4, I find 31 possible combinations.

Code:
``` 0           // 0 by itself
0,1         // pairs on 0
0,2
0,3
0,4
0,1,2       // triplets on 0
0,1,3
0,1,4
0,2,3
0,2,4
0,3,4
0,1,2,4
0,1,3,4
0,2,3,4
0,1,2,3,4   // quintuplets on 0
1           // 1 by itself
1,2         // pairs on 1
1,3
1,4
1,2,3
1,2,4
1,3,4
1,2,3,4
2
2,3
2,4
2,3,4
3
3,4
4```
Mathematically this is something like 5!-redundancies, but I don't remember off hand how to calculate the redundancies so I'm not sure that 31 is correct. It looks correct as far as I can see.

What I have tried is starting with the first element in the set and building each pair on that element, each triplet, etc. it seems as if a simple looping structure should be able to product this, something like
Code:
```for(i=0; i<set.size()-1; i++) {
for(j=i+1; j<set.size(); j++) {

}
}```
My algorithm keeps getting more and more complicated and it seems like it should be simple. I have some versions that work, but they produce duplicates that need to be removed after the fact. It seems that sorting and creating the sub-sets based on the sorted order should prevent duplicates. My overall algorithm is much more complex in that there are a number of tasks that are being performed along the way (each subset is being evaluated) and the code has become difficult to debug. Some of the actual data I am working with has sets of 40 or so and the output is too large to examine manually (calculate 40 factorial sometime). I need to restart this an make sure that the basic looping structure is sound.

I am hoping someone can point me to a reference for how to set up a simple block of code to do this part. It is the kind of thing that must be done all of the time, but my searches haven't turned up quite the right example yet.

If anyone has suggestions, they would be very much appreciated. I will post more complete code samples later today when I get back. At the moment, I am actually compiling with gcc3 under 32-bit cygwin and 64-bit opensuse linux.

LMHmedchem

2. Senior Member
Join Date
Oct 2008
Posts
1,456

## Re: a variation on all possible subsets

Originally Posted by LMHmedchem
I am creating all the combinations of all sizes k.
being this set infinite, you should also specify some criteria to fix the ordering you want them to be ( consider the loop for(i=0;;++i) print(i);, this will output infinitely many valid combinations according to your definition, but I doubt this is what you're looking for, isn't it ? )

3. Member
Join Date
Feb 2017
Posts
296

## Re: a variation on all possible subsets

Originally Posted by LMHmedchem
Most algorithms for this kind of thing are create all the combinations of size k from the set of size n. I am creating all the combinations of all sizes k.
Well, from your example it looks like you want to generate all k-subsets of an n-set in lexical order and then repeat this for all k between 1 and n.

The number of subsets for a certain k and n is n over k. If we denote this with (n/k) you will have (5/1) + (5/2) + (5/3) + (5/4) + (5/5) subsets. That is 5 + 10 + 10 + 5 + 1 = 31 which is consistent with your example. If we also add the empty set (5/0)=1 we get 32 subset and that's all 2^5 sets in the power-set of a 5-set.

So you are essentially generating the full power-set of an n-set. The simplest algorithm for this is to loop through all integers from 0 to 2^n - 1 and view each integer as a bit-set (the bit positions that are 1 in an integer represent the set-members that are present in that set).
Last edited by wolle; February 21st, 2017 at 03:44 AM.

4. ## Re: a variation on all possible subsets

Originally Posted by superbonzo
being this set infinite, you should also specify some criteria to fix the ordering you want them to be ( consider the loop for(i=0;;++i) print(i);, this will output infinitely many valid combinations according to your definition, but I doubt this is what you're looking for, isn't it ? )
Sorry, yes there are a few more criteria. Elements in the set cannot repeat, meaning 1,1,2 is not valid, nor is 1,2,3,4,1. This would mean that the largest possible subset is the entire set itself.

Originally Posted by wolle
Well, from your example it looks like you want to generate all k-subsets of an n-set in lexical order and then repeat this for all k between 1 and n.

The number of subsets for a certain k and n is n over k. If we denote this with (n/k) you will have (5/1) + (5/2) + (5/3) + (5/4) + (5/5) subsets. That is 5 + 10 + 10 + 5 + 1 = 31 which is consistent with your example. If we also add the empty set (5/0)=1 we get 32 subset and that's all 2^5 sets in the power-set of a 5-set.
Thank you for that explanation, it is very helpful. The lexical order is not a requirement, it was just the easiest way for me to think about manually generating the output above. That being said, it seems logical for the algorithm to work that way.

Originally Posted by wolle
So you are essentially generating the full power-set of an n-set. The simplest algorithm for this is to loop through all integers from 0 to 2^n - 1 and view each integer as a bit-set (the bit positions that are 1 in an integer represent the set-members that are present in that set).
That is an interesting approach. The data will be a vector of unsigned int, so the first element will be 0.

The method I am currently using creates singles, then pairs, etc. In part this is because I am checking the "compatibility" as I go. This just means I have a lookup table that tells me that 1 is not compatible with 0 (for example). In my list above, 0,1 would be created, evaluated, and then skipped. The consequence of this is that the following sub-sets never get created because the branch 0,1 is never followed (if that makes sense).

Code:
```0,1     // 0 and 1 are not compatible
0,1,2
0,1,3
0,1,4
0,1,2,3
0,1,2,4
0,1,3,4
0,1,2,3,4```
This is very important in the larger sets where the total number of possible subsets is exponentially large. Do you think that something similar could be done within the framework of the bit-set approach you suggest?

LMHmedchem

5. ## Re: a variation on all possible subsets

Another way to generate the required combinations is to use a recursive function/lambda. Consider
Code:
```#include <iostream>
#include <vector>
#include <functional>
using namespace std;

void comb(unsigned int n, unsigned int r, vector < vector<unsigned int> >& vvi)
{
function<void (int, int, vector<unsigned int>&)> cc = [&](unsigned int n, unsigned int r, vector<unsigned int>& arr)
{
for (unsigned int i = n; i >= r; i--) {
arr[r - 1] = i;
if (r > 1)
cc(i - 1, r - 1, arr);
else
vvi.emplace_back(arr);
}
};

vector<unsigned int> arr(r);
cc(n, r, arr);
}

int main() {
const unsigned int N = 5;

vector < vector<unsigned int> > vvi;

for (unsigned int m = 1; m <= N; ++m)
comb(N, m, vvi);

for (const auto& vv : vvi) {
for (const auto& v : vv)
cout << v << " ";

cout << endl;
}

cout << vvi.size() << " combinations" << endl;
}```
This populates the vector vvi with all possible combinations within the set of N integers. For the example N = 5, the output is
Code:
```5
4
3
2
1
4 5
3 5
2 5
1 5
3 4
2 4
1 4
2 3
1 3
1 2
3 4 5
2 4 5
1 4 5
2 3 5
1 3 5
1 2 5
2 3 4
1 3 4
1 2 4
1 2 3
2 3 4 5
1 3 4 5
1 2 4 5
1 2 3 5
1 2 3 4
1 2 3 4 5
31 combinations```
Last edited by 2kaud; February 21st, 2017 at 11:52 AM. Reason: Code changed to use lambda

6. ## Re: a variation on all possible subsets

Originally Posted by 2kaud
Another way to generate the required combinations is to use a recursive function/lambda.
This populates the vector vvi with all possible combinations within the set of N integers.
For my first attempt, I did this in a while loop but I was thinking that a recursive function call might make more sense.

Sorry I am not at my computer to test this, but it is sort of important that the sets be created in order from 0-n. The integers I am creating sub-sets of represent objects in a vector. The objects have been sorted on a member variable such that object element 0 is the most important, element 1 is the second most important, etc. It may turn out that there are cases where I simply cannot create and analyze all the possible combinations and in that event, I need to look at the most important cases first.

Is it possible to modify that above code so that the build order is 1,2,3,4,5 then 2,3,4,5 instead of 5,4,3,2,1 then 4,3,2,1 like it is now? I could sort my vector of objects in reverse order or something like that if I can confirm that won't break anything else.

Again, sorry I'm not at my computer where I can test these things instead of asking more questions. I will try to evaluate that question my self later today and will post if I get an answer.

LMHmedchem

7. ## Re: a variation on all possible subsets

I am back at my computer and am not able to compile the code posted by 2kaud. I get the following compile error,
Code:
```APS.cpp: In function `void comb(unsigned int, unsigned int, std::vector<std::vector<unsigned int, std::allocator<unsigned int> >, std::allocator<std::vector<unsigned int, std::allocator<unsigned int> > > >&)':
APS.cpp:89: error: `function' undeclared (first use this function)
APS.cpp:89: error: (Each undeclared identifier is reported only once for each function it appears in.)
APS.cpp:89: error: functional cast expression list treated as compound expression
APS.cpp:89: error: expected primary-expression before "void"
APS.cpp:89: error: expected `;' before "void"
APS.cpp:101: error: `cc' undeclared (first use this function)
APS.cpp: In function `int main()':
APS.cpp:112: error: expected primary-expression before "const"
APS.cpp:112: error: expected `;' before "const"
APS.cpp:120: error: expected primary-expression before '}' token
APS.cpp:120: error: expected `)' before '}' token
APS.cpp:120: error: expected primary-expression before '}' token
APS.cpp:120: error: expected `;' before '}' token```
Searching on "`function' undeclared (first use this function)" doesn't return anything relevant. I don't think it likes the lamda expression. This is gcc3 I am compiling with, so it may be too old. Aren't lamda expressions c++11? Searching on "functional cast expression list treated as compound expression" seems to also point to the need for c++11.

LMHmedchem
Last edited by LMHmedchem; February 21st, 2017 at 02:45 PM.

8. ## Re: a variation on all possible subsets

Yes, you need c++11. Isn't there an option with gcc to 'turn on' c++11 features? I think they are off by default, but I don't use gcc.

9. ## Re: a variation on all possible subsets

If that is going to be a problem, I can split the lambda back into another function. I did it that way first, but as the lambda function is used only within function comb() and c++ doesn't allow nested functions, I tend to use lambda functions as a poor man's nested function!

I think this revised code will give you what you want. If you need the lambda's removed let me know. Have fun!
Code:
```#include <iostream>
#include <vector>
#include <functional>
#include <algorithm>
using namespace std;

void comb(unsigned int N, vector<vector<unsigned int>>& nv)
{
auto comb1 = [&](unsigned int n, unsigned int r, vector <vector<unsigned int>>& vvi)
{
function<void(int, int, vector<unsigned int>&)> cc = [&](unsigned int n, unsigned int r, vector<unsigned int>& arr)
{
for (unsigned int i = r; i <= n; ++i) {
arr[r - 1] = i - 1;
if (r > 1)
cc(i - 1, r - 1, arr);
else
vvi.emplace_back(arr);
}
};

vector<unsigned int> arr(r);
cc(n, r, arr);
};

auto sr = [&](vector<vector<unsigned int>>& sv) {
sort(sv.begin(), sv.end());

for (auto& a : sv)
nv.emplace_back(a);

sv.clear();
};

vector<vector<unsigned int>> vvi;

for (unsigned int m = 1; m <= N; ++m)
comb1(N, m, vvi);

vector<vector<unsigned int>> sv;
size_t sz = vvi.crbegin()->size();

for (auto vv = vvi.crbegin(); vv != vvi.crend(); ++vv)
if (vv->size() == sz)
sv.emplace_back(*vv);
else {
sz = vv->size();
sr(sv);
sv.emplace_back(*vv);
}

sr(sv);
}

int main()
{
const unsigned int N = 5;

vector<vector<unsigned int>> nv;

comb(N, nv);

for (auto& a : nv) {
for (auto& b : a)
cout << b << " ";

cout << endl;
}

cout << nv.size() << " combinations" << endl;
}```
gives
Code:
```0 1 2 3 4
0 1 2 3
0 1 2 4
0 1 3 4
0 2 3 4
1 2 3 4
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
0 1
0 2
0 3
0 4
1 2
1 3
1 4
2 3
2 4
3 4
0
1
2
3
4
31 combinations```
Last edited by 2kaud; February 21st, 2017 at 04:58 PM.

10. ## Re: a variation on all possible subsets

Ok. This is the code without using lambdas. Consider
Code:
```#include <iostream>
#include <vector>
#include <functional>
#include <algorithm>
using namespace std;

void cc(unsigned int n, unsigned int r, vector<unsigned int>& arr, vector< vector<unsigned int> >& vvi)
{
for (unsigned int i = r; i <= n; ++i) {
arr[r - 1] = i - 1;
if (r > 1)
cc(i - 1, r - 1, arr, vvi);
else
vvi.emplace_back(arr);
}
}

void comb(unsigned int n, unsigned int r, vector < vector<unsigned int> >& vvi)
{
vector<unsigned int> arr(r);
cc(n, r, arr, vvi);
}

void sr(vector< vector<unsigned int> >& sv, vector< vector<unsigned int> >& nv)
{
sort(sv.begin(), sv.end());

for (auto& a : sv)
nv.emplace_back(a);

sv.clear();
}

void comb1(unsigned int N, vector< vector<unsigned int> >& nv)
{
vector< vector<unsigned int> > vvi;
vector< vector<unsigned int> > sv;

for (unsigned int m = 1; m <= N; ++m)
comb(N, m, vvi);

size_t sz = vvi.crbegin()->size();

for (auto vv = vvi.crbegin(); vv != vvi.crend(); ++vv) {
if (vv->size() == sz)
sv.emplace_back(*vv);
else {
sz = vv->size();
sr(sv, nv);
sv.emplace_back(*vv);
}
}

sr(sv, nv);
}

int main()
{
const unsigned int N = 5;

vector< vector<unsigned int> > nv;

comb1(N, nv);

for (auto& a : nv) {
for (auto& b : a)
cout << b << " ";

cout << endl;
}

cout << nv.size() << " combinations" << endl;
}```

11. ## Re: a variation on all possible subsets

Originally Posted by 2kaud
Ok. This is the code without using lambdas.
Thanks for this. I can't seem to get c++1 working, I guess this setup is just to old. I have tried a few of the args to g++ (g++ -std=c++11 ), but the aren't recognized.

I won't be able to use emplace_back(), so should I just replace that with push_back()? Will I need the explicit construction of emplace_back()?

LMHmedchem

12. ## Re: a variation on all possible subsets

No, you can just use push_back() instead of emplace_back(). However, you probably won't be able to use range-based for loop either? Is auto ok?

PS You probably don't have cbegin() and cend() either?
Last edited by 2kaud; February 21st, 2017 at 04:48 PM. Reason: PS

13. ## Re: a variation on all possible subsets

I think this is c++98 ish - with auto. Hopefully this will compile for you.
Code:
```#include <iostream>
#include <vector>
#include <functional>
#include <algorithm>
using namespace std;

void cc(unsigned int n, unsigned int r, vector<unsigned int>& arr, vector< vector<unsigned int> >& vvi)
{
for (unsigned int i = r; i <= n; ++i) {
arr[r - 1] = i - 1;
if (r > 1)
cc(i - 1, r - 1, arr, vvi);
else
vvi.push_back(arr);
}
}

void comb(unsigned int n, unsigned int r, vector < vector<unsigned int> >& vvi)
{
vector<unsigned int> arr(r);
cc(n, r, arr, vvi);
}

void sr(vector< vector<unsigned int> >& sv, vector< vector<unsigned int> >& nv)
{
sort(sv.begin(), sv.end());

//for (auto& a : sv)
for (auto a = sv.begin(); a != sv.end(); ++a)
nv.push_back(*a);

sv.clear();
}

void comb1(unsigned int N, vector< vector<unsigned int> >& nv)
{
vector< vector<unsigned int> > vvi;
vector< vector<unsigned int> > sv;

for (unsigned int m = 1; m <= N; ++m)
comb(N, m, vvi);

size_t sz = vvi.rbegin()->size();

for (auto vv = vvi.rbegin(); vv != vvi.rend(); ++vv)
if (vv->size() == sz)
sv.push_back(*vv);
else {
sr(sv, nv);
sz = vv->size();
sv.push_back(*vv);
}

sr(sv, nv);
}

int main()
{
const unsigned int N = 5;

vector< vector<unsigned int> > nv;

comb1(N, nv);

//for (auto& a : nv) {
//for (auto& b : a)
for (auto a = nv.begin(); a != nv.end(); ++a) {
for (auto b = a->begin(); b != a->end(); ++b)
cout << *b << " ";

cout << endl;
}

cout << nv.size() << " combinations" << endl;
}```

14. ## Re: a variation on all possible subsets

I have compiled and ran a variation of the code posted by 2kaud in post 13,

Code:
```#include <iostream>
#include <vector>
#include <functional>
#include <algorithm>
using namespace std;

void cc(unsigned int n, unsigned int r, vector<unsigned int>& arr, vector< vector<unsigned int> >& vvi) {

// loop iterator
unsigned int i;

for (i = r; i <= n; ++i) {
arr[r - 1] = i - 1;
if (r > 1) {
cc(i - 1, r - 1, arr, vvi);
}
else {
vvi.push_back(arr);
}
}

return;
}

void comb(unsigned int n, unsigned int r, vector < vector<unsigned int> >& vvi) {
vector<unsigned int> arr(r);
cc(n, r, arr, vvi);
}

void sr(vector< vector<unsigned int> >& sv, vector< vector<unsigned int> >& nv) {

sort(sv.begin(), sv.end());

// loop iterator
vector< vector<unsigned int> >::iterator a;

for (a = sv.begin(); a != sv.end(); ++a) {
nv.push_back(*a);
}

sv.clear();

return;
}

void comb1(unsigned int N, vector< vector<unsigned int> >& nv) {

vector< vector<unsigned int> > vvi;
vector< vector<unsigned int> > sv;

// loop iterator
unsigned int m;

// loop from 1 to input size and call
for(m = 1; m <= N; ++m) {
comb(N, m, vvi);
}

size_t sz = vvi.rbegin()->size();

// loop iterator
vector< vector<unsigned int> >::iterator vv;

// changed below from rbegin()/rend() to begin()/end() to fix compiler error
// for (vv = vvi.rbegin(); vv != vvi.rend(); ++vv) {
for (vv = vvi.begin(); vv != vvi.end(); ++vv) {
if (vv->size() == sz) {
sv.push_back(*vv);
}
else {
sr(sv, nv);
sz = vv->size();
sv.push_back(*vv);
}
}

sr(sv, nv);

return;
}

int main() {

// size of set being processed
const unsigned int N = 25;

// final result vector
vector< vector<unsigned int> > nv;

// call function to generation all unique combinations of N
comb1(N, nv);

// loop iterators
// a iterates over vector<vector<unsigned int>>
vector< vector<unsigned int> >::iterator a;
// b points to a, which is a specific vector<unsigned int> in nv
vector<unsigned int>::iterator b;

// loop over result vector<vector<unsigned int>> and print
for(a = nv.begin(); a != nv.end(); ++a) {
for(b = a->begin(); b != a->end(); ++b) {
cout << *b << " ";
}
cout << endl;
}
// print final size
cout << nv.size() << " combinations" << endl;

return 0;
}```
This results in the following printout for N=5,
Code:
```0
1
2
3
4
0 1
0 2
0 3
0 4
1 2
1 3
1 4
2 3
2 4
3 4
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
0 1 2 3
0 1 2 4
0 1 3 4
0 2 3 4
1 2 3 4
0 1 2 3 4```
This is the output I was looking for, so thank you very much for the help.

I made a few changes to get it to compile. I had to explicitly declare the iterators because auto doesn't work with the compiler I am using. I also had to change rbegin()/rend() to begin()/end() in the loop on vvi in comb1() because I was getting a compiler error there, error: no match for 'operator!=' in 'vv != std::vector<_Tp, _Alloc>::rend(). The types that I used for the iterators make sense to me, but please let me know if I didn't get that right.

This works well up to N=23 with 8388607 combinations. At N=24, I get a segfault. Running gdb doesn't give me a location for where the segfault occurs. I can't tell if this means that the corruption is in the heap or not.

I first tried this with N=40, which is an actual data set I am working with. I think this means that I am trying to create 2^40 combinations on a 2^32 sized system, which would of course be problematic. I am not in fact, running out of memory. At N=24, the process gets up to about 2GB of memory and then crashes, I still have about .5GB left at that point. I am wondering if the issue is that I am exceeding the maximum size of the final result vector, meaning that the iterator can't address anything larger.

This will eventually run on a 64-bit system, so possibly I could go past N=24 there but if I need 2GB or RAM to do N=24, I think I would need quite a bit to do N=40. If the problem was just the size of the vector, I could switch to a second vector once the first one got full, but I think the hardware memory issue cannot be avoided.

In reality, I will never process N=40 because of what I mentioned in an earlier post. Not every set is compatible (certain numbers cannot co-exist) and I am checking that as I go. Even saying that 0 and 1 are not compatible eliminates a large number of possibilities.

I do need to figure out how to trap this properly. I don't think that just letting it segfault is a good idea. I could just hard code a limit at 23, but since I don't know how many sets will never get created because of incompatibility, that would be selling the algorithm short.

If I could determine the number of combinations based on N and the number of in-compatible pairs, that might help to make a better estimate. It might make more sense to just monitor the size of the final result vector and have it quit when it is getting too large.

Suggestions on how to handle this part would be appreciated as always.

LMHmedchem

15. ## Re: a variation on all possible subsets

I made a few changes to get it to compile. I had to explicitly declare the iterators because auto doesn't work with the compiler I am using. I also had to change rbegin()/rend() to begin()/end() in the loop on vvi in comb1() because I was getting a compiler error there, error: no match for 'operator!=' in 'vv != std::vector<_Tp, _Alloc>::rend().
I don't suppose it's going to be of much help to suggest you update the compiler to a modern one...

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•