All possible combinations problem

**LMHmedchem** · November 30th, 2015, 01:48 PM

Hello,

I have to create all possible combinations of some string data.

The input data is string and looks like,

Code:

b_00000000011100000000000

For a given input string, I need to generate all of the strings that could be considered subsets of the input. This just means any combination of position matching 1s up to n-1. The leading b_ is not relevant. For example,

Code:

for input string,
b_00000000011100000000000

output,
b_00000000011000000000000
b_00000000001100000000000
b_00000000010100000000000
b_00000000010000000000000
b_00000000001000000000000
b_00000000000100000000000

This is fairly straightforward for a string with three 1s, but I have strings with up to 11 1s, so this needs to be done algorithmically. If there were 10 1s, I would need to generate all of the overlapping patterns of 1 to 10 1s.

This is an example of 11 1s,
b_11110011011100110000000

One approach would start with each single 1 and then add each other 1 to make all of the pairs, then use the pairs to make the triplets, etc. Code for this kind of thing often looks like creating paths with a breadth first search, like adding neighbor vertex numbers. I have often found that there are clever algorithms for coding problems like this, so if anyone knows of something that might suit, that would be a big help.

LMHmedchem

**GeoRanger** · November 30th, 2015, 11:44 PM

You just want all possible combinations of one's and zero's only in the positions where the input has a one, right?

If so, think of the zero's and one's as simply being a binary number. If you count from zero all the way up to the input and skip digits which aren't set in the input, I think you'll get what you want. A recursive algorithm works well for doing this.

Code:

#include <vector>
#include <iostream>

using namespace std; 

void OutputRemainingCounter(std::vector<char> vProcessed, std::vector<char> vUnprocessed)
{
    vector<char>::iterator const iStart   = vUnprocessed.begin();
    vector<char>::iterator       iCurrent = vUnprocessed.begin();
    vector<char>::iterator const iEnd     = vUnprocessed.end  ();

    while (iCurrent < iEnd && *iCurrent != '1')
    {
        vProcessed.push_back(*iCurrent);
        ++iCurrent;
    }

    if (iCurrent < iEnd)
    {
        vector<char> vNewProcessed, vNewUnprocessed;

        vNewProcessed = vProcessed;
        vNewProcessed.push_back('0');

        vNewUnprocessed.insert(vNewUnprocessed.begin(), iCurrent + 1, iEnd);

        OutputRemainingCounter(vNewProcessed, vNewUnprocessed);

        *(vNewProcessed.end() - 1) = '1';

        OutputRemainingCounter(vNewProcessed, vNewUnprocessed);
    }
    else
    {
        vProcessed  .push_back('\0');
        vUnprocessed.push_back('\0');
        cout << &vProcessed[0] << &vUnprocessed[0] << endl;
    }
}

void main(void)
{
    vector<char> v;
    v.push_back('b');
    v.push_back('_');
    v.push_back('0');
    v.push_back('0');
    v.push_back('0');
    v.push_back('1');
    v.push_back('1');
    v.push_back('1');
    v.push_back('0');
    v.push_back('0');
    v.push_back('1');
    v.push_back('1');
    v.push_back('0');

    OutputRemainingCounter(vector<char>(), v);
}

Of course, this would take a long time if a lot of the bits were set (because you'd effectively be counting from 0 to 2^N). I didn't bother to filter off the case where all bits are zero.

Output:

Code:

b_000000000000
b_000000000100
b_000000001000
b_000000001100
b_000001000000
b_000001000100
b_000001001000
b_000001001100
b_000010000000
b_000010000100
b_000010001000
b_000010001100
b_000011000000
b_000011000100
b_000011001000
b_000011001100
b_000100000000
b_000100000100
b_000100001000
b_000100001100
b_000101000000
b_000101000100
b_000101001000
b_000101001100
b_000110000000
b_000110000100
b_000110001000
b_000110001100
b_000111000000
b_000111000100
b_000111001000
b_000111001100

An even better way might be to detect the number of set bits in your input and create a map of their positions, count through the number of set bits (contiguously) in a regular int, but every time you increment that counter, use the map you've created to plot the bits in the counter to bits in a separate int which correspond to the set bits in your input.

Hope it helps,
GeoRanger

**monarch_dodra** · December 1st, 2015, 08:16 AM

Originally Posted by LMHmedchem

Hello,

I have to create all possible combinations of some string data.

The input data is string and looks like,

Code:

b_00000000011100000000000

For a given input string, I need to generate all of the strings that could be considered subsets of the input. This just means any combination of position matching 1s up to n-1. The leading b_ is not relevant. For example,

Code:

for input string,
b_00000000011100000000000

output,
b_00000000011000000000000
b_00000000001100000000000
b_00000000010100000000000
b_00000000010000000000000
b_00000000001000000000000
b_00000000000100000000000

This is fairly straightforward for a string with three 1s, but I have strings with up to 11 1s, so this needs to be done algorithmically. If there were 10 1s, I would need to generate all of the overlapping patterns of 1 to 10 1s.

This is an example of 11 1s,
b_11110011011100110000000

One approach would start with each single 1 and then add each other 1 to make all of the pairs, then use the pairs to make the triplets, etc. Code for this kind of thing often looks like creating paths with a breadth first search, like adding neighbor vertex numbers. I have often found that there are clever algorithms for coding problems like this, so if anyone knows of something that might suit, that would be a big help.

LMHmedchem

Another way yo see it is that you are basically just generating all combinations for N {1,0} pairs, where "N" is the amount of 1's in your original string. From there, your original string applies a "0" mask over it to make it pretty, but it is irrelevant your problem at hand.

"generating all combinations for N {1,0} pairs" is basically counting in binary from 0 to n^^2 - 1. From there, you need to insert the 0's. I guess there are several ways to do the ladder. This simple algorithm is "L x N^^2" where L is the length of your string, and N is the ammount of 1's in your string. This is optimal as the answer occupies that much screen real-estate, so any solution must be at least that complex.

Code:

#include <iostream>
#include <algorithm>

using namespace std;

int main(){
  auto input = string{"b_00000000011100000000000"};
  auto one_count = count(input.begin(), input.end(), '1');

  auto maximum = 1 << one_count;
  for (auto i = 1; i < maximum - 1; ++i) {
    auto max_mask = maximum;
    cout << "b_";
    for (auto it = input.begin() + 2; it != input.end(); ++it) {
      if (*it == '0') {
        cout << '0';
      } else {
        max_mask >>= 1;
        cout << ((max_mask & i) != 0);
      }
    }
    cout << endl;
  }
}

**tiliavirga** · December 3rd, 2015, 01:31 AM

Originally Posted by LMHmedchem

I have often found that there are clever algorithms for coding problems like this, so if anyone knows of something that might suit, that would be a big help.

Your problem corresponds to finding all possible subsets of a set (also called the powerset). There are 2^N such subsets where N is the number of set members.

One common approach is to consider the bit representation of the number sequence from 0 to 2^N - 1. For example if N is 3 the numbers are 0, 1, 2, 3, 4, 5, 6, 7 and the corresponding bit representations are 000, 001, 010, 011, 100, 101, 110, 111. These are also called bitsets. In your case 111 would correspond to any input string with three 1's and the numbers 1 to 6 would encode for all possible variations of that input string (0 is discarded and 7 is the input string itself).

This solves the combinatorial part. For an input string with N 1's all numbers from 1 up to 2^N - 2 are looped through and for each number the input string is modified in N places according to the number bitset. This gives a complexity of O(2^N * N).

I think it's possible to reduce this to O(2^N). The idea is to use the Gray ordering rather than the natural ordering of numbers. In Gray ordering the bit representation sequence for N=3 becomes 000, 001, 011, 010, 110, 111, 101, 100. What's interesting here is that just one bit changes from one bit representation to the next. Utilized in your case that would mean that instead of N bits just one bit must be modified in the input string. That results in O(2^N * 1) which is O(2^N).

So since there are 2^N variations the lowest possible complexity is O(2^N) and that's what you get with an iterative solution and Gray ordering. I guess that's also what a recursive solution should yield but it comes at the expense of ... well, recursion

.

I'll post some code later.

**superbonzo** · December 3rd, 2015, 05:03 AM

Originally Posted by tiliavirga

What's interesting here is that just one bit changes from one bit representation to the next. Utilized in your case that would mean that instead of N bits just one bit must be modified in the input string. That results in O(2^N * 1) which is O(2^N).

the fact that you're changing just one bit does not mean you do it in O(1), you still need a time proportional to the number of max bits in a way or the other...

**tiliavirga** · December 3rd, 2015, 05:19 AM

Originally Posted by superbonzo

the fact that you're changing just one bit does not mean you do it in O(1), you still need a time proportional to the number of max bits in a way or the other...

Please elaborate.

The total number of combinations to consider are 2^N where N is the number of bits. That's O(2^N). And for each combination, N bits are modified. That's O(N). If just one bit is modified the complexity instead becomes O(1). So total complexity is O(2^N * N) in the former case and O(2^N) in the latter. Also remember I'm using bitsets where many set operations take place in parallel.

**superbonzo** · December 3rd, 2015, 05:43 AM

Originally Posted by tiliavirga

If just one bit is modified the complexity instead becomes O(1)

so,

Code:

for( int c = 0; c < N; ++c )
{
    if(is_bit_on_at(c))
    {   
        flip_bit_at(c);
        break;
    }   
}

the pseudo-code above 'modifies' at most a single bit, are you claiming this is O(1) ??

"counting" always has at least linear complexity relative to the number of bits, so if the number of bits of the input sequence is arbitrary the runtime of the OP's algo will always grow at least as N*2^N in the worst case

**tiliavirga** · December 3rd, 2015, 06:51 AM

Originally Posted by superbonzo

the pseudo-code above 'modifies' at most a single bit, are you claiming this is O(1) ??

Your pseudo-code is O(N) because it searches for the bit to modify. If you know already which "c" to flip you can skip the loop and just do "flip_bit_at(c)". That's an O(1) operation.

so if the number of bits of the input sequence is arbitrary the runtime of the OP's algo will always grow at least as N*2^N in the worst case

The OP didn't request an arbitrary number of bits in the sequence. Far from that, in fact an upper limit of N=11 was indicated.

But certainly, my bitset approach will for all practical purposes be limited to at most N=64 since that's the widest native int commonly available. But it isn't much of a limitation really because then the number of combinations will be 2^64 which is intractable anyway.

On a more philosophical note. Just like there is no true randomness to be had on a digital computer you won't get anywhere near infinity either. So be careful with "arbitrary" when discussing practical programs. There will always be restrictions and limitations.

**tiliavirga** · December 3rd, 2015, 06:09 PM

I've implemented the two approaches I described in my post #4. The regular version is based on an integer sequence in natural order while the Gray code version instead uses the Gray order (just one bit changes between consecutive integers).

The regular version has O(2^N * N) complexity whereas the Gray version is O(2^N). The reason for this reduction is that the innermost loop could be removed thanks to the Gray property. Quite fascinating to watch in the output strings.

The difference in complexity probably won't matter much in this case. I just wanted "proof of concept" so to speak. Although Gray shows up in unexpected places I haven't seen it being used for complexity reduction like this before.

The code is written in low-level C++. It seemed natural considering the low-level nature of integer bitsets. I found all necessary information on Gray code bit-fiddling in Hacker's Delight by Warren (2'nd ed. p. 311-315).

Code:

void regular() {
	const std::string input = "b_0000000001110000011000000"; // input string
	const char SYMB[2] = {'0','1'};

	std::vector<int> positions;
	for (int i=0; i<int(input.size()); ++i) { // find positions of all '1' in input string
		if (input[i] == SYMB[1]) positions.push_back(i);
	}

	const int BITS = positions.size();
	const int COMBINATIONS = 1<<BITS;

	std::string output; // output string
	for (char c : input) output += (c!=SYMB[1]) ? c : SYMB[0]; // replace all '1' with '0'

	for (int i=1; i<COMBINATIONS-1; ++i) { // interpret sequence numbers as bitsets
		for (int j=0, bitset=i ; j<BITS; ++j, bitset>>=1) { // and modify output string accordingly
			int p = positions[j];
			char c = SYMB[bitset&1];
			output.replace(p,1,1,c);
		}
		std::cout << output << std::endl;
	}
}

Code:

void gray() {
	const std::string input = "b_0000000001110000011000000"; // input string
	const char SYMB[2] = {'0','1'};

	std::vector<int> positions;
	for (int i=0; i<int(input.size()); ++i) { // find positions of all '1' in input string
		if (input[i] == SYMB[1]) positions.push_back(i);
	}

	const int BITS = positions.size();
	const int COMBINATIONS = 1<<BITS;

	std::unordered_map<int,int> hashpos;
	for (int i=0; i<BITS; ++i) hashpos.insert({1<<i, positions[i]});

	std::string output; // output string
	for (char c : input) output += (c!=SYMB[1]) ? c : SYMB[0]; // replace all '1' with '0'

	for (int i=1; i<COMBINATIONS; ++i) { // interpret sequence numbers as bitsets
		int g = i ^ (i>>1); //  corresponding gray code
		if (g == COMBINATIONS-1) continue; // skip input string
		int b = i & ~(i-1); // changed bit
		int p = hashpos[b]; // position corresponding to changed bit
		char c = ((g&b) == b) ? SYMB[1] : SYMB[0]; // select '0' or '1' to write at position
		output.replace(p,1,1,c);
		std::cout << output << std::endl;
	}
}

**superbonzo** · December 4th, 2015, 04:31 AM

now, ignoring your very personal definition of bigO ...

are you claiming that the number of output.replace() calls grows as N*2^N in the regular case and as 2^N in the Gray case ? this is also false in general

note that your regular() code is artificially pessimized because there's no need of iterating over all BITS in the output string, you can just iterate up to the first zero bit. In this case it's easy to show that the total number of output.replace() calls scales as 2^N ( and not as N*2^N ) exactly as in the Gray case...

moreover, the (correctly written)regular case is probably more efficient, being much more cache friendly than the gray case, especially for bigger input strings ...

**tiliavirga** · December 4th, 2015, 10:08 AM

Originally Posted by superbonzo

note that your regular() code is artificially pessimized because there's no need of iterating over all BITS in the output string, you can just iterate up to the first zero bit.

Well, maybe I've missed something. Could you please explain what you mean by "iterate up to the first zero bit"?

**superbonzo** · December 4th, 2015, 10:45 AM

Originally Posted by tiliavirga

Well, maybe I've missed something. Could you please explain what you mean by "iterate up to the first zero bit"?

I mean, in order to increment a binary number in natural order you only need to search the first least significant zero bit, say, in 101010111 you check and flip only the first four bits on the right and the rest won't change.

so, let n(k) be the number of bit flips needed to count from 0 to 2^k-1 in natural order
now, given k+1 bits

00...0

to get to

01...1 -> you need n(k) flips

and to

10...0 -> n(k) + k + 1
11...1 -> n(k) + k + 1 + n(k)

so, we have the recurrence equation

n(k+1)=2n(k)+k+1

with n(2)=4;

given the ansatz n(k) = a2^k+bk+c, we have

2a2^k + bk + b + c = 2a2^k + (2b+1)k + 2c+1

hence

b==2b+1 --> b = -1
b+c=2c+1 --> c = -2
a4+b2+c=4 --> a = 2

that is

n(k) = 2^(k+1) - k - 2

that definitely does not scale as k*2^k

**tiliavirga** · December 4th, 2015, 04:36 PM

Originally Posted by superbonzo

I mean, in order to increment a binary number in natural order you only need to search the first least significant zero bit, say, in 101010111 you check and flip only the first four bits on the right and the rest won't change.

Okay, I'm convinced. Yours is the better algorithm here. What I can see it requires two bits of each string to be checked on average. My Gray based algorithm checks only one bit but is more complicated and has more overhead, still it has its lure.

Well, you can't win 'em all, can you.

Thread: All possible combinations problem

Thread Tools

Display

All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Re: All possible combinations problem

Posting Permissions