CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 13 of 13
  1. #1
    Join Date
    May 2009
    Location
    Boston
    Posts
    364

    All possible combinations problem

    Hello,

    I have to create all possible combinations of some string data.

    The input data is string and looks like,
    Code:
    b_00000000011100000000000
    For a given input string, I need to generate all of the strings that could be considered subsets of the input. This just means any combination of position matching 1s up to n-1. The leading b_ is not relevant. For example,

    Code:
    for input string,
    b_00000000011100000000000
    
    output,
    b_00000000011000000000000
    b_00000000001100000000000
    b_00000000010100000000000
    b_00000000010000000000000
    b_00000000001000000000000
    b_00000000000100000000000
    This is fairly straightforward for a string with three 1s, but I have strings with up to 11 1s, so this needs to be done algorithmically. If there were 10 1s, I would need to generate all of the overlapping patterns of 1 to 10 1s.

    This is an example of 11 1s,
    b_11110011011100110000000

    One approach would start with each single 1 and then add each other 1 to make all of the pairs, then use the pairs to make the triplets, etc. Code for this kind of thing often looks like creating paths with a breadth first search, like adding neighbor vertex numbers. I have often found that there are clever algorithms for coding problems like this, so if anyone knows of something that might suit, that would be a big help.

    LMHmedchem

  2. #2
    Join Date
    Mar 2006
    Posts
    151

    Re: All possible combinations problem

    You just want all possible combinations of one's and zero's only in the positions where the input has a one, right?

    If so, think of the zero's and one's as simply being a binary number. If you count from zero all the way up to the input and skip digits which aren't set in the input, I think you'll get what you want. A recursive algorithm works well for doing this.

    Code:
    #include <vector>
    #include <iostream>
    
    using namespace std; 
    
    void OutputRemainingCounter(std::vector<char> vProcessed, std::vector<char> vUnprocessed)
    {
        vector<char>::iterator const iStart   = vUnprocessed.begin();
        vector<char>::iterator       iCurrent = vUnprocessed.begin();
        vector<char>::iterator const iEnd     = vUnprocessed.end  ();
    
        while (iCurrent < iEnd && *iCurrent != '1')
        {
            vProcessed.push_back(*iCurrent);
            ++iCurrent;
        }
    
        if (iCurrent < iEnd)
        {
            vector<char> vNewProcessed, vNewUnprocessed;
    
            vNewProcessed = vProcessed;
            vNewProcessed.push_back('0');
    
            vNewUnprocessed.insert(vNewUnprocessed.begin(), iCurrent + 1, iEnd);
    
            OutputRemainingCounter(vNewProcessed, vNewUnprocessed);
    
            *(vNewProcessed.end() - 1) = '1';
    
            OutputRemainingCounter(vNewProcessed, vNewUnprocessed);
        }
        else
        {
            vProcessed  .push_back('\0');
            vUnprocessed.push_back('\0');
            cout << &vProcessed[0] << &vUnprocessed[0] << endl;
        }
    }
    
    void main(void)
    {
        vector<char> v;
        v.push_back('b');
        v.push_back('_');
        v.push_back('0');
        v.push_back('0');
        v.push_back('0');
        v.push_back('1');
        v.push_back('1');
        v.push_back('1');
        v.push_back('0');
        v.push_back('0');
        v.push_back('1');
        v.push_back('1');
        v.push_back('0');
    
        OutputRemainingCounter(vector<char>(), v);
    }
    Of course, this would take a long time if a lot of the bits were set (because you'd effectively be counting from 0 to 2^N). I didn't bother to filter off the case where all bits are zero.

    Output:

    Code:
    b_000000000000
    b_000000000100
    b_000000001000
    b_000000001100
    b_000001000000
    b_000001000100
    b_000001001000
    b_000001001100
    b_000010000000
    b_000010000100
    b_000010001000
    b_000010001100
    b_000011000000
    b_000011000100
    b_000011001000
    b_000011001100
    b_000100000000
    b_000100000100
    b_000100001000
    b_000100001100
    b_000101000000
    b_000101000100
    b_000101001000
    b_000101001100
    b_000110000000
    b_000110000100
    b_000110001000
    b_000110001100
    b_000111000000
    b_000111000100
    b_000111001000
    b_000111001100
    An even better way might be to detect the number of set bits in your input and create a map of their positions, count through the number of set bits (contiguously) in a regular int, but every time you increment that counter, use the map you've created to plot the bits in the counter to bits in a separate int which correspond to the set bits in your input.

    Hope it helps,
    GeoRanger

  3. #3
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: All possible combinations problem

    Quote Originally Posted by LMHmedchem View Post
    Hello,

    I have to create all possible combinations of some string data.

    The input data is string and looks like,
    Code:
    b_00000000011100000000000
    For a given input string, I need to generate all of the strings that could be considered subsets of the input. This just means any combination of position matching 1s up to n-1. The leading b_ is not relevant. For example,

    Code:
    for input string,
    b_00000000011100000000000
    
    output,
    b_00000000011000000000000
    b_00000000001100000000000
    b_00000000010100000000000
    b_00000000010000000000000
    b_00000000001000000000000
    b_00000000000100000000000
    This is fairly straightforward for a string with three 1s, but I have strings with up to 11 1s, so this needs to be done algorithmically. If there were 10 1s, I would need to generate all of the overlapping patterns of 1 to 10 1s.

    This is an example of 11 1s,
    b_11110011011100110000000

    One approach would start with each single 1 and then add each other 1 to make all of the pairs, then use the pairs to make the triplets, etc. Code for this kind of thing often looks like creating paths with a breadth first search, like adding neighbor vertex numbers. I have often found that there are clever algorithms for coding problems like this, so if anyone knows of something that might suit, that would be a big help.

    LMHmedchem
    Another way yo see it is that you are basically just generating all combinations for N {1,0} pairs, where "N" is the amount of 1's in your original string. From there, your original string applies a "0" mask over it to make it pretty, but it is irrelevant your problem at hand.

    "generating all combinations for N {1,0} pairs" is basically counting in binary from 0 to n^^2 - 1. From there, you need to insert the 0's. I guess there are several ways to do the ladder. This simple algorithm is "L x N^^2" where L is the length of your string, and N is the ammount of 1's in your string. This is optimal as the answer occupies that much screen real-estate, so any solution must be at least that complex.

    Code:
    #include <iostream>
    #include <algorithm>
    
    using namespace std;
    
    int main(){
      auto input = string{"b_00000000011100000000000"};
      auto one_count = count(input.begin(), input.end(), '1');
    
      auto maximum = 1 << one_count;
      for (auto i = 1; i < maximum - 1; ++i) {
        auto max_mask = maximum;
        cout << "b_";
        for (auto it = input.begin() + 2; it != input.end(); ++it) {
          if (*it == '0') {
            cout << '0';
          } else {
            max_mask >>= 1;
            cout << ((max_mask & i) != 0);
          }
        }
        cout << endl;
      }
    }
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  4. #4
    Join Date
    Jun 2015
    Posts
    208

    Re: All possible combinations problem

    Quote Originally Posted by LMHmedchem View Post
    I have often found that there are clever algorithms for coding problems like this, so if anyone knows of something that might suit, that would be a big help.
    Your problem corresponds to finding all possible subsets of a set (also called the powerset). There are 2^N such subsets where N is the number of set members.

    One common approach is to consider the bit representation of the number sequence from 0 to 2^N - 1. For example if N is 3 the numbers are 0, 1, 2, 3, 4, 5, 6, 7 and the corresponding bit representations are 000, 001, 010, 011, 100, 101, 110, 111. These are also called bitsets. In your case 111 would correspond to any input string with three 1's and the numbers 1 to 6 would encode for all possible variations of that input string (0 is discarded and 7 is the input string itself).

    This solves the combinatorial part. For an input string with N 1's all numbers from 1 up to 2^N - 2 are looped through and for each number the input string is modified in N places according to the number bitset. This gives a complexity of O(2^N * N).

    I think it's possible to reduce this to O(2^N). The idea is to use the Gray ordering rather than the natural ordering of numbers. In Gray ordering the bit representation sequence for N=3 becomes 000, 001, 011, 010, 110, 111, 101, 100. What's interesting here is that just one bit changes from one bit representation to the next. Utilized in your case that would mean that instead of N bits just one bit must be modified in the input string. That results in O(2^N * 1) which is O(2^N).

    So since there are 2^N variations the lowest possible complexity is O(2^N) and that's what you get with an iterative solution and Gray ordering. I guess that's also what a recursive solution should yield but it comes at the expense of ... well, recursion .

    I'll post some code later.
    Last edited by tiliavirga; December 3rd, 2015 at 03:44 AM.

  5. #5
    Join Date
    Oct 2008
    Posts
    1,456

    Re: All possible combinations problem

    Quote Originally Posted by tiliavirga View Post
    What's interesting here is that just one bit changes from one bit representation to the next. Utilized in your case that would mean that instead of N bits just one bit must be modified in the input string. That results in O(2^N * 1) which is O(2^N).
    the fact that you're changing just one bit does not mean you do it in O(1), you still need a time proportional to the number of max bits in a way or the other...

  6. #6
    Join Date
    Jun 2015
    Posts
    208

    Re: All possible combinations problem

    Quote Originally Posted by superbonzo View Post
    the fact that you're changing just one bit does not mean you do it in O(1), you still need a time proportional to the number of max bits in a way or the other...
    Please elaborate.

    The total number of combinations to consider are 2^N where N is the number of bits. That's O(2^N). And for each combination, N bits are modified. That's O(N). If just one bit is modified the complexity instead becomes O(1). So total complexity is O(2^N * N) in the former case and O(2^N) in the latter. Also remember I'm using bitsets where many set operations take place in parallel.
    Last edited by tiliavirga; December 3rd, 2015 at 05:28 AM.

  7. #7
    Join Date
    Oct 2008
    Posts
    1,456

    Re: All possible combinations problem

    Quote Originally Posted by tiliavirga View Post
    If just one bit is modified the complexity instead becomes O(1)
    so,

    Code:
    for( int c = 0; c < N; ++c )
    {
        if(is_bit_on_at(c))
        {   
            flip_bit_at(c);
            break;
        }   
    }
    the pseudo-code above 'modifies' at most a single bit, are you claiming this is O(1) ??

    "counting" always has at least linear complexity relative to the number of bits, so if the number of bits of the input sequence is arbitrary the runtime of the OP's algo will always grow at least as N*2^N in the worst case

  8. #8
    Join Date
    Jun 2015
    Posts
    208

    Re: All possible combinations problem

    Quote Originally Posted by superbonzo View Post
    the pseudo-code above 'modifies' at most a single bit, are you claiming this is O(1) ??
    Your pseudo-code is O(N) because it searches for the bit to modify. If you know already which "c" to flip you can skip the loop and just do "flip_bit_at(c)". That's an O(1) operation.

    so if the number of bits of the input sequence is arbitrary the runtime of the OP's algo will always grow at least as N*2^N in the worst case
    The OP didn't request an arbitrary number of bits in the sequence. Far from that, in fact an upper limit of N=11 was indicated.

    But certainly, my bitset approach will for all practical purposes be limited to at most N=64 since that's the widest native int commonly available. But it isn't much of a limitation really because then the number of combinations will be 2^64 which is intractable anyway.

    On a more philosophical note. Just like there is no true randomness to be had on a digital computer you won't get anywhere near infinity either. So be careful with "arbitrary" when discussing practical programs. There will always be restrictions and limitations.
    Last edited by tiliavirga; December 4th, 2015 at 02:35 AM.

  9. #9
    Join Date
    Jun 2015
    Posts
    208

    Re: All possible combinations problem

    I've implemented the two approaches I described in my post #4. The regular version is based on an integer sequence in natural order while the Gray code version instead uses the Gray order (just one bit changes between consecutive integers).

    The regular version has O(2^N * N) complexity whereas the Gray version is O(2^N). The reason for this reduction is that the innermost loop could be removed thanks to the Gray property. Quite fascinating to watch in the output strings.

    The difference in complexity probably won't matter much in this case. I just wanted "proof of concept" so to speak. Although Gray shows up in unexpected places I haven't seen it being used for complexity reduction like this before.

    The code is written in low-level C++. It seemed natural considering the low-level nature of integer bitsets. I found all necessary information on Gray code bit-fiddling in Hacker's Delight by Warren (2'nd ed. p. 311-315).

    Code:
    void regular() {
    	const std::string input = "b_0000000001110000011000000"; // input string
    	const char SYMB[2] = {'0','1'};
    
    	std::vector<int> positions;
    	for (int i=0; i<int(input.size()); ++i) { // find positions of all '1' in input string
    		if (input[i] == SYMB[1]) positions.push_back(i);
    	}
    
    	const int BITS = positions.size();
    	const int COMBINATIONS = 1<<BITS;
    
    	std::string output; // output string
    	for (char c : input) output += (c!=SYMB[1]) ? c : SYMB[0]; // replace all '1' with '0'
    
    	for (int i=1; i<COMBINATIONS-1; ++i) { // interpret sequence numbers as bitsets
    		for (int j=0, bitset=i ; j<BITS; ++j, bitset>>=1) { // and modify output string accordingly
    			int p = positions[j];
    			char c = SYMB[bitset&1];
    			output.replace(p,1,1,c);
    		}
    		std::cout << output << std::endl;
    	}
    }
    Code:
    void gray() {
    	const std::string input = "b_0000000001110000011000000"; // input string
    	const char SYMB[2] = {'0','1'};
    
    	std::vector<int> positions;
    	for (int i=0; i<int(input.size()); ++i) { // find positions of all '1' in input string
    		if (input[i] == SYMB[1]) positions.push_back(i);
    	}
    
    	const int BITS = positions.size();
    	const int COMBINATIONS = 1<<BITS;
    
    	std::unordered_map<int,int> hashpos;
    	for (int i=0; i<BITS; ++i) hashpos.insert({1<<i, positions[i]});
    
    	std::string output; // output string
    	for (char c : input) output += (c!=SYMB[1]) ? c : SYMB[0]; // replace all '1' with '0'
    
    	for (int i=1; i<COMBINATIONS; ++i) { // interpret sequence numbers as bitsets
    		int g = i ^ (i>>1); //  corresponding gray code
    		if (g == COMBINATIONS-1) continue; // skip input string
    		int b = i & ~(i-1); // changed bit
    		int p = hashpos[b]; // position corresponding to changed bit
    		char c = ((g&b) == b) ? SYMB[1] : SYMB[0]; // select '0' or '1' to write at position
    		output.replace(p,1,1,c);
    		std::cout << output << std::endl;
    	}
    }
    Last edited by tiliavirga; December 4th, 2015 at 04:11 AM.

  10. #10
    Join Date
    Oct 2008
    Posts
    1,456

    Re: All possible combinations problem

    now, ignoring your very personal definition of bigO ...

    are you claiming that the number of output.replace() calls grows as N*2^N in the regular case and as 2^N in the Gray case ? this is also false in general

    note that your regular() code is artificially pessimized because there's no need of iterating over all BITS in the output string, you can just iterate up to the first zero bit. In this case it's easy to show that the total number of output.replace() calls scales as 2^N ( and not as N*2^N ) exactly as in the Gray case...

    moreover, the (correctly written)regular case is probably more efficient, being much more cache friendly than the gray case, especially for bigger input strings ...
    Last edited by superbonzo; December 4th, 2015 at 04:55 AM. Reason: added moreover ...

  11. #11
    Join Date
    Jun 2015
    Posts
    208

    Re: All possible combinations problem

    Quote Originally Posted by superbonzo View Post
    note that your regular() code is artificially pessimized because there's no need of iterating over all BITS in the output string, you can just iterate up to the first zero bit.
    Well, maybe I've missed something. Could you please explain what you mean by "iterate up to the first zero bit"?

  12. #12
    Join Date
    Oct 2008
    Posts
    1,456

    Re: All possible combinations problem

    Quote Originally Posted by tiliavirga View Post
    Well, maybe I've missed something. Could you please explain what you mean by "iterate up to the first zero bit"?
    I mean, in order to increment a binary number in natural order you only need to search the first least significant zero bit, say, in 101010111 you check and flip only the first four bits on the right and the rest won't change.

    so, let n(k) be the number of bit flips needed to count from 0 to 2^k-1 in natural order
    now, given k+1 bits

    00...0

    to get to

    01...1 -> you need n(k) flips

    and to

    10...0 -> n(k) + k + 1
    11...1 -> n(k) + k + 1 + n(k)

    so, we have the recurrence equation

    n(k+1)=2n(k)+k+1

    with n(2)=4;

    given the ansatz n(k) = a2^k+bk+c, we have

    2a2^k + bk + b + c = 2a2^k + (2b+1)k + 2c+1

    hence

    b==2b+1 --> b = -1
    b+c=2c+1 --> c = -2
    a4+b2+c=4 --> a = 2

    that is

    n(k) = 2^(k+1) - k - 2

    that definitely does not scale as k*2^k

  13. #13
    Join Date
    Jun 2015
    Posts
    208

    Re: All possible combinations problem

    Quote Originally Posted by superbonzo View Post
    I mean, in order to increment a binary number in natural order you only need to search the first least significant zero bit, say, in 101010111 you check and flip only the first four bits on the right and the rest won't change.
    Okay, I'm convinced. Yours is the better algorithm here. What I can see it requires two bits of each string to be checked on average. My Gray based algorithm checks only one bit but is more complicated and has more overhead, still it has its lure.

    Well, you can't win 'em all, can you.
    Last edited by tiliavirga; December 5th, 2015 at 05:23 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured