General Programming Question - Self Enumerating Pangrams
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 10 of 10

Thread: General Programming Question - Self Enumerating Pangrams

  1. #1
    Join Date
    Apr 2009
    Posts
    38

    General Programming Question - Self Enumerating Pangrams

    I'm trying to create a self enumerating pangram, here is an example of one:

    This first pangram has five a's, one b, one c, two d's, twenty-nine e's, six f's, four g's, eight h's, twelve i's, one j, one k, three l's, two m's, nineteen n's, twelve o's, two p's, one q, eight r's, twenty-six s's, twenty t's, three u's, five v's, nine w's, three x's, four y's, and one z.

    I've tried two different approaches.

    The first approach was to generate a sentence with random numbers. I would compare the randomly generated numbers to the actual number of occurrences of each letter, update the sentence and repeat. This method quickly finds potential pangrams with an error of between 6 - 8 numbers, but there doesn't seem to be much more progress after that.

    The second approach was to create a "gene pool" of random potential pangrams. The fittest pangrams can reproduce (by simulating crossover), and there are random mutations, and new individuals are created and destroyed on a random basis. This method takes longer, but has produced potential pangrams with an error of between 1 - 4 numbers.

    Are there any other approaches which I haven't thought of? Which of the above two approaches has the most potential of working? I'm nearly going insane trying to figure this out!

    Any help will be really appreciated!

    Thank You

  2. #2
    Join Date
    May 2009
    Posts
    2,413

    Re: General Programming Question - Self Enumerating Pangrams

    Quote Originally Posted by SamstaUK View Post
    Are there any other approaches which I haven't thought of?
    One approach often used in these high-dimensional problems is Monte Carlo.

    It's quite similar to the genetic approach. The pangram is slightly modified in each iteration and if the modification resulted in a better pangram it's accepted. But, and this is the defining feature of Monte Carlo, even if the modified pangram is worse than the existing it's acceped with a certain (small) probability. This "walking about in pangram space" continues until a correct panagram is found (or the computer is switched off )

    There are a few open questions still to think about. How should the pangram be "modified"? And how does one know whether a pangram is "better or worse" than another pangram?

    I cannot guarantee Monte Carlo will be better than what you already tried but it's a well established standard method.
    Last edited by nuzzle; July 4th, 2012 at 04:00 PM.

  3. #3
    Join Date
    Apr 2009
    Posts
    38

    Re: General Programming Question - Self Enumerating Pangrams

    Thank you for the monte carlo suggestion, I'll have a read about it

    I'm using a sort of hybrid technique, I'm getting *really* close now, only 1 number is off, but still no solution.

    What I'm doing is choosing a random number, and setting every letter to that number. Then I update the pangram with the actual count. I keep updating the pangram for 5 seconds, if no progress is made, then it is stuck in a loop, so I mutate it 15000 to 1000000 times, updating the pangram between mutations. (I choose this figure before running the program). If no improvement is made from the mutations, I revert back to the previous best and start mutating again.

    I'm trying to find the right balance between mutations and updates. If I update too often I get stuck in a loop, if I mutate too much I might miss a potential solution.

    I've turned my spare room into an oven, running this on two poweredge servers :P

    EDIT: Solution Found!

    It turns out that the string you use to start your pangram determines whether it is solvable or not. I changed the base string, and within minutes had a solution

    After much tribulation, Sam Kennedy made this pangram which contains: nine a's, two b's, four c's, four d's, twenty nine e's, nine f's, two g's, seven h's, fifteen i's, one j, two k's, two l's, five m's, twenty nine n's, sixteen o's, two p's, one q, ten r's, thirty one s's, twenty four t's, seven u's, four v's, ten w's, three x's, six y's, and one z.
    This is a really interesting exercise if anyone is ever bored and wants something to do!
    Last edited by SamstaUK; July 4th, 2012 at 06:58 PM.

  4. #4
    Join Date
    May 2009
    Posts
    2,413

    Re: General Programming Question - Self Enumerating Pangrams

    Quote Originally Posted by SamstaUK View Post
    This is a really interesting exercise if anyone is ever bored and wants something to do!
    I'll have that in mind and then I'll try the Monte Carlo approach.

    But there's another approach I'd be eager to investigate. It's when you use one pangram to generate the next, and the next and so on recursively. It would resemble a random generator for pangrams. With luck the process approaches a correct pangram and you know you're there when two consecutive pangrams are equal. It's because a correct pangram generates itself. But the process may also end up cycling, and it definately will if no correct pangram exists.

    Regarding the tuning of your genetic algorithm. There will be the same situation in Monte Carlo where you accept a worse pangram with a certain fixed probability. If it's set too low the simulation may get stuck locally. If it's too high a correct pangram may be overlooked.

    Finally the question of whether a correct pangram exists at all. I have a feeling it depends on the fixed part of the pangram. The variable part will have to adapt to that and that may not be possible. But even without a fixed part a pangram may not have a solution (it should be language dependent I guess because numbers have different letter representations) so the fixed part must be decisive for the question of existence.
    Last edited by nuzzle; July 5th, 2012 at 05:37 AM.

  5. #5
    Join Date
    Apr 2013
    Posts
    1

    Re: General Programming Question - Self Enumerating Pangrams

    hmmm... I just tried solving this same problem, I first used monte carlo but that seemed too slow, I then tried a mixed method of monte carlo and then updating the numbers that the pangram has iteratively (as I think OP had in method 1). however I haven't been able to make much progress, I couldn't see your code could I? do let me know, I've spent all day on this and now I'll be really annoyed if I can't get a working solution. (Hope you're still on this forum/still have the code a year later, fingers crossed)

  6. #6
    Join Date
    May 2009
    Posts
    2,413

    Re: General Programming Question - Self Enumerating Pangrams

    Quote Originally Posted by crashandburn4 View Post
    I first used monte carlo but that seemed too slow,
    Well, this looks like the archetypical needle-in-a-haystack problem. If it's NP-complete which I suspect then it's bound to be slow regardless of which solution method you apply. You're not even guaranteed to ever find an exact solution regardless of how long you try. And if you do it can be ascribed mostly to luck.

    I've tried a Monte Carlo strategy. It finds a solution in both cases the OP tried, also the one he abandoned as lacking a solution. But we're talking many millions, almost billions, of iterations and that takes time. As always with Monte Carlo it's a delicate balance between sticking around in areas with high chance of a solution and at the same time avoid getting stuck. It's an art rather than a science really. Even changing seed in the random generator may be the difference between finding a solution or not. That's why it's hard to know whether an improved algorithm was an improvement or if it just was the random walk that took a different path and happened to strike gold this time.

    I've introduced a data structure called an LFV, a letter frequency vector. It's simply an array with one entry for each letter, each entry holding the frequency of that letter (how often it occurs). In this case there are 26 entries, one for each English letter with frequencies in the 0 to 99 range.

    Now a Pangram can be turned into an LFV, that's obvious, any text can. But in this case the opposite is true too. An LFV holding the variable part of a Pangram can be used to generate the whole Pangram text. Say the frequency for the letter 't' is 42, then that would correspond to the text "fourty-two t's" in the Pangram.

    Say we have an LFV called the Variate and a function called the PangramFunction which takes an LFV and turns it into a Pangram and then returns an LFV of that Pangram. Then the problem of finding a Self-Enumerating Pangram can be expressed with this equality,

    Variate == PangramFunction(Variate)

    It's simply a question of finding the Variate that transforms into itself when passing the Pangram function. The Variate has a total of 26^100 combinations so finding the solution Variate is not exactly easy picking to say the least.

    To determine how close a Variate is to a solution the squared distance between Variate and PangramFunction(Variate) is used. When this distance is 0 they're equal and that's the ultimate goal.

    Some experimentation shows that a good strategy is to make very small frequency changes to a few letters of the Variate in each iteration. If this results in a lower squared distance the modified Variate is accepted. To prevent this process from getting stuck occasionally a Variate with higher squared distance is accepted. But that's standard procedure for Monte Carlo.
    Last edited by nuzzle; April 16th, 2013 at 01:17 AM.

  7. #7
    Join Date
    May 2009
    Posts
    2,413

    Re: General Programming Question - Self Enumerating Pangrams

    This is the result of some "recreational programming" this weekend.

    It's written using VS2012 Express for Windows Desktop. It's started like this,

    SEP sep;
    sep.simulation();

    Unfortunately I missed there should be an "and" before the last letter in a Pangram text. I didn't want to modify the program at that point so the fix is instead to add an "and" to the constant part of the Pangram. Interestingly this has a huge impact on solution time. The "After much tribulation, Sam Kennedy made this pangram which contains: " Pangram took 62 million iterations without the "and " but 892 million with it. This really demonstrates the true probablistic nature of Monte Carlo simulations.

    This program has found a solution in under 1 billion iterations to all Pangrams I've tested. But it's still only an approximate algorithm and sooner or later this promising track record will be broken.

    Code:
    #include <cctype>
    #include <random>
    
    inline int toOrd(char ch) { // order number of char
    	return std::tolower(ch) - 'a';
    }
    inline char toChar(int order) { // char corresponding to order number
    	return static_cast<char>('a' + order);
    }
    
    double rndReal() { // a random double between 0 and 1
    	static std::default_random_engine engine;
    	return std::uniform_real_distribution<double>(0.0, 1.0)(engine);
    }
    int rndInt(int range) { // a random int between 0 and range-1
    	static std::default_random_engine engine;
    	return std::uniform_int_distribution<int>(0, range-1)(engine);
    }
    
    template <int N>
    class LFV {	// Letter Frequency Vector 
    public:
    	LFV() {
    		for (int i=0; i<N; ++i) v[i] = 0;
    	}
    	int size() const {return N;}
    
    	const int& operator[](const int i) const {return v[i];}
    	int& operator[](const int i) {return v[i];}
    
    	LFV<N>& operator+=(const LFV<N>& r) {
    		for (int i=0; i<N; ++i) v[i] += r.v[i];
    		return *this;
    	}
    	friend LFV<N> operator+(LFV<N> l, const LFV<N>& r) {
    		l += r;
    		return l;
    	}
    
    	LFV<N>& operator-=(const LFV<N>& r) {
    		for (int i=0; i<N; ++i) v[i] -= r.v[i];
    		return *this;
    	}
    	friend LFV<N> operator-(LFV<N> l, const LFV<N>& r) {
    		l -= r;
    		return l;
    	}
    
    	friend bool operator==(const LFV<N>& l, const LFV<N>& r) {
    		for (int i=0; i<N; ++i) {
    			if (l.v[i] != r.v[i]) return false;
    		}
    		return true;
    	}
    
    private:
    	int v[N];
    };
    
    template <int N>
    std::string toString(const LFV<N>& lfv) { // print version of LFV
    	std::string str = "<";
    	for (int i=0; i<N; ++i) {
    		if (i>0) str += ",";
    		str += std::to_string(lfv[i]);
    	}
    	return str + ">";
    }
    
    template <int N>
    LFV<N> toLFV(const std::string& str) { // generate LFV from string
    	LFV<N> lfv;
    	for (char ch : str) {
    		if (std::isalpha(ch)) lfv[toOrd(ch)]++; // only letters count
    	}
    	return lfv;
    }
    
    template <int N>
    inline int sqrDiff(const LFV<N>& lfv1, const LFV<N>& lfv2) { // squared distance between LFV's
    	int sum = 0;
    	for (int i=0; i<N; ++i) {
    		int diff = lfv1[i] - lfv2[i];
    		sum += diff*diff;
    	}
    	return sum;
    }
    
    class SEP { // Self Enumerating Pangram
    public:
    	SEP() {
    			// constant part of pangram (select one or make your own)
    		CONSTPART = "After much tribulation, Sam Kennedy made this pangram which contains: ";
    //		CONSTPART = "After much tribulation, Sam Kennedy made this pangram which contains: (and) ";
    //		CONSTPART = "This first pangram has ";
    //		CONSTPART = "This first pangram has (and) ";
    //		CONSTPART = "Now let's see if Nuzzle can find a pangram: ";
    //		CONSTPART = "Now let's see if Nuzzle can find a pangram: (and) ";
    
    			// generate text versions of letter frequencies, and their LFV counterparts
    		std::array<std::string,DECBASE> singles = {"","one","two","three","four","five","six","seven","eight","nine"};
    		std::array<std::string,DECBASE> twenties = {"ten","eleven","twelve","thirteen","fourteen","fifteen","sixteen","seventeen","eighten","nineteen"};
    		std::array<std::string,DECBASE> decades = {"","","twenty","thirty","fourty","fifty","sixty","seventy","eighty","ninty"};
    		for (int i=0; i<MAXNUM; ++i) {
    			std::string str;
    			if (i<DECBASE) str = singles[i%DECBASE];
    			else if (i<2*DECBASE) str = twenties[i%DECBASE];
    			else {
    				str = decades[i/DECBASE];
    				if (i%DECBASE > 0) str += "-" + singles[i%DECBASE];
    			}
    //			std::cout << str << std::endl;
    			hundrednum[i] = str;
    			hundrednumLFV[i] = toLFV<LETTERS>(str);
    		}
    	}
    
    	void simulation() { // search for self-enumerating pangram using Monte Carlo
    		int const MAXTRIES = 2000000000; // 2 billion (close to max for an int)
    
    		Lfv const constpart = toLFV<LETTERS>(CONSTPART);
    		Lfv variate = constpart;
    		Lfv pangram = pangramLFV(variate, constpart);
    
    		int diff = sqrDiff(variate, pangram);
    		int minDiff = diff;
    		int tries=0;
    		while (tries++ < MAXTRIES && diff > 0) { // until search exhausted or pangram found
    
    			Lfv newVariate = variate;	// trial variate
    			for (int letter=0; letter<LETTERS; ++letter) { // all letters in variate
    				switch (rndInt(16)) { // magic number controlling change probabilities
    				        case 0: { // letter frequency is increases by one
    						int frequency = newVariate[letter] + 1;
    						newVariate[letter] = (frequency < MAXNUM) ? frequency : MAXNUM-1;
    						break;
    					}
    					case 1: { // decreased by one
    						int frequency = newVariate[letter] - 1;
    						newVariate[letter] = (frequency >= 0) ? frequency :  0;
    						break;
    					}
    					default:; // no change
    				}
    			}
    
    			pangram = pangramLFV(newVariate, constpart); // pangram associated with trial variate
    			int newDiff = sqrDiff(newVariate, pangram); // difference between trial variate and pangram
    
    			if (newDiff < diff || rndReal() < 0.00001) { // accept lower difference, and higher at random
    				diff = newDiff;
    				variate = newVariate;
    				if (diff < minDiff) { // new minimum difference
    					minDiff = diff;
    					std::cout << "* diff=" << minDiff << ", tries=" << tries << std::endl;
    					std::cout << "variate:" << toString(variate) << std::endl;
    					std::cout << "pangram:" << toString(pangram) << std::endl;
    				}
    			}
    		}
    		if (diff == 0) {
    			std::cout << "* a hit!" << std::endl;
    			std::cout << pangramTxt(variate, CONSTPART) << std::endl;
    		}
    		std::cout << "* ready" << std::endl;
    	}
    
    private:
    	static const int LETTERS = 26;	// number of letters in English
    	typedef LFV<LETTERS> Lfv;		// LFV type specialized for letter range
    
    	static int const DECBASE = 10;	// decimal number base
    	static int const MAXNUM = 100;	// maximum letter frequency
    	std::string CONSTPART;			//constant part of pangram
    
    	std::array<std::string,MAXNUM> hundrednum;	// letter frequences, text version
    	std::array<Lfv,MAXNUM> hundrednumLFV;		// letter frequencies, LFV version
    
    		// text version of pangram (consisting of a variable and a constant part)
    	std::string pangramTxt(const Lfv& varpart, const std::string& constpart) {
    		std::string str = constpart;
    		bool b = false;
    		for (int letter=0; letter<LETTERS; ++letter) {
    			int frequency = varpart[letter];
    			if (frequency > 0) {
    				if (b) str += ", ";
    				str += hundrednum[frequency] + " " + toChar(letter);
    				if (frequency>1) str += "'s";	// plural
    				b = true;
    			}
    		}
    		return str + ".";
    	}
    
    		// LFV of specific letter and its frequency
    	Lfv generateLFV(int letter, int frequency) {
    		Lfv lfv;
    		if (frequency > 0) {
    			lfv = hundrednumLFV[frequency];
    			lfv[letter]++;
    			if (frequency > 1) lfv[toOrd('s')]++;
    		}
    		return lfv;	
    	}
    
    		// LFV version of pangram (consisting of a varable and a constant part)
    	Lfv pangramLFV(const Lfv& varpart, const Lfv& constpart) {
    		Lfv lfv = constpart;
    		for (int letter=0; letter<LETTERS; ++letter) {
    			int frequency = varpart[letter];
    			if (frequency>0) lfv += generateLFV(letter,frequency);
    		}
    		return lfv;
    	}
    };
    Last edited by nuzzle; April 15th, 2013 at 01:52 PM.

  8. #8
    Join Date
    May 2009
    Posts
    2,413

    Re: General Programming Question - Self Enumerating Pangrams

    Quote Originally Posted by nuzzle View Post
    If it's NP-complete which I suspect
    It is indeed. In fact it's quite easily reducible to SAT (Satisfiablility) the first of Karp's 21 original NP complete problems.

    http://en.wikipedia.org/wiki/Karp&#39;s_...plete_problems

    The equality in my previous post #6 can be implemented using a logic network consisting of Boolean high-level elements such a ROM memories, adders and comparators which all in their turn can be implemented with simple NAND and NOR gates. This means the logic network constitutes a big Boolean expression. Presented with a Variate input (the 26 letter frequencies in binary form) this network will churn out a single Boolean output telling whether the equality is true or false.

    Finding which inputs to a Boolean expression will yield true is a Satisfiability problem. It's known to be NP complete so the Self-Enumerating Pangram problem should be NP complete too. This means it's a hard problem but also that efficient solution strategies can be sought among general SAT-solvers. But that's a project for another weekend.

    By the way if you're interested in the foundation of computing I can recommend The Golden Ticket by Lance Fortnow. It's a popular science book but it's written by an insider (and not some journalist who's trying to explain what he doesn't understand). It's by far the most accessible still deep book in this category I've ever seen.
    Last edited by nuzzle; April 23rd, 2013 at 10:43 AM.

  9. #9
    Join Date
    Oct 2008
    Posts
    1,102

    Re: General Programming Question - Self Enumerating Pangrams

    Quote Originally Posted by nuzzle View Post
    It's known to be NP complete so the Self-Enumerating Pangram problem is NP complete too.
    given an NP-complete problem A and a problem B, in order to prove that B is at least NP-complete you should solve A in terms of B; if I read your post correctly, you did the opposit, so you didn't prove that the pangram problem is NP-complete.

  10. #10
    Join Date
    May 2009
    Posts
    2,413

    Re: General Programming Question - Self Enumerating Pangrams

    Quote Originally Posted by superbonzo View Post
    given an NP-complete problem A and a problem B, in order to prove that B is at least NP-complete you should solve A in terms of B; if I read your post correctly, you did the opposit, so you didn't prove that the pangram problem is NP-complete.
    It's not a formal proof. I just presented the evidence that finally convinced me of NP completness. But a proof shouldn't be that hard. I promise to clearly indicate if I attempt one.

    But more importantly, the fact that the Self-Enumerating Pangram problem can be easily reformulated as a SAT problem (specifically CIRCUIT-SAT) means it can be attacked with SAT-solvers.

    And if you feel you could easily prove something in one way or another please feel free to do so. Staying Mr. Besserwisser can't be that satisfying.
    Last edited by nuzzle; April 24th, 2013 at 08:11 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Azure Activities Information Page

Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This is a CodeGuru survey question.


Featured


HTML5 Development Center