General Programming Question - Self Enumerating Pangrams

I'm trying to create a self enumerating pangram, here is an example of one:

Quote:

This first pangram has five a's, one b, one c, two d's, twenty-nine e's, six f's, four g's, eight h's, twelve i's, one j, one k, three l's, two m's, nineteen n's, twelve o's, two p's, one q, eight r's, twenty-six s's, twenty t's, three u's, five v's, nine w's, three x's, four y's, and one z.

I've tried two different approaches.

The first approach was to generate a sentence with random numbers. I would compare the randomly generated numbers to the actual number of occurrences of each letter, update the sentence and repeat. This method quickly finds potential pangrams with an error of between 6 - 8 numbers, but there doesn't seem to be much more progress after that.

The second approach was to create a "gene pool" of random potential pangrams. The fittest pangrams can reproduce (by simulating crossover), and there are random mutations, and new individuals are created and destroyed on a random basis. This method takes longer, but has produced potential pangrams with an error of between 1 - 4 numbers.

Are there any other approaches which I haven't thought of? Which of the above two approaches has the most potential of working? I'm nearly going insane trying to figure this out!

Any help will be really appreciated!

Thank You :)

Re: General Programming Question - Self Enumerating Pangrams

Quote:

Originally Posted by

**SamstaUK**
Are there any other approaches which I haven't thought of?

One approach often used in these high-dimensional problems is Monte Carlo.

It's quite similar to the genetic approach. The pangram is slightly modified in each iteration and if the modification resulted in a better pangram it's accepted. But, and this is the defining feature of Monte Carlo, even if the modified pangram is worse than the existing it's acceped with a certain (small) probability. This "walking about in pangram space" continues until a correct panagram is found (or the computer is switched off :))

There are a few open questions still to think about. How should the pangram be "modified"? And how does one know whether a pangram is "better or worse" than another pangram?

I cannot guarantee Monte Carlo will be better than what you already tried but it's a well established standard method.

Re: General Programming Question - Self Enumerating Pangrams

Thank you for the monte carlo suggestion, I'll have a read about it :)

I'm using a sort of hybrid technique, I'm getting *really* close now, only 1 number is off, but still no solution.

What I'm doing is choosing a random number, and setting every letter to that number. Then I update the pangram with the actual count. I keep updating the pangram for 5 seconds, if no progress is made, then it is stuck in a loop, so I mutate it 15000 to 1000000 times, updating the pangram between mutations. (I choose this figure before running the program). If no improvement is made from the mutations, I revert back to the previous best and start mutating again.

I'm trying to find the right balance between mutations and updates. If I update too often I get stuck in a loop, if I mutate too much I might miss a potential solution.

I've turned my spare room into an oven, running this on two poweredge servers :P

EDIT: Solution Found!

It turns out that the string you use to start your pangram determines whether it is solvable or not. I changed the base string, and within minutes had a solution :)

Quote:

After much tribulation, Sam Kennedy made this pangram which contains: nine a's, two b's, four c's, four d's, twenty nine e's, nine f's, two g's, seven h's, fifteen i's, one j, two k's, two l's, five m's, twenty nine n's, sixteen o's, two p's, one q, ten r's, thirty one s's, twenty four t's, seven u's, four v's, ten w's, three x's, six y's, and one z.

This is a really interesting exercise if anyone is ever bored and wants something to do!

Re: General Programming Question - Self Enumerating Pangrams

Quote:

Originally Posted by

**SamstaUK**
This is a really interesting exercise if anyone is ever bored and wants something to do!

I'll have that in mind :) and then I'll try the Monte Carlo approach.

But there's another approach I'd be eager to investigate. It's when you use one pangram to generate the next, and the next and so on recursively. It would resemble a random generator for pangrams. With luck the process approaches a correct pangram and you know you're there when two consecutive pangrams are equal. It's because a correct pangram generates itself. But the process may also end up cycling, and it definately will if no correct pangram exists.

Regarding the tuning of your genetic algorithm. There will be the same situation in Monte Carlo where you accept a worse pangram with a certain fixed probability. If it's set too low the simulation may get stuck locally. If it's too high a correct pangram may be overlooked.

Finally the question of whether a correct pangram exists at all. I have a feeling it depends on the fixed part of the pangram. The variable part will have to adapt to that and that may not be possible. But even without a fixed part a pangram may not have a solution (it should be language dependent I guess because numbers have different letter representations) so the fixed part must be decisive for the question of existence.

Re: General Programming Question - Self Enumerating Pangrams

hmmm... I just tried solving this same problem, I first used monte carlo but that seemed too slow, I then tried a mixed method of monte carlo and then updating the numbers that the pangram has iteratively (as I think OP had in method 1). however I haven't been able to make much progress, I couldn't see your code could I? do let me know, I've spent all day on this and now I'll be really annoyed if I can't get a working solution. (Hope you're still on this forum/still have the code a year later, fingers crossed)

Re: General Programming Question - Self Enumerating Pangrams

Quote:

Originally Posted by

**crashandburn4**
I first used monte carlo but that seemed too slow,

Well, this looks like the archetypical needle-in-a-haystack problem. If it's NP-complete which I suspect then it's bound to be slow regardless of which solution method you apply. You're not even guaranteed to ever find an exact solution regardless of how long you try. And if you do it can be ascribed mostly to luck.

I've tried a Monte Carlo strategy. It finds a solution in both cases the OP tried, also the one he abandoned as lacking a solution. But we're talking many millions, almost billions, of iterations and that takes time. As always with Monte Carlo it's a delicate balance between sticking around in areas with high chance of a solution and at the same time avoid getting stuck. It's an art rather than a science really. Even changing seed in the random generator may be the difference between finding a solution or not. That's why it's hard to know whether an improved algorithm was an improvement or if it just was the random walk that took a different path and happened to strike gold this time.

I've introduced a data structure called an LFV, a letter frequency vector. It's simply an array with one entry for each letter, each entry holding the frequency of that letter (how often it occurs). In this case there are 26 entries, one for each English letter with frequencies in the 0 to 99 range.

Now a Pangram can be turned into an LFV, that's obvious, any text can. But in this case the opposite is true too. An LFV holding the variable part of a Pangram can be used to generate the whole Pangram text. Say the frequency for the letter 't' is 42, then that would correspond to the text "fourty-two t's" in the Pangram.

Say we have an LFV called the Variate and a function called the PangramFunction which takes an LFV and turns it into a Pangram and then returns an LFV of that Pangram. Then the problem of finding a Self-Enumerating Pangram can be expressed with this equality,

Variate == PangramFunction(Variate)

It's simply a question of finding the Variate that transforms into itself when passing the Pangram function. The Variate has a total of 26^100 combinations so finding the solution Variate is not exactly easy picking to say the least.

To determine how close a Variate is to a solution the squared distance between Variate and PangramFunction(Variate) is used. When this distance is 0 they're equal and that's the ultimate goal.

Some experimentation shows that a good strategy is to make very small frequency changes to a few letters of the Variate in each iteration. If this results in a lower squared distance the modified Variate is accepted. To prevent this process from getting stuck occasionally a Variate with higher squared distance is accepted. But that's standard procedure for Monte Carlo.

Re: General Programming Question - Self Enumerating Pangrams

This is the result of some "recreational programming" this weekend. :)

It's written using VS2012 Express for Windows Desktop. It's started like this,

SEP sep;

sep.simulation();

Unfortunately I missed there should be an "and" before the last letter in a Pangram text. I didn't want to modify the program at that point so the fix is instead to add an "and" to the constant part of the Pangram. Interestingly this has a huge impact on solution time. The "After much tribulation, Sam Kennedy made this pangram which contains: " Pangram took 62 million iterations without the "and " but 892 million with it. This really demonstrates the true probablistic nature of Monte Carlo simulations.

This program has found a solution in under 1 billion iterations to all Pangrams I've tested. But it's still only an approximate algorithm and sooner or later this promising track record will be broken.

Code:

`#include <cctype>`

#include <random>

inline int toOrd(char ch) { // order number of char

return std::tolower(ch) - 'a';

}

inline char toChar(int order) { // char corresponding to order number

return static_cast<char>('a' + order);

}

double rndReal() { // a random double between 0 and 1

static std::default_random_engine engine;

return std::uniform_real_distribution<double>(0.0, 1.0)(engine);

}

int rndInt(int range) { // a random int between 0 and range-1

static std::default_random_engine engine;

return std::uniform_int_distribution<int>(0, range-1)(engine);

}

template <int N>

class LFV { // Letter Frequency Vector

public:

LFV() {

for (int i=0; i<N; ++i) v[i] = 0;

}

int size() const {return N;}

const int& operator[](const int i) const {return v[i];}

int& operator[](const int i) {return v[i];}

LFV<N>& operator+=(const LFV<N>& r) {

for (int i=0; i<N; ++i) v[i] += r.v[i];

return *this;

}

friend LFV<N> operator+(LFV<N> l, const LFV<N>& r) {

l += r;

return l;

}

LFV<N>& operator-=(const LFV<N>& r) {

for (int i=0; i<N; ++i) v[i] -= r.v[i];

return *this;

}

friend LFV<N> operator-(LFV<N> l, const LFV<N>& r) {

l -= r;

return l;

}

friend bool operator==(const LFV<N>& l, const LFV<N>& r) {

for (int i=0; i<N; ++i) {

if (l.v[i] != r.v[i]) return false;

}

return true;

}

private:

int v[N];

};

template <int N>

std::string toString(const LFV<N>& lfv) { // print version of LFV

std::string str = "<";

for (int i=0; i<N; ++i) {

if (i>0) str += ",";

str += std::to_string(lfv[i]);

}

return str + ">";

}

template <int N>

LFV<N> toLFV(const std::string& str) { // generate LFV from string

LFV<N> lfv;

for (char ch : str) {

if (std::isalpha(ch)) lfv[toOrd(ch)]++; // only letters count

}

return lfv;

}

template <int N>

inline int sqrDiff(const LFV<N>& lfv1, const LFV<N>& lfv2) { // squared distance between LFV's

int sum = 0;

for (int i=0; i<N; ++i) {

int diff = lfv1[i] - lfv2[i];

sum += diff*diff;

}

return sum;

}

class SEP { // Self Enumerating Pangram

public:

SEP() {

// constant part of pangram (select one or make your own)

CONSTPART = "After much tribulation, Sam Kennedy made this pangram which contains: ";

// CONSTPART = "After much tribulation, Sam Kennedy made this pangram which contains: (and) ";

// CONSTPART = "This first pangram has ";

// CONSTPART = "This first pangram has (and) ";

// CONSTPART = "Now let's see if Nuzzle can find a pangram: ";

// CONSTPART = "Now let's see if Nuzzle can find a pangram: (and) ";

// generate text versions of letter frequencies, and their LFV counterparts

std::array<std::string,DECBASE> singles = {"","one","two","three","four","five","six","seven","eight","nine"};

std::array<std::string,DECBASE> twenties = {"ten","eleven","twelve","thirteen","fourteen","fifteen","sixteen","seventeen","eighten","nineteen"};

std::array<std::string,DECBASE> decades = {"","","twenty","thirty","fourty","fifty","sixty","seventy","eighty","ninty"};

for (int i=0; i<MAXNUM; ++i) {

std::string str;

if (i<DECBASE) str = singles[i%DECBASE];

else if (i<2*DECBASE) str = twenties[i%DECBASE];

else {

str = decades[i/DECBASE];

if (i%DECBASE > 0) str += "-" + singles[i%DECBASE];

}

// std::cout << str << std::endl;

hundrednum[i] = str;

hundrednumLFV[i] = toLFV<LETTERS>(str);

}

}

void simulation() { // search for self-enumerating pangram using Monte Carlo

int const MAXTRIES = 2000000000; // 2 billion (close to max for an int)

Lfv const constpart = toLFV<LETTERS>(CONSTPART);

Lfv variate = constpart;

Lfv pangram = pangramLFV(variate, constpart);

int diff = sqrDiff(variate, pangram);

int minDiff = diff;

int tries=0;

while (tries++ < MAXTRIES && diff > 0) { // until search exhausted or pangram found

Lfv newVariate = variate; // trial variate

for (int letter=0; letter<LETTERS; ++letter) { // all letters in variate

switch (rndInt(16)) { // magic number controlling change probabilities

case 0: { // letter frequency is increases by one

int frequency = newVariate[letter] + 1;

newVariate[letter] = (frequency < MAXNUM) ? frequency : MAXNUM-1;

break;

}

case 1: { // decreased by one

int frequency = newVariate[letter] - 1;

newVariate[letter] = (frequency >= 0) ? frequency : 0;

break;

}

default:; // no change

}

}

pangram = pangramLFV(newVariate, constpart); // pangram associated with trial variate

int newDiff = sqrDiff(newVariate, pangram); // difference between trial variate and pangram

if (newDiff < diff || rndReal() < 0.00001) { // accept lower difference, and higher at random

diff = newDiff;

variate = newVariate;

if (diff < minDiff) { // new minimum difference

minDiff = diff;

std::cout << "* diff=" << minDiff << ", tries=" << tries << std::endl;

std::cout << "variate:" << toString(variate) << std::endl;

std::cout << "pangram:" << toString(pangram) << std::endl;

}

}

}

if (diff == 0) {

std::cout << "* a hit!" << std::endl;

std::cout << pangramTxt(variate, CONSTPART) << std::endl;

}

std::cout << "* ready" << std::endl;

}

private:

static const int LETTERS = 26; // number of letters in English

typedef LFV<LETTERS> Lfv; // LFV type specialized for letter range

static int const DECBASE = 10; // decimal number base

static int const MAXNUM = 100; // maximum letter frequency

std::string CONSTPART; //constant part of pangram

std::array<std::string,MAXNUM> hundrednum; // letter frequences, text version

std::array<Lfv,MAXNUM> hundrednumLFV; // letter frequencies, LFV version

// text version of pangram (consisting of a variable and a constant part)

std::string pangramTxt(const Lfv& varpart, const std::string& constpart) {

std::string str = constpart;

bool b = false;

for (int letter=0; letter<LETTERS; ++letter) {

int frequency = varpart[letter];

if (frequency > 0) {

if (b) str += ", ";

str += hundrednum[frequency] + " " + toChar(letter);

if (frequency>1) str += "'s"; // plural

b = true;

}

}

return str + ".";

}

// LFV of specific letter and its frequency

Lfv generateLFV(int letter, int frequency) {

Lfv lfv;

if (frequency > 0) {

lfv = hundrednumLFV[frequency];

lfv[letter]++;

if (frequency > 1) lfv[toOrd('s')]++;

}

return lfv;

}

// LFV version of pangram (consisting of a varable and a constant part)

Lfv pangramLFV(const Lfv& varpart, const Lfv& constpart) {

Lfv lfv = constpart;

for (int letter=0; letter<LETTERS; ++letter) {

int frequency = varpart[letter];

if (frequency>0) lfv += generateLFV(letter,frequency);

}

return lfv;

}

};

Re: General Programming Question - Self Enumerating Pangrams

Quote:

Originally Posted by

**nuzzle**
If it's NP-complete which I suspect

It is indeed. In fact it's quite easily reducible to SAT (Satisfiablility) the first of Karp's 21 original NP complete problems.

http://en.wikipedia.org/wiki/Karp's_...plete_problems

The equality in my previous post #6 can be implemented using a logic network consisting of Boolean high-level elements such a ROM memories, adders and comparators which all in their turn can be implemented with simple NAND and NOR gates. This means the logic network constitutes a big Boolean expression. Presented with a Variate input (the 26 letter frequencies in binary form) this network will churn out a single Boolean output telling whether the equality is true or false.

Finding which inputs to a Boolean expression will yield true is a Satisfiability problem. It's known to be NP complete so the Self-Enumerating Pangram problem should be NP complete too. This means it's a hard problem but also that efficient solution strategies can be sought among general SAT-solvers. But that's a project for another weekend. :)

By the way if you're interested in the foundation of computing I can recommend The Golden Ticket by Lance Fortnow. It's a popular science book but it's written by an insider (and not some journalist who's trying to explain what he doesn't understand). It's by far the most accessible still deep book in this category I've ever seen.

Re: General Programming Question - Self Enumerating Pangrams

Quote:

Originally Posted by

**nuzzle**
It's known to be NP complete so the Self-Enumerating Pangram problem is NP complete too.

given an NP-complete problem A and a problem B, in order to prove that B is at least NP-complete you should solve A in terms of B; if I read your post correctly, you did the opposit, so you didn't prove that the pangram problem is NP-complete.

Re: General Programming Question - Self Enumerating Pangrams

Quote:

Originally Posted by

**superbonzo**
given an NP-complete problem A and a problem B, in order to prove that B is at least NP-complete you should solve A in terms of B; if I read your post correctly, you did the opposit, so you didn't prove that the pangram problem is NP-complete.

It's not a formal proof. I just presented the evidence that finally convinced me of NP completness. But a proof shouldn't be that hard. I promise to clearly indicate if I attempt one.

But more importantly, the fact that the Self-Enumerating Pangram problem can be easily reformulated as a SAT problem (specifically CIRCUIT-SAT) means it can be attacked with SAT-solvers.

And if you feel you could easily prove something in one way or another please feel free to do so. Staying Mr. Besserwisser can't be that satisfying.