I'm looking for algorithm that will compress a given string by creating a lookup table of repeating substrings of the string, so that the size of lookup table together with the size of the encoded string is minimal. So for instance, the input string

ababcabcabc

could be compressed as

lookup table
s=ab
t=c
encoded string
sststst
12 total characters used

or alternatively,

lookup:
m=ab
j=abc
encoded:
mjjj
11 total characters used.

So the second example results in better compression, because it chooses better repeating substrings from the original string from the lookup table.

Anyway, assuming we are handed a way to get all the repeating substrings from a string, can any one think of a way to always get the smallest (lookup table+encoded string)? The answer may be simple or complex, but I wanted to throw it out to see if anybody knows it before I spend any more time trying to figure it out.