CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 2 of 2
  1. #1
    Join Date
    Apr 2009
    Posts
    2

    Is there an algorithm to compress based on repeating substrings?

    I'm looking for algorithm that will compress a given string by creating a lookup table of repeating substrings of the string, so that the size of lookup table together with the size of the encoded string is minimal. So for instance, the input string

    ababcabcabc

    could be compressed as

    lookup table
    s=ab
    t=c
    encoded string
    sststst
    12 total characters used

    or alternatively,

    lookup:
    m=ab
    j=abc
    encoded:
    mjjj
    11 total characters used.

    So the second example results in better compression, because it chooses better repeating substrings from the original string from the lookup table.

    Anyway, assuming we are handed a way to get all the repeating substrings from a string, can any one think of a way to always get the smallest (lookup table+encoded string)? The answer may be simple or complex, but I wanted to throw it out to see if anybody knows it before I spend any more time trying to figure it out.

  2. #2
    Join Date
    Mar 2009
    Posts
    19

    Re: Is there an algorithm to compress based on repeating substrings?

    What you're looking for is a dictionary-based encoding scheme. The word "dictionary" alone should provide you ample avenues if you search, but "LZW" might be a good starting place.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured