June 11th, 2011, 02:26 AM
#1
GCS-Variation
Hello Experts,
I have a problem that may be related to the greatest (i.e. longest) common substring problem.
I have a file of about 10-20 MB, which you can think of as a text file. Inside it there are a few duplicated parts of about 10-50 KB each. It looks like
"sometextXXXanothertextXXXlasttextpart", where the XXX parts are identical strings.
How can I find the duplicated parts as quickly and as reliably as possible?
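One possible approach, not from the original post: because every duplicate is at least 10 KB long, a Rabin-Karp style rolling hash can find them in one pass without depending on line or chapter boundaries. The sketch below stores the fingerprint of each non-overlapping block of `block` bytes (a hypothetical tuning parameter, assumed to be well below 10 KB), then slides a window over every byte offset and looks its hash up among the stored blocks. A repeat of at least 2*block - 1 bytes must fully contain one stored block in its first copy, so it gets found; each hit is verified byte by byte and then extended left and right. The name `find_repeats` is hypothetical.

```python
BASE = 257
MOD = (1 << 61) - 1  # large Mersenne prime keeps accidental collisions rare


def find_repeats(data: bytes, block: int = 4096):
    """Yield (first, second, length) for repeated regions in `data`.

    A repeat of length >= 2*block - 1 is found unless a fingerprint
    collision hides its anchor block; shorter repeats may be missed.
    `block` is an assumed tuning value, not taken from the post.
    """
    n = len(data)
    if n < block:
        return
    top = pow(BASE, block - 1, MOD)  # weight of the byte leaving the window
    anchors = {}  # fingerprint -> start offset of an aligned block
    h = 0
    for k in range(block):
        h = (h * BASE + data[k]) % MOD
    skip_until = 0
    for i in range(n - block + 1):
        if i > 0:
            # Roll the window one byte: drop data[i-1], append data[i+block-1].
            h = ((h - data[i - 1] * top) * BASE + data[i + block - 1]) % MOD
        j = anchors.get(h)
        if i >= skip_until and j is not None and data[j:j + block] == data[i:i + block]:
            a, b = j, i
            # Extend the verified match to the left ...
            while a > 0 and data[a - 1] == data[b - 1]:
                a -= 1
                b -= 1
            # ... and to the right.
            length = i + block - b
            while b + length < n and data[a + length] == data[b + length]:
                length += 1
            yield a, b, length
            skip_until = b + length  # do not report the same region again
        if i % block == 0:
            anchors.setdefault(h, i)  # remember only aligned blocks
```

Usage would be along the lines of `for first, second, length in find_repeats(open("input.txt", "rb").read()): print(first, second, length)`. Storing one fingerprint per block instead of one per offset keeps the dictionary small (about 5,000 entries for 20 MB with 4 KB blocks). Pure Python will still be slow on 20 MB, but the loop translates directly to C. Long stretches of repeated filler (for example runs of spaces) will be reported as well.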
My first attempt is to divide the whole file into individual parts, such as lines or chapters, compute a hash value for each part, and then run a modified GCS on the resulting, much shorter sequence of hashes. But this is far from reliable.
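For comparison, a minimal sketch of the line-based idea, assuming "parts" means lines and using SHA-1 digests as the per-line hash (both are assumptions). It reports runs of at least `k` consecutive identical lines, where `k` and the name `repeated_line_runs` are hypothetical. It also shows where the unreliability comes from: a duplicate that starts or ends in the middle of a line loses those partial lines, and a file with few line breaks gives almost nothing to match.

```python
import hashlib


def repeated_line_runs(text: str, k: int = 20):
    """Return [(first_line, second_line, n_lines)] for runs of >= k repeated lines."""
    digests = [hashlib.sha1(line.encode()).digest() for line in text.splitlines()]
    seen = {}  # k-line window key -> index of its first occurrence
    runs = []
    for i in range(len(digests) - k + 1):
        key = b"".join(digests[i:i + k])
        j = seen.setdefault(key, i)
        if j == i:
            continue  # first time this window of k lines appears
        last = runs[-1] if runs else None
        # Merge with the previous run if this window directly continues it.
        if last and last[0] + last[2] - k + 1 == j and last[1] + last[2] - k + 1 == i:
            runs[-1] = (last[0], last[1], last[2] + 1)
        else:
            runs.append((j, i, k))
    return runs
```

Line indices in the result are 0-based positions in `text.splitlines()`.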
Any ideas? Thank you!
GMarco