I programmed a very simple indexing algorithm, which produces an alphabetically sorted list of words with their positions (which file, and which position in the file).
The list is split across a few big files, but the details are probably not very interesting :-)
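To make the structure concrete, here is a minimal sketch of the kind of index I mean. It is not my actual code: the name `build_index`, the in-memory dict of file contents, and counting positions in words rather than bytes are all assumptions for illustration.

```python
import re
from collections import defaultdict

def build_index(files):
    """Map each word to a list of (file name, word position) pairs.

    `files` is a hypothetical dict of file name -> text.
    Positions count words from the start of the file, not bytes.
    """
    index = defaultdict(list)
    for name, text in files.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].append((name, pos))
    # Return the entries sorted alphabetically by word.
    return dict(sorted(index.items()))

files = {"a.txt": "the cat sat", "b.txt": "the dog"}
index = build_index(files)
for word, positions in index.items():
    print(word, positions)
```

A real index over big files would be written out to disk in pieces instead of being kept in one dict, but the shape of the entries is the same.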
Question: besides quickly finding the position of words or of special combinations of words, what can you do with such index files?
I was wondering whether you could correlate the word positions to find some structures. But cross-correlating about 50,000 words means computing 50,000 x 50,000 (2.5 billion) correlations, and I do not have a fast enough algorithm.
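As a rough sketch of one possible reading of "correlation" here: count how often two words occur within a fixed window of each other in the same file. The function name `cooccurrences`, the window size, and the choice of this measure are assumptions for illustration, not a settled definition.

```python
def cooccurrences(pos_a, pos_b, window=5):
    """Count pairs (a, b) with |a - b| <= window.

    Both position lists must be sorted and come from the same file.
    Runs in O(len(pos_a) + len(pos_b)) using two moving pointers.
    """
    count = 0
    lo = 0  # first index in pos_b with pos_b[lo] >= a - window
    hi = 0  # first index in pos_b with pos_b[hi] > a + window
    for a in pos_a:
        while lo < len(pos_b) and pos_b[lo] < a - window:
            lo += 1
        while hi < len(pos_b) and pos_b[hi] <= a + window:
            hi += 1
        count += hi - lo
    return count

# If the measure is symmetric, only the unordered pairs are needed.
n_words = 50_000
n_pairs = n_words * (n_words - 1) // 2

print(cooccurrences([1, 10, 20], [3, 12, 40], window=5))
print(n_pairs)
```

Each single comparison is cheap because the position lists are already sorted, but even with symmetry there are still about 1.25 billion word pairs, so the number of pairs is the real problem.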
Are there heuristics for picking out just the "interesting" words that should be correlated?