I programmed a very simple indexing algorithm, which produces an alphabetically sorted list of words with their positions (which file, and which position in the file).
This list is split across a few big files, but the details are probably not very interesting :-)
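To make the setup concrete, here is a rough Python sketch of this kind of positional index. It is only an illustration, not the actual implementation: the function name `build_index`, the in-memory `docs` dictionary, and the simple `\w+` tokenizer are all assumptions, and the real index lives in several large files rather than in memory.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each word to a list of (file_name, word_position) pairs.

    `docs` is a hypothetical dict of file name -> file contents.
    """
    index = defaultdict(list)
    for name, text in docs.items():
        # Assumed tokenizer: lowercase the text, split on non-word characters.
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].append((name, pos))
    # Return the entries sorted alphabetically by word.
    return dict(sorted(index.items()))

# Toy input standing in for the real files.
docs = {"a.txt": "the cat sat", "b.txt": "The dog sat down"}
index = build_index(docs)
```

Looking up `index["sat"]` then gives every (file, position) pair at which that word occurs.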
Question: besides quickly finding the position of a word or of particular word combinations, what else can you do with such index files?
I was wondering whether one could correlate the word positions to find some structure. But cross-correlating about 50,000 words means computing 50,000 x 50,000 (about 2.5 billion) correlations, and I do not have a fast enough algorithm.
Are there heuristics for picking out just the "interesting" words that are worth correlating?
How about restricting the calculation to correlations between words that occur within N (say N=10) words of each other?
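A minimal sketch of that idea in Python, assuming the index is available as a dictionary from word to a list of (file, position) pairs. The name `cooccurrence_counts` and that in-memory layout are my assumptions, and plain co-occurrence counts stand in here for whatever correlation measure you actually want:

```python
from collections import Counter, defaultdict

def cooccurrence_counts(index, n=10):
    """Count how often two different words occur within n positions of each other.

    `index` is assumed to map word -> list of (file_name, word_position).
    """
    # Regroup the postings by file: file -> list of (position, word).
    by_file = defaultdict(list)
    for word, postings in index.items():
        for name, pos in postings:
            by_file[name].append((pos, word))

    counts = Counter()
    for tokens in by_file.values():
        tokens.sort()  # order the words of this file by position
        for i, (pos_i, word_i) in enumerate(tokens):
            # Only look ahead while the next word is still inside the window.
            j = i + 1
            while j < len(tokens) and tokens[j][0] - pos_i <= n:
                word_j = tokens[j][1]
                if word_i != word_j:
                    # Store each pair in a fixed (alphabetical) order.
                    counts[tuple(sorted((word_i, word_j)))] += 1
                j += 1
    return counts

# Toy index for one file whose words are: the cat sat ... the
toy_index = {
    "cat": [("a.txt", 1)],
    "sat": [("a.txt", 2)],
    "the": [("a.txt", 0), ("a.txt", 5)],
}
pairs = cooccurrence_counts(toy_index, n=2)
```

Each word is compared with at most N following words, so the work grows roughly with (total number of word occurrences) x N rather than with the square of the vocabulary size. Pairs that never appear near each other are simply never visited.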
Best Regards,
BioPhysEngr http://blog.biophysengr.net
--
All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.