-
December 19th, 1999, 05:53 AM
#1
Search Engine
Please send me the source code for a search engine algorithm that searches for a string or a set of strings on the net when online, or on the networked machines when offline.
-
June 27th, 2001, 11:32 AM
#2
Re: Search Engine
What is this for?
Do you want to search the whole web, or a particular site?
Algorithm:
  for each file to be indexed:
    strip the HTML out
    split on spaces
    for each word:
      add the filename to the index entry for that word
  save the index to disk
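A minimal Java sketch of the indexing loop above, in case it helps. The class and method names (SimpleIndexer, addDocument, lookup) are made up for illustration, lowercasing the words is my own assumption, and the tag stripping is a crude regex rather than a real HTML parser. Saving the index to disk is left out to keep it short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class: a sketch of the indexing loop, not a full search engine.
class SimpleIndexer {
    // word -> names of the files that contain that word
    final Map<String, Set<String>> index = new HashMap<>();

    // Crude tag stripping: anything that looks like <...> becomes a space.
    static String stripHtml(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Index one document whose contents have already been read into memory.
    void addDocument(String filename, String html) {
        // Lowercasing is an assumption, so 'Hello' and 'hello' share an entry.
        String text = stripHtml(html).toLowerCase();
        // Split on whitespace only; punctuation stays attached to the word.
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace produces one empty token
            }
            index.computeIfAbsent(word, k -> new TreeSet<>()).add(filename);
        }
    }

    // Returns the files containing the word, or an empty set if there are none.
    Set<String> lookup(String word) {
        return index.getOrDefault(word.toLowerCase(), new TreeSet<>());
    }
}
```

Calling addDocument once per file and then lookup("hello") gives back the set of filenames to show.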
Then all your search engine does is read the index off the disk, and if someone asks for the word 'hello' it looks that up in the index, gets a list of filenames back and shows them.
If they enter 'hello world' then it does a search for hello and for world, and returns the intersection of those two sets (files containing both words), and then the union minus the intersection (files containing only one of them).
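Here is one reading of the multi-word case as a Java sketch: files that contain every query word come first, followed by files that contain only some of them. MultiWordQuery and search are hypothetical names, and the index is assumed to be a plain map from lowercase word to filenames.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: orders results as "all words" first, then "some words".
class MultiWordQuery {
    static List<String> search(Map<String, Set<String>> index, String query) {
        Set<String> all = new TreeSet<>(); // intersection of the per-word hits
        Set<String> any = new TreeSet<>(); // union of the per-word hits
        boolean first = true;
        for (String word : query.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty query yields one empty token
            }
            Set<String> hits = index.getOrDefault(word, Collections.emptySet());
            any.addAll(hits);
            if (first) {
                all.addAll(hits);
                first = false;
            } else {
                all.retainAll(hits);
            }
        }
        any.removeAll(all); // union minus intersection
        List<String> results = new ArrayList<>(all);
        results.addAll(any);
        return results;
    }
}
```

Within each group the filenames come out alphabetically only because TreeSet is used; a real engine would probably want some kind of relevance ordering there.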
If you want to start doing things like "hello world" (i.e. the quoted phrase treated as one term) then you'll need to also index sets of adjacent words.
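One simple way to index sets of words is to store every pair of adjacent words as its own key. This is only a sketch of that idea under my own assumptions: PhraseIndexer and its methods are hypothetical names, the text is assumed to be plain (tags already stripped), and it only handles two-word phrases.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class: indexes adjacent word pairs so quoted two-word phrases can be looked up.
class PhraseIndexer {
    // "word1 word2" -> names of the files where the two words appear next to each other
    final Map<String, Set<String>> pairIndex = new HashMap<>();

    void addDocument(String filename, String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            String pair = words[i] + " " + words[i + 1];
            pairIndex.computeIfAbsent(pair, k -> new TreeSet<>()).add(filename);
        }
    }

    // Normalizes case and spacing, then looks the pair up as a single key.
    Set<String> lookupPhrase(String twoWordPhrase) {
        String key = String.join(" ", twoWordPhrase.toLowerCase().trim().split("\\s+"));
        return pairIndex.getOrDefault(key, Collections.emptySet());
    }
}
```

Longer phrases could be handled by storing word positions instead of pairs, at the cost of a bigger index.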
That's pretty simple, but good enough for a website's internal search, I reckon. For more power you can investigate fuzzy text matching so that 'Hullo' will match 'Hello' as well.
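As one possible take on fuzzy matching, the sketch below uses Levenshtein edit distance (the number of single-character inserts, deletes and substitutions needed to turn one word into another). That choice of technique is my assumption, not the only option, and FuzzyMatch, editDistance and fuzzyEquals are hypothetical names.

```java
// Hypothetical helper: Levenshtein edit distance as one way to do fuzzy word matching.
class FuzzyMatch {
    // Classic dynamic-programming edit distance, keeping only two rows in memory.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if the two words are within maxEdits edits of each other, ignoring case.
    static boolean fuzzyEquals(String a, String b, int maxEdits) {
        return editDistance(a.toLowerCase(), b.toLowerCase()) <= maxEdits;
    }
}
```

With this, fuzzyEquals("Hullo", "Hello", 1) is true, so a search could check the query word against each word in the index and include the near misses. That is a linear scan over the index, which is fine for a small site but would need something smarter for a large one.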
If you want Java that searches the web like google.com or something, just write a front end to Google. There's no way you can match them on your own.
Bayard
bayard@generationjava.com
Brainbench MVP for Java
http://www.brainbench.com