July 6th, 2011, 09:28 PM
#1
Parsing HTML For Key Words
Cheers again guys for sorting my last thread out, and sorry for posting again so quickly, but fixing it showed I'm doing things wrong haha.
I'm trying to parse a webpage of unknown structure to extract key terms using topia.termextract. However, I'm having trouble coping with the complexity of web pages and getting to the main textual content, wherever it lies on the page.
Are there any ways of doing it effectively? I tried reading the whole webpage line by line and scanning for terms, but HTML tags, whitespace and whatnot just totally destroy that approach. I'm stuck, basically.
Any ideas to make a start, guys? Or at least another start, as my last attempt wasn't that good.
cheers
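For reference, here is a rough sketch of the kind of thing I'm after: strip the markup first, then scan the remaining text for terms. It uses only the standard library's html.parser; the names TextExtractor and html_to_text are made up for illustration, and it makes no attempt to find the "main" content block, it just drops tags plus script/style contents and normalises whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring tags and script/style contents."""

    # Elements whose contents are never readable page text.
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of spaces/newlines so they don't break term scanning.
            text = " ".join(data.split())
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the plain text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The string that comes back from html_to_text is what I would then hand to topia.termextract instead of the raw page source. Navigation menus and footers will still be in there, so this is only a first step.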