Cheers again guys for sorting out my last thread, and sorry for posting again so quickly, but fixing it showed I'm doing things wrong haha.

I'm trying to parse a webpage of unknown structure to extract key terms using topia.termextract. However, I'm having trouble coping with the complexity of web pages and getting to the main textual content wherever it lies on the page.

Are there any ways of doing it effectively? I tried reading the whole webpage line by line and scanning for terms, but HTML tags, whitespace and whatnot totally destroy that approach. I'm basically stuck.
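
For reference, a rough sketch of one possible starting point, assuming Python's standard-library html.parser is acceptable. It only strips tags, skips script/style content and collapses whitespace; it makes no attempt to locate the "main" content on the page. The names VisibleText and html_to_text are made up for illustration:

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring anything inside script/style/noscript."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []        # cleaned text fragments, in document order
        self._skip_depth = 0    # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of spaces, tabs and newlines into single spaces.
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string as one whitespace-normalised line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Some <b>key</b> terms.</p></body></html>"
    print(html_to_text(page))
```

The resulting plain string could then be handed to the term extractor instead of the raw page source. Note that fragments are joined with spaces, so a word split by an inline tag (e.g. `key<b>word</b>`) would come out as two words.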

Any ideas to make a start, guys? Or at least another start, as my last attempt wasn't that good.

cheers