I want to write a simple Java program to extract information from an HTML page I maintain. As I see it, I need to parse the HTML document, throwing away the tags and keeping the data. I have already read the entire HTML page into a String, so all that remains is to parse it. I would be very grateful if you could point me in the right direction.
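
To make the "throw away the tags, keep the data" idea concrete, here is a minimal sketch of the naive approach: scan the String one character at a time, skip everything between '<' and '>', and collect the text in between. The class name TagStripper and the method extractText are made up for illustration. It assumes simple, well-formed markup: it does not decode entities such as &amp;amp;, and it will be confused by a '>' inside an attribute value, by comments, and by script or style blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a naive tag stripper, not a real HTML parser.
public class TagStripper {

    /** Returns the non-empty text runs found between tags, in document order. */
    public static List<String> extractText(String html) {
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideTag = false;

        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (insideTag) {
                // Ignore everything until the tag closes.
                if (c == '>') {
                    insideTag = false;
                }
            } else if (c == '<') {
                // A tag starts: save the text collected so far.
                flush(current, pieces);
                insideTag = true;
            } else {
                current.append(c);
            }
        }
        flush(current, pieces); // text after the last tag, if any
        return pieces;
    }

    // Adds the trimmed buffer to the list if it is not blank, then clears it.
    private static void flush(StringBuilder current, List<String> pieces) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            pieces.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Prices</h1><p>Apples: <b>3</b></p></body></html>";
        System.out.println(extractText(html)); // prints [Prices, Apples:, 3]
    }
}
```

This returns the text pieces in the order they appear, which is probably enough for a page whose layout is known in advance, but it keeps no record of which tag each piece came from.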

I have tried StringTokenizer and StreamTokenizer, but I find the parsing is still fairly complicated. I have also considered Jack, the Java grammar tool, but it seemed too complicated for what I want.
Thank you.