November 3rd, 2012, 09:01 AM
Advice about HTML parser
I hope someone can guide me on this.
I am looking for a good HTML parser that allows me to extract relevant data and I came across JSoup.
It's a good parser, and I can use its methods to extract data by id, class, tag, etc.
The problem is that when an HTML file is parsed and its whole content is extracted, JSoup reformats the source: it reorders tags and makes other changes to the markup.
Because of this, some values that should be reachable through their class name or tag can no longer be extracted, since the parsed source no longer matches the original.
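To show the kind of extraction I mean, here is a minimal sketch. The HTML snippet, the id "price", the class "value", and the helper method names are all made up for illustration, and it assumes the jsoup jar is on the classpath. It also shows one simple case of the parser changing the source: the input has no html/head/body tags, but the parsed document does.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExtractDemo {
    // Made-up fragment; note that it has no <html>, <head> or <body> tags.
    static final String SAMPLE =
        "<div id=\"price\" class=\"value\">42</div><p class=\"value\">hello</p>";

    // Text of the element with the given id, or null if there is no such element.
    static String textById(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element el = doc.getElementById(id);
        return el == null ? null : el.text();
    }

    // Number of elements carrying the given class name.
    static int countByClass(String html, String className) {
        return Jsoup.parse(html).getElementsByClass(className).size();
    }

    // Number of elements with the given tag name.
    static int countByTag(String html, String tag) {
        return Jsoup.parse(html).getElementsByTag(tag).size();
    }

    public static void main(String[] args) {
        System.out.println(textById(SAMPLE, "price"));
        System.out.println(countByClass(SAMPLE, "value"));
        System.out.println(countByTag(SAMPLE, "p"));
        // The parser builds a full document, so a <body> exists
        // even though the input string never contained one.
        System.out.println(countByTag(SAMPLE, "body"));
        // Printing the parsed document shows the restructured markup.
        System.out.println(Jsoup.parse(SAMPLE).html());
    }
}
```

On my real pages the changes are bigger than an added wrapper, but the effect is the same: what I query after parsing is not the markup I started with.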
My question is: is there any way to parse the data using JSoup without it reordering the content?
Another question: does anyone know an alternative HTML parser that can extract data from the source as it is written?