reading a text file


December 13th, 2002, 10:32 AM
hetshah


I am trying to read in the contents of an HTML file that has been converted to text. The file contains a lot of text I don't need, and the lines have no real defined structure, so this is very difficult.
Here is an example of the text I am trying to read in. The parts I need are the name (e.g. "Battle River #3") and the three numbers that follow it.

e.g.

<TR><TD><IMG SRC="/Images/nameptr.gif" ALT="BR3" ALIGN="LEFT">Battle River #3
</TD><TD>148 </TD><TD>150 </TD><TD>0 </TD></TR>
<TR><TD><IMG SRC="/Images/nameptr.gif" ALT="BR4" ALIGN="LEFT">Battle River #4 </TD><TD>148 </TD><TD>148 </TD><TD>0 </TD></TR>
</TABLE>

Any ideas on how I can get just the name and the three numbers associated with it? (There are 100+ entries in total.)

Thanks a lot... any help is very much appreciated.
December 13th, 2002, 10:39 AM
TheCPUWizard

Process the file character by character.

If the character is "<" and does not follow a "\", increment a counter.
If the character is ">" and does not follow a "\", decrement the counter.

For all other conditions, save the character in a buffer IF the counter is zero.

What is left in the buffer is the raw text, with the tags removed.
December 13th, 2002, 01:04 PM
defunct

You should consider looking into using Boost.Regex (www.boost.org) or even Xerces-C (xml.apache.org). Both libraries should help you parse the HTML relatively easily.

Regards,
-d
