  1. #1
    Join Date
    Aug 1999
    Location
    CANADA
    Posts
    69

    reading a text file

    I am trying to read in the contents of an HTML file that has been converted to text. Because the HTML text file contains a lot of text I don't need and its lines have no real defined structure, this is very difficult.
    Here is an example piece of the text I am trying to read in; the name and the numbers after it are the parts I need.

    e.g.

    <TR><TD><IMG SRC="/Images/nameptr.gif" ALT="BR3" ALIGN="LEFT">Battle River #3
    </TD><TD>148 </TD><TD>150 </TD><TD>0 </TD></TR>
    <TR><TD><IMG SRC="/Images/nameptr.gif" ALT="BR4" ALIGN="LEFT">Battle River #4 </TD><TD>148 </TD><TD>148 </TD><TD>0 </TD></TR>
    </TABLE

    Any ideas on how I can get just the name and the 3 numbers associated with it? (There are 100+ entries in total.)


    Thanks a lot... any help is much appreciated.
    HET

  2. #2
    Join Date
    Mar 2002
    Location
    St. Petersburg, Florida, USA
    Posts
    12,125
    Process the file character by character.

    If the character is "<" and does not follow a "\", increment a counter.
    If the character is ">" and does not follow a "\", decrement the counter.

    For every other character, save it in a buffer IF the counter is zero.

    When the whole file has been processed, the buffer holds the raw text.
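    The steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested HTML parser: the function name stripTags is made up for the example, the "\" escape rule is taken from the description above, and the guard that keeps the counter from going negative on a stray ">" is an added assumption.

```cpp
#include <string>

// Hypothetical helper: returns the text that lies outside <...> tags,
// using the counter approach described in the post.
std::string stripTags(const std::string& input) {
    std::string text;   // buffer for the raw text
    int depth = 0;      // greater than zero while inside a tag
    char prev = '\0';   // previous character, for the "\" check

    for (char c : input) {
        const bool escaped = (prev == '\\');
        if (c == '<' && !escaped) {
            ++depth;                    // entering a tag
        } else if (c == '>' && !escaped) {
            if (depth > 0) --depth;     // assumption: ignore a stray '>'
        } else if (depth == 0) {
            text += c;                  // outside every tag: keep it
        }
        prev = c;
    }
    return text;
}
```

    Calling stripTags on a row from the sample, such as "<TD>148 </TD><TD>150 </TD>", leaves only the cell text with its trailing spaces, which can then be split on whitespace or line breaks.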

  3. #3
    Join Date
    Jan 2002
    Posts
    19
    You should consider looking into using Boost.Regex (www.boost.org) or even Xerces-C (xml.apache.org). Both libraries should help you parse the HTML relatively easily.
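    As a rough illustration of the regular-expression route, here is a sketch that uses std::regex from the C++11 standard library in place of Boost.Regex (the pattern syntax is largely the same). It assumes every row looks like the two sample rows in the first post: an IMG tag, a name, then three numeric cells. The names Entry and parseRows are invented for the example, and what the three numbers mean is not stated in the thread, so they are just stored in order.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical record for one table row: a name and its three numbers.
struct Entry {
    std::string name;
    int values[3];
};

// Hypothetical helper: finds every row shaped like the sample rows.
// \s* between the pieces tolerates the stray line breaks in the file.
std::vector<Entry> parseRows(const std::string& html) {
    static const std::regex row(
        R"(<TR>\s*<TD>\s*<IMG[^>]*>\s*([^<]*?)\s*</TD>)"
        R"(\s*<TD>\s*(\d+)\s*</TD>)"
        R"(\s*<TD>\s*(\d+)\s*</TD>)"
        R"(\s*<TD>\s*(\d+)\s*</TD>\s*</TR>)",
        std::regex::icase);

    std::vector<Entry> entries;
    for (std::sregex_iterator it(html.begin(), html.end(), row), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        Entry e;
        e.name = m[1].str();                        // e.g. "Battle River #3"
        for (int i = 0; i < 3; ++i) {
            e.values[i] = std::stoi(m[i + 2].str()); // the three numeric cells
        }
        entries.push_back(e);
    }
    return entries;
}
```

    To use it, read the whole file into one std::string first (rather than line by line), since a row may be split across lines, then pass that string to parseRows.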

    Regards,
    -d
