Thread: html parsing

  1. #1
    Join Date
    Aug 2009
    Posts
    3

    html parsing

    Hey guys!
    I'm kind of new to the field. I need to get the text from inside an HTML tag (more specifically <div class="only this div">). I need to exclude all other divs. I found this nice example http://www.developer.com/net/csharp/...0918_2230091_1 but I can only get the name and value of the tag. Is it possible to also get the entire text?
    Thanks in advance

  2. #2
    Join Date
    Jul 2006
    Posts
    297

    Re: html parsing

    Welcome to the forum,

    If you can give me a sample of what your input looks like and what you want for the output then I can probably help you. Please try to be as specific as possible.

  3. #3
    Join Date
    Aug 2009
    Posts
    3

    Re: html parsing

    example:
    <div id="farright">
    <div id="ads-right">
    <div id="ads-right-twotop">
    content from ads right twotop
    enddiv ads right twotop
    content ads right
    end div ads right
    end div farright


    I would like to get only the content of the ads-right div. The HTML page is obviously much larger, but I think it's a good example.

  4. #4
    Join Date
    Jul 2006
    Posts
    297

    Re: html parsing

    So what you're saying is, if you had:

    Code:
    <div id="ads-right">
        <div>Blah blah blah</div>
    </div>
    You want

    Code:
    <div> Blah blah blah</div>
    or

    Code:
    div id="ads-right"

  5. #5
    Join Date
    Aug 2009
    Posts
    3

    Re: html parsing

    I need <div> Blah blah blah</div>

  6. #6
    Join Date
    Jul 2009
    Posts
    43

    Re: html parsing

    Code:
    const string htmlTag = @"<div>(.*?)</div>";
    Make sure to add a using directive for the System.Text.RegularExpressions namespace.
    New to C# | Using VS 2008 with 3.5.
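
    A note on the pattern above: a non-greedy match stops at the first </div> it meets, so with the nested example from earlier in the thread it would cut the content short. As a rough alternative, here is a minimal sketch that counts nesting depth instead of using a regex. The class name DivExtractor and the method InnerHtmlOfDiv are made up for illustration, and the code assumes reasonably clean markup: the id is written with double quotes, and there are no comments or scripts containing "<div". It is not a real HTML parser.

```csharp
using System;

// Hypothetical helper (not from the thread): extracts the inner HTML of a
// <div> with a given id by tracking how deeply nested <div> tags are.
public static class DivExtractor
{
    // Returns the inner HTML of the first <div ... id="divId" ...>,
    // or null if the div is not found or the tags are unbalanced.
    public static string InnerHtmlOfDiv(string html, string divId)
    {
        // Assumption: the id attribute is written as id="..." with double quotes.
        string marker = "id=\"" + divId + "\"";
        int idPos = html.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (idPos < 0) return null;

        // The id must sit inside a <div ...> opening tag: find the nearest
        // "<div" before it and make sure no '>' closes a tag in between.
        int openStart = html.LastIndexOf("<div", idPos, StringComparison.OrdinalIgnoreCase);
        if (openStart < 0) return null;
        if (html.IndexOf('>', openStart, idPos - openStart) >= 0) return null;

        int tagEnd = html.IndexOf('>', idPos);
        if (tagEnd < 0) return null;
        int contentStart = tagEnd + 1;

        // Walk forward: +1 for every "<div", -1 for every "</div".
        // When depth returns to 0 we have found the matching close tag.
        int depth = 1;
        int pos = contentStart;
        while (true)
        {
            int nextOpen = html.IndexOf("<div", pos, StringComparison.OrdinalIgnoreCase);
            int nextClose = html.IndexOf("</div", pos, StringComparison.OrdinalIgnoreCase);
            if (nextClose < 0) return null; // unbalanced markup

            if (nextOpen >= 0 && nextOpen < nextClose)
            {
                depth++;
                pos = nextOpen + 4; // skip past "<div"
            }
            else
            {
                depth--;
                if (depth == 0)
                    return html.Substring(contentStart, nextClose - contentStart);
                pos = nextClose + 5; // skip past "</div"
            }
        }
    }
}
```

    Calling DivExtractor.InnerHtmlOfDiv(html, "ads-right") on the page source would then return everything between the opening tag of the ads-right div and its matching close tag, nested divs included. For messy real-world pages a proper parser is still the safer choice.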

  7. #7
    Join Date
    Jul 2006
    Posts
    297

    Re: html parsing

    Well... that sucks, hah. You can't do that with the parser you linked in the earlier post. There's no simple way to parse HTML without writing your own parser. There is a class in .NET called WebBrowser which works really well for this type of thing because it lets you walk through the parsed HTML easily. Ironically though, if you're using this class in a website project it's more difficult to get the WebBrowser class to work, because it must run on an STA thread and it has a couple of events which must be handled, so I can't just write 2 lines of code and be done with it.

    All very possible, but I don't have the time right now to write the complete functioning code. You may be able to find some examples on Google for how to use the WebBrowser class. If you have any specific questions on how to get it to work, I'll do my best to help you out.
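
    For reference, the WebBrowser approach described above might look roughly like the sketch below. It is untested and rests on several assumptions: the project references System.Windows.Forms, the HTML is already available as a string, and the class and method names (WebBrowserExtractor, GetInnerHtml) are made up for illustration. The control is created on its own STA thread with a message loop, and the element is read in the DocumentCompleted event.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper (not from the thread): loads an HTML string into a
// WebBrowser control on a dedicated STA thread and reads one element.
public static class WebBrowserExtractor
{
    public static string GetInnerHtml(string html, string elementId)
    {
        string result = null;

        Thread thread = new Thread(delegate()
        {
            using (WebBrowser browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;

                // The parsed document is only available once loading completes.
                browser.DocumentCompleted += delegate(object sender, WebBrowserDocumentCompletedEventArgs e)
                {
                    HtmlElement element = browser.Document.GetElementById(elementId);
                    if (element != null)
                        result = element.InnerHtml;

                    // Stop the message loop started by Application.Run below.
                    Application.ExitThread();
                };

                browser.DocumentText = html;

                // WebBrowser needs a running message loop to raise its events.
                Application.Run();
            }
        });

        // The underlying ActiveX control requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        return result;
    }
}
```

    Two caveats: the calling thread blocks until the document has loaded, so real code would probably want a timeout, and the markup returned by InnerHtml is whatever the browser engine serializes, which may not match the original source character for character.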

    It's very likely that an HTML parser has already been written that will work for you, but I do not know of one. I'm sure one of the other posters here may have an idea.

  8. #8
    Join Date
    May 2006
    Posts
    306

    Re: html parsing

    You'll need to use mshtml.dll and then use Microsoft.

    You'll open up new classes that are real *****es to use, but once you know how to work them everything slides right in.

    Just use a WebClient to download the html string, or manipulate whatever. Insert it into HtmlDocument3/4/5/6 class htmlContent or whatever.

    That's my solution. It does not involve any use of WebBrowser.

  9. #9
    Join Date
    Jul 2006
    Posts
    297

    Re: html parsing

    Quote (originally posted by code?):
    You'll need to use mshtml.dll and then use Microsoft.

    You'll open up new classes that are real *****es to use, but once you know how to work them everything slides right in.

    Just use a WebClient to download the html string, or manipulate whatever. Insert it into HtmlDocument3/4/5/6 class htmlContent or whatever.

    That's my solution. It does not involve any use of WebBrowser.

    Yes, both approaches use mshtml.dll and both are a pain to use. I've used both before but, like you said, once you get it to work it's very useful. Had to do it once to create a screenshot of any website... that was a fun little project, hah.
