  1. #1
    Join Date
    Sep 2008
    Location
    Netherlands
    Posts
    865

    [RESOLVED] C# regex to find html tag

    Hi all,

    I'm trying to analyze the data from some HTML files.

    I want to retrieve the <meta name="keywords" content="keyword1, another_keyword" /> tag from the HTML. Using Regex I'm able to do this via
    Code:
    MatchCollection keywords = Regex.Matches(html, "<meta name=\"keywords\" content=\".*\" />");
    This works. It's probably not the best regular expression ever written, but it works. However, I noticed that on some pages the attributes in the tag appear in a different order, so the tag becomes <meta content="keyword1, another_keyword" name="keywords" />, and my Regex no longer matches.
    I could solve it as follows:
    Code:
    MatchCollection keywords = Regex.Matches(html, "<meta name=\"keywords\" content=\".*\" />");
    if (keywords.Count == 0)
      keywords = Regex.Matches(html, "<meta content=\".*\" name=\"keywords\" />");
    But my guess is that there should be a way to do this in one statement.
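
    For reference, one way to fold both attribute orders into a single pattern is a lookahead: it checks that name="keywords" appears somewhere inside the tag, and the rest of the pattern then captures the content value wherever it sits. This is only a sketch. The MetaKeywords class and Find method are made-up names, and it assumes double-quoted attribute values. It also swaps the greedy .* for [^"]* and [^>]*, because .* can run past the closing quote when several tags share a line.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the thread: one pattern for both attribute orders.
public static class MetaKeywords
{
    // (?=...) only looks ahead for name="keywords" inside the current tag;
    // it consumes nothing, so content may appear before or after name.
    // Assumption: attribute values are wrapped in double quotes.
    private static readonly Regex Pattern = new Regex(
        @"<meta\b(?=[^>]*\bname\s*=\s*""keywords"")[^>]*\bcontent\s*=\s*""(?<content>[^""]*)""[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the content value of the first keywords meta tag, or null if none is found.
    public static string Find(string html)
    {
        Match m = Pattern.Match(html);
        return m.Success ? m.Groups["content"].Value : null;
    }
}
```

    Calling MetaKeywords.Find(html) on either of the two tag variants above should give back the keyword list itself rather than the whole tag.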

  2. #2
    Join Date
    Mar 2007
    Posts
    90

    Re: C# regex to find html tag

    Looking at the HTML Meta Tag Parser, I think you should use a two-step solution: first select your meta tags, then select the content attribute with something like:
    Code:
    string regex = @"(?<name>content)\s*=\s*(\""(?<value>[^\""]*)\""|'(?<value>[^']*)'|(?<value>[^\""'<> ]+)\s*)+";
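
    To make the two steps concrete, here is a rough sketch. It is not the code from the HTML Meta Tag Parser; MetaTagReader and GetContent are names invented for illustration. Step one pulls out each <meta ...> tag, step two reads its attributes into a dictionary, so the order of name and content no longer matters. It assumes no attribute value contains a > character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the two-step approach.
public static class MetaTagReader
{
    // Step 1: match each <meta ...> tag as a whole.
    // Assumption: no attribute value contains '>'.
    private static readonly Regex MetaTag =
        new Regex(@"<meta\b[^>]*>", RegexOptions.IgnoreCase);

    // Step 2: match name=value pairs; the value may be double-quoted,
    // single-quoted, or unquoted. .NET allows the group name "value"
    // to be reused across the alternatives.
    private static readonly Regex Attribute = new Regex(
        @"(?<name>[\w-]+)\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^""'<>\s]+))");

    // Returns the content of the first meta tag whose name equals metaName, or null.
    public static string GetContent(string html, string metaName)
    {
        foreach (Match tag in MetaTag.Matches(html))
        {
            // Collect the attributes of this tag, ignoring case in attribute names.
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match a in Attribute.Matches(tag.Value))
                attrs[a.Groups["name"].Value] = a.Groups["value"].Value;

            string name, content;
            if (attrs.TryGetValue("name", out name) &&
                string.Equals(name, metaName, StringComparison.OrdinalIgnoreCase) &&
                attrs.TryGetValue("content", out content))
                return content;
        }
        return null;
    }
}
```

    With this, MetaTagReader.GetContent(html, "keywords") covers both attribute orders in one call, and the same method can be reused for other meta names such as description.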

  3. #3
    Join Date
    Sep 2008
    Location
    Netherlands
    Posts
    865

    Re: C# regex to find html tag

    The HTML Meta Tag Parser is really good.

    Thanks a lot!
