-
September 30th, 2009, 02:58 AM
#1
[RESOLVED] C# regex to find html tag
Hi all,
I'm trying to analyze the data off some html files.
I want to retrieve the <meta name="keywords" content="keyword1, another_keyword" /> tag from the html. Using Regex I'm able to do this via
Code:
MatchCollection keywords = Regex.Matches(html, "<meta name=\"keywords\" content=\".*\" />");
This works. It's probably not the best regular expression ever written, but it works. But now I noticed that on some pages the attributes in the tag appear in a different order, so the tag becomes <meta content="keyword1, another_keyword" name="keywords" />. Now my Regex doesn't match anymore.
I could solve it as follows
Code:
MatchCollection keywords = Regex.Matches(html, "<meta name=\"keywords\" content=\".*\" />");
if (keywords.Count == 0)
    keywords = Regex.Matches(html, "<meta content=\".*\" name=\"keywords\" />");
But my guess is that there should be a way to do this in one statement.
-
September 30th, 2009, 01:53 PM
#2
Re: C# regex to find html tag
Looking at the HTML Meta Tag Parser, I think you should use a two-step solution where you first select your meta tags and then select the content with something like:
Code:
string regex = @"(?<name>content)\s*=\s*(\""(?<value>[^\""]*)\""|'(?<value>[^']*)'|(?<value>[^\""'<> ]+)\s*)+";
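To make the two-step idea concrete, here is a rough sketch of how it could look. This is not the code from the HTML Meta Tag Parser; the names MetaKeywords, Find, MetaTag and Attribute are made up for the example, and it assumes attribute values are quoted (unquoted values are not handled). Because each attribute is read separately, the order of name and content inside the tag no longer matters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class MetaKeywords
{
    // Step 1: match each <meta ...> tag and capture its raw attribute text.
    static readonly Regex MetaTag =
        new Regex(@"<meta\b(?<attrs>[^>]*)>", RegexOptions.IgnoreCase);

    // Step 2: match one attribute with a double- or single-quoted value.
    // Assumption: unquoted attribute values are ignored.
    static readonly Regex Attribute = new Regex(
        @"(?<name>[\w-]+)\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)')");

    // Returns the content of the first <meta name="keywords"> tag,
    // or null when the page has no such tag.
    public static string Find(string html)
    {
        foreach (Match tag in MetaTag.Matches(html))
        {
            string name = null, content = null;
            foreach (Match attr in Attribute.Matches(tag.Groups["attrs"].Value))
            {
                string key = attr.Groups["name"].Value.ToLowerInvariant();
                if (key == "name") name = attr.Groups["value"].Value;
                else if (key == "content") content = attr.Groups["value"].Value;
            }
            if (string.Equals(name, "keywords", StringComparison.OrdinalIgnoreCase))
                return content;
        }
        return null;
    }
}
```

Calling MetaKeywords.Find(html) should then give the keyword list for both attribute orders mentioned in the first post.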
-
October 1st, 2009, 02:26 AM
#3
Re: C# regex to find html tag
The HTML Meta Tag Parser is really good.
Thanks a lot!