[RESOLVED] Regular Expression in C#
I am writing a C# program that reads an HTML file produced by another one of my programs.
Code:
<input type="hidden" name="value5" value="1247555244" />
What I need to find is the contents of the value attribute, e.g. I need something that returns "1247555244" and only that.
Please keep in mind that the file contains a lot of other HTML besides this. I am new to regex and need someone to show me how to write a regular expression, for use in C#, that finds a value consisting entirely of digits and returns only those digits.
Thanks.
Re: Regular Expression in C#
I suggest downloading a program called RegexBuddy. Anyone just starting out with regular expressions should use it. It lets you build an expression step by step and shows you what each of the different characters does. It also has a bunch of presets for email addresses, phone numbers, and so on.
As for your particular problem, this might work. I would need a larger sample of your HTML to make sure that the regex finds all the cases and is strict enough to match only what you're looking for. The expression I used should find the value attribute of any HTML tag whose value consists only of digits.
Code:
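// Requires: using System; using System.Text.RegularExpressions;
// [^>]* keeps the match inside a single tag; the named group captures only the digits.
Match m = Regex.Match(inputString, @"<\w+\b[^>]*?\svalue=""(?<value>\d+)""[^>]*>");
while (m.Success)
{
    // Place your code here. m.Groups["value"].Value holds just the digits.
    // Convert.ToInt32 throws OverflowException above Int32.MaxValue (2147483647).
    Int32 value = Convert.ToInt32(m.Groups["value"].Value);
    m = m.NextMatch();
}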
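For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch. The class and method names (`HiddenValueFinder`, `FindDigits`) are made up for illustration, and the pattern assumes attribute values are double-quoted and contain no `>` characters, which may not hold for every HTML file.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HiddenValueFinder
{
    // Matches value="<digits>" inside a single tag.
    // [^>]* stops the match from running past the end of the tag.
    private static readonly Regex DigitsValue = new Regex(
        @"<\w+\b[^>]*?\svalue=""(?<value>\d+)""[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the digits of the first all-digit value attribute,
    // or null when the input contains none.
    public static string FindDigits(string html)
    {
        Match m = DigitsValue.Match(html);
        return m.Success ? m.Groups["value"].Value : null;
    }

    public static void Main()
    {
        // Sample input: one non-numeric value and one numeric value.
        string html =
            "<form><input type=\"text\" name=\"user\" value=\"abc123\" />" +
            "<input type=\"hidden\" name=\"value5\" value=\"1247555244\" /></form>";

        Console.WriteLine(FindDigits(html)); // prints 1247555244
    }
}
```

Returning the digits as a string leaves the choice of numeric type to the caller; `long.TryParse` is a safer conversion than `Convert.ToInt32` if the number could exceed the Int32 range.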
Re: Regular Expression in C#
That's almost what I need, but it returns
Code:
"<input type=\"hidden\" name=\"value25\" value=\"1247605870\" />"
and I need it to return only the number "1247605870".
Thanks for the tip about the newbie regular expression program.
Re: Regular Expression in C#
If you need to parse different tags and attributes, you should build your own parser. This will allow you to do things like read a tag and then get its attributes (e.g., "value") as properties of your class. You will need more regular expressions than just the one that parses out "value", so I would approach it from that angle. Your parsing will be done by your HTMLDocument class (I made up the name, obviously). You cannot just parse HTML as XML, because HTML does not follow the XML spec.
Re: Regular Expression in C#
I don't need to find just "value"; I need to find a value that has only numbers in it.
I also know for a fact that there will be only one value= followed entirely by digits.
Re: Regular Expression in C#
Quote:
Originally Posted by Pale
That's almost what I need, but it returns
Code:
"<input type=\"hidden\" name=\"value25\" value=\"1247605870\" />"
and I need it to return only the number "1247605870".
Thanks for the tip about the newbie regular expression program.
I just tried it again. When I run it, the Int32 value gets assigned 1247605870. The only group that will return the whole string is group 0, the full match.
Also, it's not a newbie regular expression program. I still use it every day. I don't use the regex generator; I type the expression in myself, but it lets me easily test each regex I write.
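To make the difference concrete, here is a small sketch comparing the whole match with the named group. The class `GroupDemo` and its method names are made up for illustration; the pattern is a tightened variant of the one posted earlier and assumes double-quoted attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class GroupDemo
{
    private const string Pattern =
        @"<\w+\b[^>]*?\svalue=""(?<value>\d+)""[^>]*>";

    // m.Value (equivalent to m.Groups[0].Value) is the entire matched tag.
    public static string WholeMatch(string input)
    {
        return Regex.Match(input, Pattern).Value;
    }

    // The named group holds only what (?<value>...) captured: the digits.
    public static string NamedGroup(string input)
    {
        return Regex.Match(input, Pattern).Groups["value"].Value;
    }

    public static void Main()
    {
        string input =
            "<input type=\"hidden\" name=\"value25\" value=\"1247605870\" />";

        Console.WriteLine(WholeMatch(input)); // prints the whole <input ... /> tag
        Console.WriteLine(NamedGroup(input)); // prints 1247605870
    }
}
```

If the debugger shows the full tag, the code is most likely reading `m.Value` or `m.Groups[0]` rather than `m.Groups["value"]`.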
Re: Regular Expression in C#
Quote:
Originally Posted by Pale
I don't need to find just "value"; I need to find a value that has only numbers in it.
I also know for a fact that there will be only one value= followed entirely by digits.
I fail to see how that is relevant. When I say "value", I mean an attribute named "value" and its corresponding... value. You said yourself that you need to handle other types of attributes and tags, so why write code for each individual case when you can simply create a routine that returns any attribute:
Code:
// HtmlDocument is the made-up parser class described earlier, not a framework type.
HtmlDocument myDoc = new HtmlDocument(myHtmlSource);
int value = Int32.Parse(myDoc.Nodes("input").GetAttribute("value"));
Nicer, eh? Now the parsing is confined to the HtmlDocument class and is reusable anywhere in code and also can do more than one thing.
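As a rough sketch of what such a class might look like: the types below are hypothetical and are named `SimpleHtmlDocument` and `SimpleHtmlNode` to avoid clashing with existing types such as `System.Windows.Forms.HtmlDocument`. Unlike the two-line example above, `Nodes` here returns a sequence of nodes, and `GetAttribute` is called on each node. The regex-based tag scan assumes double-quoted attribute values with no `>` inside them, so it is nowhere near a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical node type: one start tag plus its attributes.
public sealed class SimpleHtmlNode
{
    private readonly Dictionary<string, string> attributes;

    public SimpleHtmlNode(string name, Dictionary<string, string> attributes)
    {
        Name = name;
        this.attributes = attributes;
    }

    public string Name { get; private set; }

    // Returns the attribute value, or null when the attribute is absent.
    public string GetAttribute(string name)
    {
        string value;
        return attributes.TryGetValue(name, out value) ? value : null;
    }
}

// Hypothetical document type: collects every start tag in the source.
public sealed class SimpleHtmlDocument
{
    private static readonly Regex TagPattern =
        new Regex(@"<(?<name>\w+)(?<attrs>[^>]*)>");
    private static readonly Regex AttributePattern =
        new Regex(@"(?<key>[\w-]+)\s*=\s*""(?<val>[^""]*)""");

    private readonly List<SimpleHtmlNode> nodes = new List<SimpleHtmlNode>();

    public SimpleHtmlDocument(string html)
    {
        foreach (Match tag in TagPattern.Matches(html))
        {
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match a in AttributePattern.Matches(tag.Groups["attrs"].Value))
            {
                attrs[a.Groups["key"].Value] = a.Groups["val"].Value;
            }
            nodes.Add(new SimpleHtmlNode(tag.Groups["name"].Value, attrs));
        }
    }

    // Yields every node whose tag name matches, ignoring case.
    public IEnumerable<SimpleHtmlNode> Nodes(string tagName)
    {
        foreach (SimpleHtmlNode node in nodes)
        {
            if (string.Equals(node.Name, tagName, StringComparison.OrdinalIgnoreCase))
            {
                yield return node;
            }
        }
    }
}

public static class ParserDemo
{
    // Returns the first all-digit value attribute among <input> tags, or null.
    public static string FirstNumericValue(string html)
    {
        foreach (SimpleHtmlNode node in new SimpleHtmlDocument(html).Nodes("input"))
        {
            string v = node.GetAttribute("value");
            if (v != null && Regex.IsMatch(v, "^[0-9]+$"))
            {
                return v;
            }
        }
        return null;
    }

    public static void Main()
    {
        string html = "<input type=\"hidden\" name=\"value5\" value=\"1247555244\" />";
        Console.WriteLine(FirstNumericValue(html)); // prints 1247555244
    }
}
```

The "digits only" rule lives in the caller rather than in the parser, so the same document class can serve other lookups without new regular expressions.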
Re: Regular Expression in C#