-
July 14th, 2009, 02:11 AM
#1
[RESOLVED] Regular Expression in C#
I am making a program in C# that reads an HTML file that another one of my programs has output.
Code:
<input type="hidden" name="value5" value="1247555244" />
What I need to find is what value equals, e.g. I need something to return "1247555244" and only that.
Please keep in mind that there is much more HTML than this in the file. I am new to regex and I need someone to show me how to write a regular expression in C# that finds a value attribute consisting entirely of digits and returns only those digits.
Thanks.
-
July 14th, 2009, 10:02 AM
#2
Re: Regular Expression in C#
Well, what I suggest is downloading a program called RegexBuddy. Anyone just starting out with regular expressions should use it. It lets you build an expression step by step and shows you what each of the different characters does. It also has a bunch of presets, like email addresses, phone numbers, etc.
As for your particular problem, this might work. I would need a larger sample of your HTML to make sure that the regex finds all the cases and is strict enough to find only what you're looking for. The expression I used should find the value attribute of any HTML tag whose value consists only of digits.
Code:
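// Requires: using System; using System.Text.RegularExpressions;
// [^>]* keeps each match inside one tag; a greedy (.+)? could run across several tags on a line.
// The named group "value" holds only the digits; m.Value would be the whole tag.
Match m = Regex.Match(inputString, @"<\w+\s[^>]*?\bvalue=""(?<value>\d+)""[^>]*>");
while (m.Success)
{
    // Place your code here
    Int32 value = Convert.ToInt32(m.Groups["value"].Value);
    m = m.NextMatch();
}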
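For comparison, here is a minimal sketch of a narrower approach that matches only the attribute itself rather than the whole tag. It assumes the values are always double-quoted; the class and method names (DigitValueFinder, FindAll) are made up for illustration, and the lookbehind is there so that attribute names such as "myvalue" or "data-value" are not picked up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class DigitValueFinder
{
    // value="digits" as a whole attribute; the lookbehind rejects names
    // that merely end in "value".
    private static readonly Regex DigitValue =
        new Regex(@"(?<![\w-])value\s*=\s*""(?<value>[0-9]+)""", RegexOptions.IgnoreCase);

    // Returns every all-digit value attribute found in the HTML, in document order.
    public static List<long> FindAll(string html)
    {
        var results = new List<long>();
        foreach (Match m in DigitValue.Matches(html))
        {
            long parsed;
            // TryParse skips digit runs too long to fit in a long instead of throwing.
            if (long.TryParse(m.Groups["value"].Value, out parsed))
            {
                results.Add(parsed);
            }
        }
        return results;
    }
}
```

Calling DigitValueFinder.FindAll on the sample line from the first post should give a list with the single entry 1247555244, since name="value5" is not followed by an equals sign and so is never treated as a value attribute.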
-
July 14th, 2009, 04:12 PM
#3
Re: Regular Expression in C#
 Originally Posted by monalin
That's almost what I need, but it returns
Code:
"<input type=\"hidden\" name=\"value25\" value=\"1247605870\" />"
and I need it to return only the number "1247605870".
And thanks for the tip about the newbie regular expression program.
-
July 14th, 2009, 04:15 PM
#4
Re: Regular Expression in C#
If you need to parse different tags and whatnot, you should build your own parser. This will allow you to do things like read a tag and then get its attributes (e.g., "value") as properties of your class. You will need more regular expressions than just the one needed to parse out "value", so I would approach it from that angle. Your parsing will be done by your HTMLDocument class (I made up the name, obviously). You cannot just parse HTML as XML, as it does not follow the XML spec.
-
July 14th, 2009, 04:23 PM
#5
Re: Regular Expression in C#
 Originally Posted by BigEd781
I don't need to find just "value"; I need to find a value that has only numbers in it.
And I know for a fact that there will be only one value= followed by nothing but digits.
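If exactly one all-digit value is expected, that assumption can be checked rather than trusted. The following is only a sketch: the class name SingleDigitValue and its Find method are made up, and it assumes double-quoted attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SingleDigitValue
{
    // Returns the one all-digit value attribute in the HTML as a string,
    // and throws if the "exactly one" assumption does not hold.
    public static string Find(string html)
    {
        MatchCollection matches =
            Regex.Matches(html, @"(?<![\w-])value\s*=\s*""(?<value>[0-9]+)""");
        if (matches.Count != 1)
        {
            throw new InvalidOperationException(
                "Expected exactly one all-digit value attribute, found " + matches.Count + ".");
        }
        return matches[0].Groups["value"].Value;
    }
}
```

Failing loudly when zero or several candidates turn up makes it obvious if the generated HTML ever changes shape.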
-
July 14th, 2009, 04:32 PM
#6
Re: Regular Expression in C#
 Originally Posted by Pale
I just tried it again. When I run it, the Int32 value gets assigned 1247605870. The only group that will return the whole string is group 0, the entire match.
Also, it's not a newbie regular expression program. I still use it every day. I don't use the regex generator; I type the expression in myself, but it lets me easily test each regex I make.
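One possible explanation for the difference, assuming the whole match was being inspected instead of the named group: in .NET, m.Value and m.Groups[0].Value are the entire matched text, while m.Groups["value"].Value is only what the named group captured. A small sketch, using a simplified pattern that is not necessarily the exact one from this thread (GroupDemo and FullAndCaptured are made-up names):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class GroupDemo
{
    // Returns { full match, named "value" group }, or an empty array when nothing matches.
    public static string[] FullAndCaptured(string input)
    {
        Match m = Regex.Match(input, @"<input[^>]*?\bvalue=""(?<value>\d+)""[^>]*>");
        if (!m.Success)
        {
            return new string[0];
        }
        // m.Value is the same text as m.Groups[0].Value: the whole matched tag.
        return new[] { m.Value, m.Groups["value"].Value };
    }
}
```

For the tag quoted above, the first element is the complete <input ... /> tag and the second is just 1247605870.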
-
July 14th, 2009, 04:38 PM
#7
Re: Regular Expression in C#
 Originally Posted by Pale
I fail to see how that is relevant. When I say "value", I mean an attribute named "value" and its corresponding...value. You said yourself that you need to handle other types of attributes and tags, so why program for each individual case when you can simply create a routine that returns any attribute you ask for:
Code:
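// HtmlDocument here is the custom parser class suggested above (a made-up name), not an existing library type.
HtmlDocument myDoc = new HtmlDocument( myHtmlSource );
int value = Int32.Parse( myDoc.Nodes( "input" ).GetAttribute( "value" ) );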
Nicer, eh? Now the parsing is confined to the HtmlDocument class and is reusable anywhere in code and also can do more than one thing.
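As a rough illustration of what the attribute-reading part of such a class might look like, here is a minimal sketch. SimpleTagReader and GetAttributes are made-up names, and the code assumes double-quoted attribute values with no ">" inside them, so it is nowhere near a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the name and API are illustrative only.
public static class SimpleTagReader
{
    // name="value" pairs with double-quoted values.
    private static readonly Regex AttributePattern =
        new Regex(@"(?<name>[\w-]+)\s*=\s*""(?<val>[^""]*)""");

    // Returns the attributes of the first <tagName ...> tag found,
    // or an empty dictionary when no such tag exists.
    public static Dictionary<string, string> GetAttributes(string html, string tagName)
    {
        var attributes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        Match tag = Regex.Match(
            html,
            "<" + Regex.Escape(tagName) + @"\b(?<attrs>[^>]*)>",
            RegexOptions.IgnoreCase);
        if (!tag.Success)
        {
            return attributes;
        }

        foreach (Match a in AttributePattern.Matches(tag.Groups["attrs"].Value))
        {
            attributes[a.Groups["name"].Value] = a.Groups["val"].Value;
        }
        return attributes;
    }
}
```

With the sample tag from the first post, SimpleTagReader.GetAttributes(html, "input")["value"] should be the string 1247555244, which can then be handed to Int32.Parse or long.Parse.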
-
July 14th, 2009, 04:51 PM
#8
Re: Regular Expression in C#