-
March 5th, 2005, 05:05 PM
#1
using regular expressions
Hi there,
I'm trying to extract certain information from an HTML source. I've used WebClient from the .NET library to screen-scrape the web page. The second part is to use Regex to extract the information that is useful to me from the HTML source, and because I'm not very familiar with regular expressions I need help with this. For example:
<td class="subhead" colspan="2"><font size="-1">I NEED TO EXTRACT THIS</font></td>
I want to extract that text and then place whatever is extracted within an XML tag (i.e. convert it to XML).
Thanks and regards,
Jay
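A minimal sketch of the extract-and-convert step Jay describes. It assumes the page source has already been downloaded into a string (for example with WebClient.DownloadString) and uses System.Xml.Linq for the XML side. The SubheadsToXml helper, the `<subheads>`/`<item>` element names, and the second sample cell are made up for illustration. WebUtility.HtmlDecode is an extra step that turns entities such as `&amp;` back into plain characters before they go into the XML.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: pulls the text out of every
// <td class="subhead" colspan="2"><font size="-1">...</font></td> cell
// and wraps each piece in an <item> element under a <subheads> root.
static XElement SubheadsToXml(string html)
{
    const string pattern =
        "<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">(.*?)</font></td>";

    var items = Regex.Matches(html, pattern)
        .Cast<Match>()
        // Groups[1] holds the lazily matched text between the tags;
        // HtmlDecode turns entities such as &amp; back into plain characters.
        .Select(m => new XElement("item", WebUtility.HtmlDecode(m.Groups[1].Value)));

    // XElement escapes the text again when it is serialized, so the output is valid XML.
    return new XElement("subheads", items);
}

// Invented sample page with two matching cells.
string page =
    "<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">I NEED TO EXTRACT THIS</font></td>" +
    "<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">Fish &amp; Chips</font></td>";

XElement xml = SubheadsToXml(page);
Console.WriteLine(xml);
```

Letting XElement build the XML, instead of concatenating "<item>" strings by hand, means characters like & and < in the scraped text are escaped for you.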
-
March 7th, 2005, 11:26 PM
#2
Re: using regular expressions
Code:
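using System;
using System.Text.RegularExpressions;

// file holds the entire HTML source of the page; the sample line stands in for it here
string file = "<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">I NEED TO EXTRACT THIS</font></td>";
// Three groups: the opening markup, the text you want (lazy), the closing markup
string pattern = "(<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">)(.*?)(</font></td>)";
foreach (Match m in Regex.Matches(file, pattern))
{
    // Groups[0] is the whole match, so the second capture group is Groups[2]
    string extractedText = m.Groups[2].Value;
    Console.WriteLine(extractedText);
}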
In .*?, the ? makes the match non-greedy (lazy), so it stops at the first </font></td> rather than the last one on the page. The parentheses divide the matched text into groups; the string you need will be in the second group.
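A small sketch of the greedy/lazy difference, using two made-up cells on one line: the greedy (.*) swallows everything up to the last </font>, while the lazy (.*?) stops at the first one.

```csharp
using System;
using System.Text.RegularExpressions;

// Two cells on one line, the way they often appear in a real page (invented sample).
string html =
    "<font size=\"-1\">first</font></td>" +
    "<font size=\"-1\">second</font></td>";

// Greedy: .* runs to the end, then backs off only as far as the LAST </font>.
Match greedy = Regex.Match(html, "<font size=\"-1\">(.*)</font>");

// Lazy: .*? grows one character at a time and stops at the FIRST </font>.
Match lazy = Regex.Match(html, "<font size=\"-1\">(.*?)</font>");

Console.WriteLine(greedy.Groups[1].Value); // first</font></td><font size="-1">second
Console.WriteLine(lazy.Groups[1].Value);   // first
```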
I wonder if there's any limit on the maximum length of string Regex can accept; in your case, the page could well be over 20k...
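As far as I know, .NET's Regex has no small fixed limit on input length of its own; the practical bound is the maximum size of a string and the memory available, so a 20k page should not be a problem. A quick check, assuming a synthetic page built from invented filler rows followed by the one cell we want:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Build a synthetic "page" of well over 20k characters: 2,000 made-up
// filler rows followed by the single cell we want to extract.
var sb = new StringBuilder();
for (int i = 0; i < 2000; i++)
    sb.Append("<tr><td>filler row ").Append(i).Append("</td></tr>\n");
sb.Append("<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">I NEED TO EXTRACT THIS</font></td>");
string page = sb.ToString();

Match m = Regex.Match(page,
    "<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">(.*?)</font></td>");

Console.WriteLine(page.Length);       // well over 20,000
Console.WriteLine(m.Groups[1].Value); // I NEED TO EXTRACT THIS
```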
I'd like to know one thing from you/others: is it legal to extract info from web pages like this?! I abandoned one of my apps midway for fear of getting sued. I was extracting some sports statistics and displaying them in my app... Don't ask which sport/what data!
-
March 8th, 2005, 10:05 AM
#3
Re: using regular expressions
Thanks Big...
Regarding the legality, I have the same problem, because I'm going to extract similar (news) info from websites... Maybe someone knows about this: how to make it legal, and whether we need to get permission beforehand...
Regards
-
March 9th, 2005, 06:55 AM
#4
Re: using regular expressions
Sorry Big, but I think the code above doesn't really work. Any other ideas?!
Regards
-
March 9th, 2005, 08:04 AM
#5
Re: using regular expressions
-
March 9th, 2005, 08:32 AM
#6
Re: using regular expressions
Thanks Big...
And regarding how to make it legal to import stuff from a website: I sent a message to the BBC team asking about that, and they said:
"You have permission as long *as you reference the bbc.co.uk as the source of data*."
So you might be able to get some sports stuff from them...
Regards
-
March 9th, 2005, 11:10 AM
#7
Re: using regular expressions
Hey, I finally managed to get it working!
Code:
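using System;
using System.Text.RegularExpressions;

// Sample input: one line of the page source
string file = "<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">I NEED TO EXTRACT THIS</font></td>";
// One capture group around the lazily matched text between the tags
string pattern = "<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">(.*?)</font></td>";
foreach (Match m in Regex.Matches(file, pattern))
{
    // Groups[1] is the first (and only) capture group
    Group g = m.Groups[1];
    Console.WriteLine(g.Value);
}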
I guess I was making it too complicated by trying to put the text on the left and right into their own groups...
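The pattern above matches the markup character for character, so it fails if the real page differs even slightly. A slightly more tolerant variant is sketched here, assuming the page may use different capitalisation, extra spaces, or a line break inside the cell (the sample input is invented to show this): \s+ accepts any run of whitespace, RegexOptions.IgnoreCase lets <TD> match <td>, and RegexOptions.Singleline lets . match newlines.

```csharp
using System;
using System.Text.RegularExpressions;

// Same cell, but with upper-case tags, a double space and a line break
// inside the text, as hand-edited HTML often has (invented sample).
string file =
    "<TD class=\"subhead\"  colspan=\"2\"><font size=\"-1\">I NEED\nTO EXTRACT THIS</font></TD>";

// The exact pattern from the working code: fails on this input.
string strict = "<td class=\"subhead\" colspan=\"2\"><font size=\"-1\">(.*?)</font></td>";

// \s+ tolerates any run of whitespace between tag name and attributes.
string pattern =
    "<td\\s+class=\"subhead\"\\s+colspan=\"2\"><font\\s+size=\"-1\">(.*?)</font></td>";

// IgnoreCase: <TD> matches <td>; Singleline: . also matches '\n'.
var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

Console.WriteLine(Regex.IsMatch(file, strict)); // False
Console.WriteLine(Regex.Match(file, pattern, options).Groups[1].Value);
```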
Regarding the legal issues, that's good to hear! I'll look into it, thanks!