-
August 5th, 2011, 09:59 AM
#1
I want to parse a site and get some text
Hello, I wrote a program that can get the contents of a web page. Now I want to get some values from it.
For example, my page is something like this:
My name : Bob
My surname : Bob2
My name : Bob3
My surname : Bob4
I want to get Bob, Bob2, Bob3, Bob4.
I put the page contents into a RichTextBox.
So far I have managed to strip out as much as I can (but I get a lot of blank lines).
Code:
Regex reg = new Regex(@"\s*");
result = reg.Replace(result, "");
result = Regex.Replace(result, @"<.*?>", "\n");
result = Regex.Replace(result, @"[A-Z][1-9]", "\n");
result = Regex.Replace(result, @"[^\w\.@-]", "\n");
string[] words = result.Split(' ');
foreach (string word in words)
{
status_richTextBox.AppendText(word);
}
I get something like this:
Code:
(empty line)
(empty line)
(empty line)
(empty line)
My name
Bob
My surname
Bob2
My name
Bob3
My surname
Bob4
How can I get only the values I want?
-
August 6th, 2011, 01:26 PM
#2
Re: I want to parse a site and get some text
Some basic string manipulation comes to mind. Use String.Split, for instance, or have a look at this excellent thread on the same topic (well, string manipulation):
http://www.codeguru.com/forum/showthread.php?t=515111
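To make the String.Split suggestion concrete, here is a minimal sketch. It assumes the HTML tags have already been stripped and that each label/value pair sits on its own line in the form "label : value", as in the example from the first post. The names LabelParser and ExtractValues are made up for illustration. Note that the Regex(@"\s*") replacement in the original code removes every space and line break, so that step would have to be skipped for this approach to work.

```csharp
using System;
using System.Collections.Generic;

static class LabelParser
{
    // Hypothetical helper: returns the text after the first ':' on every
    // line that contains one. Lines without a ':' are ignored.
    public static List<string> ExtractValues(string text)
    {
        var values = new List<string>();

        // Split into lines and drop the empty ones in a single step.
        string[] lines = text.Split(new[] { '\r', '\n' },
                                    StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon < 0)
                continue; // not a "label : value" line

            string value = line.Substring(colon + 1).Trim();
            if (value.Length > 0)
                values.Add(value);
        }
        return values;
    }
}

static class Program
{
    static void Main()
    {
        // Sample text modelled on the first post, including a blank line.
        string page = "My name : Bob\nMy surname : Bob2\n\nMy name : Bob3\nMy surname : Bob4";

        List<string> values = LabelParser.ExtractValues(page);
        Console.WriteLine(string.Join(",", values)); // prints Bob,Bob2,Bob3,Bob4
    }
}
```

To show the result in the RichTextBox, something like status_richTextBox.AppendText(value + "\n") inside a foreach over the returned list should do. If the real page uses a different separator or puts several pairs on one line, the split logic would need adjusting.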