I want to parse a site , and get some text

**invader7** · August 5th, 2011, 09:59 AM

Hello, I wrote a program that can fetch the contents of a web page. Now I want to extract some values from it.

For example, my page looks something like this:

My name : Bob
My surname : Bob2

My name : Bob3
My surname : Bob4

I want to get Bob, Bob2, Bob3, Bob4.

I put the page contents into a RichTextBox.
So far I have managed to strip out as much as I can (but I get a lot of blank lines).

Code:

                Regex reg = new Regex(@"\s*");
                result = reg.Replace(result, "");

                result = Regex.Replace(result, @"<.*?>", "\n");
                result = Regex.Replace(result, @"[A-Z][1-9]", "\n");
                result = Regex.Replace(result, @"[^\w\.@-]", "\n");


                string[] words = result.Split(' ');
                foreach (string word in words)
                {
                   status_richTextBox.AppendText(word);
                }

I get something like this:

Code:

(empty line)
(empty line)
(empty line)
(empty line)
My name
Bob
My surname
Bob2
My name
Bob3
My surname
Bob3

How can I get only the values I want?

**HanneSThEGreaT** · August 6th, 2011, 01:26 PM

Some basic string manipulation comes to mind. Use String.Split, for instance, or have a look at this excellent thread on the same topic (well, string manipulation):

http://www.codeguru.com/forum/showthread.php?t=515111
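
As a rough illustration of the String.Split idea, here is a minimal sketch. It assumes the text still has one "label : value" pair per line, as in the example page, so it would have to run before the whitespace and newline replacements in the posted code. The names ValueExtractor and ExtractValues are made up for this example.

```csharp
using System;
using System.Collections.Generic;

static class ValueExtractor
{
    // Hypothetical helper: returns the text after the first ':' on each
    // "label : value" line. Lines without a ':' are ignored.
    public static List<string> ExtractValues(string text)
    {
        var values = new List<string>();

        // Split on both newline styles and drop the blank lines.
        string[] lines = text.Split(new[] { '\r', '\n' },
                                    StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Limit the split to 2 parts so a ':' inside the value survives.
            string[] parts = line.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
            {
                string value = parts[1].Trim();
                if (value.Length > 0)
                    values.Add(value);
            }
        }
        return values;
    }
}

static class Program
{
    static void Main()
    {
        // Sample input copied from the example page in the question.
        string page = "My name : Bob\nMy surname : Bob2\n\nMy name : Bob3\nMy surname : Bob4";

        List<string> values = ValueExtractor.ExtractValues(page);
        Console.WriteLine(string.Join(",", values)); // prints Bob,Bob2,Bob3,Bob4
    }
}
```

Each value could then be appended to the RichTextBox with its own line break (for example `status_richTextBox.AppendText(value + "\n")`) instead of appending the raw words. If the real page carries more markup than the example, the labels may not line up this neatly, and an HTML parser would likely be more robust than splitting strings.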
