  1. #1
    Join Date
    Jun 2009
    Posts
    144

    I want to parse a site and get some text

    Hello, I wrote a program that fetches the contents of a web page. Now I want to extract some values from it.

    For example, my page looks something like this:

    My name : Bob
    My surname : Bob2


    My name : Bob3
    My surname : Bob4


    I want to get Bob, Bob2, Bob3, Bob4.

    I put the page contents into a RichTextBox.
    So far I have managed to strip out as much as I can (but I get a lot of blank lines):

    Code:
                    Regex reg = new Regex(@"\s*");
                    result = reg.Replace(result, "");
    
                    result = Regex.Replace(result, @"<.*?>", "\n");
                    result = Regex.Replace(result, @"[A-Z][1-9]", "\n");
                    result = Regex.Replace(result, @"[^\w\.@-]", "\n");
    
    
                    string[] words = result.Split(' ');
                    foreach (string word in words)
                    {
                       status_richTextBox.AppendText(word);
                    }
    I get something like this:

    Code:
    (empty line)
    (empty line)
    (empty line)
    (empty line)
    My name
    Bob
    My surname
    Bob2
    My name
    Bob3
    My surname
    Bob4
    How can I get only the values I want?

  2. #2
    Join Date
    Jul 2001
    Location
    Sunny South Africa
    Posts
    11,284

    Re: I want to parse a site and get some text

    Some basic string manipulation comes to mind. Use String.Split, for instance, or have a look at this excellent thread on the same topic (well, string manipulation):

    http://www.codeguru.com/forum/showthread.php?t=515111
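
    As a rough sketch of the String.Split idea (not necessarily what the linked thread does): assuming the tags have been stripped so that each label and its value still sit on one line separated by a colon, as in the sample page above, you could split the text into lines and keep whatever follows the first ':'. The names ValueExtractor and ExtractValues are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ValueExtractor
{
    // Hypothetical helper: returns the text after the first ':' on every
    // line that contains one, e.g. "My name : Bob" yields "Bob".
    public static List<string> ExtractValues(string pageText)
    {
        var values = new List<string>();

        // Split on Windows and Unix line endings and drop empty lines.
        string[] lines = pageText.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon < 0)
                continue; // no "label : value" pair on this line

            string value = line.Substring(colon + 1).Trim();
            if (value.Length > 0)
                values.Add(value);
        }

        return values;
    }
}
```

    With the sample page, string.Join(",", ValueExtractor.ExtractValues(result)) should give Bob,Bob2,Bob3,Bob4. Note that this only works if the earlier Regex.Replace calls do not turn the colon and spaces into newlines first, because that puts each label and its value on separate lines.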
