  1. #1
    Join Date
    Apr 2012
    Posts
    1

    Possible and if so how difficult? Parsing source code from a link & regexp matching

    Simply

    I have one webpage (call it Webpage1). On that page, there are a significant number of links to other pages that are one link deep (call them Webpage').

    The source code of each Webpage' may contain information (it can also be absent) that is easy to match with a regexp.

    Essentially, that information is a set of names. Note that this is not an attempt to breach anyone's personal privacy.

    I need to write a program (perhaps in Java) that takes a webpage, matches the links on it, opens each linked page, parses its source code, and matches the "names" in that source. It should then consolidate all of the matches in a single text file.

    Another way of describing this.

    Webpage1 --> Webpage' --> open source code --> regexp match "names" --> print names to text file and save.
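The pipeline above could be sketched in Java roughly as follows. This is only a minimal sketch under several assumptions: it needs Java 11+ for `java.net.http.HttpClient`, links are assumed to be double-quoted `href` attributes, and the `NAME` pattern (names wrapped in `<span class="name">`) is purely hypothetical, since the real markup of Webpage' is not known. The class name `NameScraper` and its method names are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameScraper {
    // Assumption: links are written as href="..." with double quotes.
    static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical markup for a name; replace with the real regexp.
    static final Pattern NAME =
            Pattern.compile("<span class=\"name\">([^<]+)</span>");

    // Collect the distinct absolute links on a page that match linkFilter.
    static List<String> extractLinks(String html, URI base, Pattern linkFilter) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                String absolute = base.resolve(m.group(1)).toString();
                if (linkFilter.matcher(absolute).find()) {
                    links.add(absolute);
                }
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return new ArrayList<>(links);
    }

    // Return every name matched in the page source (empty if none are present).
    static List<String> extractNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(html);
        while (m.find()) {
            names.add(m.group(1).trim());
        }
        return names;
    }

    // Download the source of one page as a string.
    static String fetch(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Usage: java NameScraper.java <Webpage1 URL> <link regexp> <output file>
    public static void main(String[] args) throws Exception {
        String start = args[0];
        Pattern linkFilter = Pattern.compile(args[1]);
        HttpClient client = HttpClient.newHttpClient();

        List<String> allNames = new ArrayList<>();
        String startHtml = fetch(client, start);
        for (String link : extractLinks(startHtml, URI.create(start), linkFilter)) {
            allNames.addAll(extractNames(fetch(client, link)));
        }
        // One name per line in the output text file.
        Files.write(Path.of(args[2]), allNames);
    }
}
```

Regexp matching on raw HTML is fragile, so for anything beyond simple, regular markup an HTML parser would probably be a safer choice. With a large number of Webpage' links it may also be worth pausing between requests so the server is not overloaded.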

    The catch is that there are a large number of "Webpage'" links.

    There are programs out there that do something like this. A program called "downthemall", with an extension for it called "anticontainer", will take all the links on a webpage, keep the ones that are appropriate (using regexp), open those links, parse the source code, and use regexp to match parts of the source in order to build links to things that are "hidden" (like images).

    Suggestions?

  2. #2
    Join Date
    May 2006
    Location
    UK
    Posts
    4,473

    Re: Possible and if so how difficult? Parsing source code from a link & regexp matching

    If there are already programs out there that do what you want then my advice is to use one of those.
