  1. #1
    Join Date
    Oct 2010
    Posts
    60

    Wikipedia parser

    After reading the latest xkcd (http://xkcd.com/903/), I started writing a program that follows the first link in each Wikipedia article to see whether the chain eventually leads to Philosophy. However, I know next to no HTML, so I don't know how to work out which link is the first one. I know it should look like <a href something something>, but there are many links before it that a user would not consider the first link. Does anyone have any ideas on how to do this?
    (Also, I wasn't sure what forum this should go on, since this isn't really Java-specific, and I already know the syntax I'd use.)

  2. #2
    Join Date
    Jun 1999
    Location
    Eastern Florida
    Posts
    3,877

    Re: Wikipedia parser

    Write a program to extract all the links and print them out. Then look at what is printed out to see which link you want to find and then change your program to find that link.
    Norm
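
    For anyone following along, the "extract all the links and print them" step could be sketched roughly like this. It is only a minimal sketch using a regular expression from the standard library: the class name LinkLister, the method extractLinks, and the sample HTML are made up for illustration, and the pattern assumes double-quoted href attributes, which is not guaranteed for arbitrary HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not code from the thread.
class LinkLister {
    // Matches an opening <a ...> tag and captures the value of a
    // double-quoted href attribute. Single-quoted or unquoted
    // attributes are deliberately not handled in this sketch.
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every captured href in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample page: one link in an infobox-like div, one in a paragraph.
        String sample =
            "<div class=\"infobox\"><a href=\"/wiki/File:X.png\">img</a></div>"
            + "<p>An <a href=\"/wiki/Example\">example</a> article.</p>";
        for (String link : extractLinks(sample)) {
            System.out.println(link);
        }
    }
}
```

    Running it on a saved copy of a real article and reading the printed list is the quickest way to see which unwanted links come before the one you are after.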

  3. #3
    Join Date
    Oct 2010
    Posts
    60

    Re: Wikipedia parser

    I have, but I can't find a pattern that would work. The problem seems to be that infoboxes, tags, and pictures show up before the real body of the text, and I can't figure out how to tell if a link is part of one of those.
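
    One possible way around the infobox problem, offered as a sketch rather than a tested solution: only look at links that sit inside <p> paragraphs, and skip anything that is not a plain article link. The code below assumes that article body text is wrapped in <p> tags, that infoboxes and pictures live in other elements, and that article links look like /wiki/Title while pages such as File: or Help: contain a colon. Those are assumptions about Wikipedia's markup, and FirstBodyLink and firstBodyLink are made-up names. It also does not skip links in parentheses or italics, which the comic's rule excludes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not code from the thread.
class FirstBodyLink {
    // One <p>...</p> block; DOTALL lets '.' match across line breaks.
    private static final Pattern PARAGRAPH =
        Pattern.compile("<p\\b[^>]*>(.*?)</p>",
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Captures the value of a double-quoted href attribute.
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the first href found inside a paragraph that looks like an
    // article link (assumed: starts with /wiki/ and contains no colon).
    static Optional<String> firstBodyLink(String html) {
        Matcher p = PARAGRAPH.matcher(html);
        while (p.find()) {
            Matcher a = HREF.matcher(p.group(1));
            while (a.find()) {
                String href = a.group(1);
                if (href.startsWith("/wiki/") && !href.contains(":")) {
                    return Optional.of(href);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample: the infobox link comes first in the source,
        // but it is outside any paragraph and so is ignored.
        String sample =
            "<table class=\"infobox\"><tr><td>"
            + "<a href=\"/wiki/Infobox_link\">x</a></td></tr></table>"
            + "<p>See <a href=\"/wiki/File:Pic.png\">pic</a> and "
            + "<a href=\"/wiki/Example\">example</a>.</p>";
        System.out.println(firstBodyLink(sample).orElse("(none found)"));
    }
}
```

    Regular expressions are fragile on real HTML (nested tags, paragraphs inside tables, and so on), so a proper HTML parser would likely be more reliable if this approach turns out to be too brittle.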

  4. #4
    Join Date
    Jun 1999
    Location
    Eastern Florida
    Posts
    3,877

    Re: Wikipedia parser

    Sorry, I have no more ideas.
    Norm
