-
Wikipedia parser
After reading the latest xkcd (http://xkcd.com/903/), I started writing a program that finds the first link in each Wikipedia article and follows it to see whether the chain eventually leads to Philosophy. However, I know next to no HTML, so I don't know how to work out which link is the first one. I know it should look something like <a href="...">, but there are many links earlier in the page that a reader would not consider the first link. Does anyone have any ideas on how to do this?
(Also, I wasn't sure which forum this should go in, since it isn't really Java-specific and I already know the syntax I'd use.)
-
Re: Wikipedia parser
Write a program that extracts all the links and prints them out. Then look at the output to see which link you want, and change your program to find that link.
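A rough sketch of that first step in Java, assuming the page's HTML has already been downloaded into a String. The names LinkLister and extractLinks are made up for illustration, and the regex only handles double-quoted href attributes, so treat it as a starting point rather than a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkLister {
    // Matches an opening <a ...> tag and captures the value of a
    // double-quoted href attribute. Single-quoted or unquoted hrefs
    // are not handled (a simplifying assumption).
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every href value in the order it appears in the HTML.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Tiny hand-written sample instead of a real download.
        String sample = "<div><a href=\"/wiki/Help\">help</a></div>"
                + "<p>A <a class=\"x\" href=\"/wiki/Philosophy\">link</a>.</p>";
        for (String link : extractLinks(sample)) {
            System.out.println(link);
        }
    }
}
```

Printing the links in document order makes it easier to see how many navigation and sidebar links come before the one you actually want.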
-
Re: Wikipedia parser
I have, but I can't find a pattern that would work. The problem seems to be that infoboxes, tags, and pictures show up before the real body of the text, and I can't figure out how to tell if a link is part of one of those.
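One heuristic that might be worth trying, offered as a guess rather than a known solution: drop every <table> block first, then only look for links inside <p> paragraphs. This assumes that infoboxes are marked up as tables, that pictures sit in <div> blocks outside the paragraphs, and that article links have an href starting with /wiki/ and containing no colon (which would skip things like File: and Help: pages). None of that is guaranteed for every article. The names FirstLinkFinder, stripTables, and firstBodyLink are made up for this sketch.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstLinkFinder {
    // Opening or closing table tag; group 1 is "/" for a closing tag.
    private static final Pattern TABLE_TAG = Pattern.compile(
            "<(/?)table\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // One <p>...</p> paragraph; group 1 is its inner HTML.
    private static final Pattern PARAGRAPH = Pattern.compile(
            "<p\\b[^>]*>(.*?)</p>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Assumed shape of an article link: /wiki/Name with no ':' in it.
    private static final Pattern ARTICLE_HREF = Pattern.compile(
            "<a\\s[^>]*?href=\"(/wiki/[^\":]+)\"", Pattern.CASE_INSENSITIVE);

    // Removes all tables, including nested ones, by tracking nesting depth.
    static String stripTables(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = TABLE_TAG.matcher(html);
        int depth = 0;
        int keepFrom = 0; // start of the current stretch outside any table
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                if (depth == 0) {
                    out.append(html, keepFrom, m.start());
                }
                depth++;
            } else if (depth > 0) {
                depth--;
                if (depth == 0) {
                    keepFrom = m.end();
                }
            }
        }
        // Text after an unclosed table is dropped.
        if (depth == 0) {
            out.append(html.substring(keepFrom));
        }
        return out.toString();
    }

    // Returns the first article link found inside a paragraph, if any.
    static Optional<String> firstBodyLink(String html) {
        Matcher p = PARAGRAPH.matcher(stripTables(html));
        while (p.find()) {
            Matcher a = ARTICLE_HREF.matcher(p.group(1));
            if (a.find()) {
                return Optional.of(a.group(1));
            }
        }
        return Optional.empty();
    }
}
```

Regular expressions are a blunt tool for HTML, so a real HTML parser would probably be more reliable if adding a library is an option. The same idea (ignore tables, look only in paragraphs) should carry over.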
-
Re: Wikipedia parser
Sorry, I have no more ideas.