I have a challenge for someone on here, because I can't seem to figure this out.
I am trying to write a program that takes a particular topic page on wikipedia and, using only links available on that topic page for navigations, recursively finds the quickest way from that topic page to a specified other topic page.
If i wanted to get from the wiki page on "apples", to the wiki page on"bananas", I would need to click a link on apples and get to another wiki page on a different topic. From that page, i would click a link and get to another topic. My end goal would be to get the "bananas" page.
I have attempted this, and my code only hits the first link on every page and im not sure how to stop it from moving further (i.e. I have no idea what my base case is/could be).
Anyone who thinks they can write this and/or help me figure out what I'm doing wrong, please respond
Ok, let me see if I get this straight. Lets say page 1 has n links on it, to page 1.1, 1.2... 1.n. (1.1 being the first link on the page). Then 1.1 has m links, 1.1.1, 1.1.2, ..., 1.1.m and so on.
Your program only clicks 1, then 1.1, then 1.1.1, then 220.127.116.11 and unless the page you are looking for can be accessed by a path of the form 18.104.22.168 ... .1, your code never returns?
This is basically a search in a tree. Pretty common stuff in artificial intelligence. There are multiple ways to do this, but the simplest way (although not necessarily the most efficient way) would be to use breath first search: http://en.wikipedia.org/wiki/Breath_first_search.
Basically, you have a closed set and a open queue. At the start, you put the starting page in the start queue. Then your algorithm takes the first element of the queue, check if it's the end page. If it isn't, it puts it tries to put it in the closed set. If it is already in the closed set, this means you already visited this page, so there's no need to do anything. If it isn't already in the closed set, then add ALL the links in that page to the end of your open queue, then start the loop again.
So at the beginning,
After one loop (assuming there are only 2 links):
Open: 1.1, 1.2
After 2 loops (assuming 1.1 has 3 links)
Open: 1.2, 1.1.1, 1.1.2, 1.1.3
Closed: 1, 1.1
As you can see, you will visit 1.2 before you start visiting the links in page 1.1.
This algorithm has three main advantages.
1) if there is a path, it will find it eventually.
2) the first path you find is always the shortest path. (or tied for the shortest path)
3) if there is no path, and if the number of pages is finite, the algorithm will eventually terminate and tell you that there is no path EVEN if there are looping paths (because you check if the page was already visited before processing it).
I'll leave the actual implementation to you. (It doesn't really need to be recursive.)
Last edited by Filobel; April 1st, 2011 at 12:52 PM.