  1. #1
    Join Date
    Feb 2012

    Java - links from a specific part of a Wikipedia article

    I am doing an NLP project and I need to extract only the links that appear in the "introduction" section and in the "geography" section of this Wikipedia page: http://en.wikipedia.org/wiki/Boston.

    I used jsoup to extract all the links from the whole page, but I have not been able to restrict it to the sections I want (the introduction and geography sections).

    Could you please help me?

  2. #2
    Join Date
    Feb 2012

    Re: Java - links from a specific part of a Wikipedia article

    This is my solution:

    package LinkIntroGeo;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    import java.io.IOException;

    public class LinkIntroGeo {

        public static void main(String[] args) throws IOException {

            Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/New_England").get();

            // Introduction: the run of consecutive <p> siblings starting at the first <p>.
            Element intro = doc.body().select("p").first();
            while (intro != null && intro.tagName().equals("p")) {
                // here you get an Elements object which you can
                // iterate through to get the links in the intro
                Elements introLinks = intro.select("a[href]");
                for (Element a : introLinks) {
                    System.out.println("intro: " + a.absUrl("href"));
                }
                intro = intro.nextElementSibling();
            }

            for (Element h2 : doc.body().select("h2")) {
                // Assumes the heading holds two <span> elements, the second being the title.
                if (h2.select("span").size() == 2) {
                    if (h2.select("span").get(1).text().equals("Geography")) {
                        Element nextsib = h2.nextElementSibling();
                        while (nextsib != null) {
                            if (nextsib.tagName().equals("p")) {
                                // here you get an Elements object which you
                                // can iterate through to get the links in the
                                // geography section
                                Elements geoLinks = nextsib.select("a[href]");
                                for (Element a : geoLinks) {
                                    System.out.println("geography: " + a.absUrl("href"));
                                }
                                nextsib = nextsib.nextElementSibling();
                            } else if (nextsib.tagName().equals("h2")) {
                                nextsib = null; // next section heading reached: stop
                            } else {
                                nextsib = nextsib.nextElementSibling();
                            }
                        }
                    }
                }
            }
        }
    }

    It works fine, but not with all Wikipedia pages. For example, it works with these URLs:


    but not with


    Any advice?

