
How Do I Parse An HTML Document With JSoup To Get A List Of Links?

I am trying to parse http://www.craigslist.org/about/sites to build a set of text/link pairs that my program can load dynamically. So far I have done this: Document doc =

Solution 1:

The <ul> containing the cities is the next sibling of the <div class="state_delimiter">. You can use Element#nextElementSibling() to get it, starting from that div. Here's a kickoff example:

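import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CraigslistSites {
    public static void main(String[] args) throws IOException {
        Document document = Jsoup.connect("http://www.craigslist.org/about/sites").get();
        // Each country/continent block on the page is a <div class="colmask">.
        Elements countries = document.select("div.colmask");

        for (Element country : countries) {
            System.out.println("Country: " + country.select("h1.continent_header").text());
            Elements states = country.select("div.state_delimiter");

            for (Element state : states) {
                System.out.println("\tState: " + state.text());
                // The <ul> with the cities is the element right after the state div.
                Element cityList = state.nextElementSibling();
                if (cityList == null) {
                    continue; // no sibling element after this state header
                }

                for (Element city : cityList.select("li")) {
                    System.out.println("\t\tCity: " + city.text());
                }
            }
        }
    }
}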
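Since the question asks for links rather than just city names, here is a minimal sketch of how the same sibling walk could collect link text and URLs into a map. It assumes each <li> wraps an <a href> (an assumption, not confirmed by the answer above). The HTML in main is a made-up stand-in for the real page, parsed offline with Jsoup.parse(html, baseUri) so that relative hrefs can be resolved with absUrl("href"). The class and method names (CityLinks, cityLinks) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CityLinks {
    // Hypothetical helper: maps each city's link text to its absolute URL,
    // walking from every <div class="state_delimiter"> to the <ul> after it.
    static Map<String, String> cityLinks(Document doc) {
        Map<String, String> links = new LinkedHashMap<>(); // keeps page order
        for (Element state : doc.select("div.state_delimiter")) {
            Element list = state.nextElementSibling();
            if (list == null || !list.tagName().equals("ul")) {
                continue; // no city list directly after this state header
            }
            // Assumes each city is an <a href> inside an <li>.
            for (Element a : list.select("li > a[href]")) {
                // absUrl resolves relative hrefs against the document's base URI.
                links.put(a.text(), a.absUrl("href"));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for the real page.
        String html = "<div class=\"state_delimiter\">Alabama</div>"
                + "<ul><li><a href=\"https://auburn.craigslist.org/\">auburn</a></li>"
                + "<li><a href=\"/bham\">birmingham</a></li></ul>";
        Document doc = Jsoup.parse(html, "http://www.craigslist.org/about/sites");
        cityLinks(doc).forEach((city, url) -> System.out.println(city + " -> " + url));
    }
}
```

Against the live page you would pass the Document returned by Jsoup.connect(...).get() instead; jsoup sets its base URI from the fetched URL, so absUrl resolves links the same way.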

Your doc.select("div.state_delimiter,ul") doesn't do what you want: it returns every <div class="state_delimiter"> and every <ul> element in the document as one flat list, with nothing tying a list to its state. Parsing it manually with string functions makes no sense when you already have an HTML parser at hand.
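To make the difference concrete, the sketch below compares the comma selector with jsoup's adjacent-sibling combinator, +. The markup and the class name SelectorComparison are hypothetical, and the page is parsed from a string rather than fetched. The comma form returns a union of both element types, while div.state_delimiter + ul selects only a <ul> that immediately follows a state header, which is the same relationship nextElementSibling() follows.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectorComparison {
    // Hypothetical markup: an unrelated navigation <ul>, then one state
    // header followed by its city list.
    static final String HTML =
            "<ul id=\"nav\"><li>help</li></ul>"
            + "<div class=\"state_delimiter\">Alabama</div>"
            + "<ul><li>auburn</li><li>birmingham</li></ul>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // Comma = union: every state div AND every <ul>, including the nav list.
        Elements union = doc.select("div.state_delimiter, ul");
        System.out.println(union.size()); // prints 3

        // '+' = adjacent sibling: only a <ul> immediately after a state div.
        for (Element ul : doc.select("div.state_delimiter + ul")) {
            String state = ul.previousElementSibling().text();
            System.out.println(state + ": " + ul.select("li").eachText());
        }
    }
}
```

With the sibling combinator you still get each list on its own, but you can recover the state name from previousElementSibling(), so the grouping is preserved instead of flattened.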

