
Extracting "Hidden" HTML With Jsoup

I am trying to get at HTML data that does not appear in the source document but can be exposed, for example, by 'inspect element' in Google Chrome. Example page: http://assignmen

Solution 1:

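The data seems to be loaded with AJAX. Jsoup does not execute JavaScript: it only parses the HTML the server returns, so anything a script inserts afterwards never shows up in the parsed document.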

What you need is a "headless browser" API, one that executes JavaScript without actually rendering anything on screen.

HtmlUnit seems to be the best-known tool, although I've never used it myself. As suggested before, Selenium WebDriver is also an option.

I believe you will have to load the URL and wait for all the AJAX requests to finish. You should then end up with almost the same parse tree that Chrome shows, available in Java to do with as you wish!
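A minimal sketch of that load, wait, then parse flow, untested against the page in the question. It assumes HtmlUnit 3.x or later (package `org.htmlunit`; the older 2.x releases use `com.gargoylesoftware.htmlunit` instead) and Jsoup on the classpath. The class name `RenderedPageFetcher`, the fallback URL `https://example.com/`, the 10-second wait and the `div` selector are all placeholders of mine, not anything from the question.

```java
import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

public class RenderedPageFetcher {

    // Hands already-rendered markup to Jsoup so the usual selector API can be used on it.
    static Document parse(String renderedMarkup, String baseUri) {
        return Jsoup.parse(renderedMarkup, baseUri);
    }

    // Loads the URL in HtmlUnit, gives background scripts (including AJAX
    // callbacks) up to waitMillis to finish, then returns the resulting DOM as a Jsoup Document.
    static Document fetchRendered(String url, long waitMillis) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.getOptions().setJavaScriptEnabled(true);
            // CSS is not needed for scraping; skipping it saves time.
            webClient.getOptions().setCssEnabled(false);
            // Real-world pages often contain script errors; do not abort on them.
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = webClient.getPage(url);
            // Blocks until background JavaScript is idle or the timeout elapses.
            webClient.waitForBackgroundJavaScript(waitMillis);

            // asXml() serializes the DOM as it stands after the scripts have run.
            return parse(page.asXml(), url);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real page as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        Document doc = fetchRendered(url, 10_000);
        // Placeholder selector; replace with one that matches the "hidden" elements.
        System.out.println(doc.select("div").size() + " div elements after scripts ran");
    }
}
```

The fixed timeout is the crude part: if the page is slow, the data may still be missing when the wait ends, so the value may need tuning per site. HtmlUnit's JavaScript support is also not identical to Chrome's, so a script-heavy page might render differently, in which case Selenium WebDriver driving a real browser is the fallback.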
