Get Data From A Website
Solution 1:
I've had success using Selenium to scrape sites that rely heavily on JavaScript. If it shows up in a browser, you can get it with Selenium. It's Java-based, but there are bindings to drive it from your favorite scripting language; I use Python.
You may also want to look into headless browsers like Crowbar and PhantomJS. What I like about Selenium is that being able to watch it drive the browser helps with debugging. There is also a Firefox plugin (Selenium IDE) that can generate some basic code to get you started: you just click along and it records what you've done. That code will almost always need heavy editing, but it's helpful while you're learning how to do this.
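To give a rough idea of what driving Selenium from Python looks like, here is a minimal sketch. It assumes the `selenium` package and Firefox are installed; the helper names (`open_browser`, `scrape_text`), the example URL, and the `h1` selector are placeholders I made up for illustration, not part of any particular site or of Selenium itself.

```python
def open_browser():
    """Start a real Firefox session (assumes `pip install selenium` and Firefox)."""
    # Imported here so scrape_text() can be exercised without a browser installed.
    from selenium import webdriver
    return webdriver.Firefox()


def scrape_text(driver, url, css_selector, wait_seconds=10):
    """Load `url` and return the text of every element matching `css_selector`."""
    # Let element lookups retry for up to `wait_seconds`, which gives
    # JavaScript-rendered content a chance to appear.
    driver.implicitly_wait(wait_seconds)
    driver.get(url)
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    return [element.text for element in elements]


def main():
    # Hypothetical target: print every <h1> on example.com.
    driver = open_browser()
    try:
        for line in scrape_text(driver, "https://example.com", "h1"):
            print(line)
    finally:
        # Always close the browser, even if scraping fails.
        driver.quit()
```

Calling `main()` opens a visible Firefox window, so you can watch what the script does while you debug it. Real pages usually need more specific selectors and more careful waiting than this.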
Note that this is a surprisingly hard thing to do, especially on a large scale. Websites are messy, they differ from one another, and they change over time. This makes scraping either infuriating or a fun challenge, depending on your attitude.