How Can I Parse a Remote HTML Page Using Pure JavaScript

I have a requirement to parse a remote HTML page (e.g. www.mywesite.com/home). How can I get the HTML source of this page, and how can I then parse it? The HTML is like this

Solution 1:

Ordinary browser JavaScript cannot read the contents of pages from any origin other than the one its own page was loaded from. This restriction is the browser's same-origin policy.
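
To make "its own server" concrete: browsers compare origins, meaning the scheme, host, and port together. The sketch below uses the standard URL class to perform that comparison; isSameOrigin is a hypothetical helper name introduced here for illustration, not something from the original answer.

```javascript
// Hypothetical helper: true when both URLs share scheme, host, and port.
// URL.prototype.origin already normalises default ports (80 for http, 443 for https).
function isSameOrigin(pageUrl, targetUrl) {
  return new URL(pageUrl).origin === new URL(targetUrl).origin;
}

// Same scheme, host, and (default) port: same origin.
console.log(isSameOrigin('http://www.mywesite.com/home', 'http://www.mywesite.com/about')); // true

// Different scheme: different origin.
console.log(isSameOrigin('https://example.com/', 'http://example.com/')); // false

// Different port: different origin.
console.log(isSameOrigin('http://localhost:3000/', 'http://localhost:8080/')); // false
```

A script on a page can freely read responses only from URLs for which this check would be true; everything else needs one of the workarounds listed in this answer.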

You can:

  1. Have a cooperating script on your own server fetch the remote content and pass it on to your page.

  2. With the cooperation of the remote server, you may be able to access the content through an appropriate CORS ( http://en.wikipedia.org/wiki/Cross-origin_resource_sharing ) arrangement.

  3. Again with the cooperation of the remote server, if it makes its content available as JavaScript, you can access it by dynamically creating script elements that load it. "JSONP" is an example of this approach.

  4. If you write a browser plugin or add-on (for browsers which permit such things to be written in JavaScript), then you are not bound by the browser security model in the same way.
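
As a rough illustration of option 2, the sketch below requests the page with the standard fetch API. It only works if the remote server answers with a suitable Access-Control-Allow-Origin header; otherwise the browser blocks the response. The name fetchRemoteHtml and the injectable fetchImpl parameter are assumptions made for this example, not part of the original answer.

```javascript
// Sketch only: fetchRemoteHtml is a hypothetical helper.
// fetchImpl defaults to the global fetch and can be swapped out for testing.
async function fetchRemoteHtml(url, fetchImpl = fetch) {
  // 'cors' is already the default mode for fetch(); it is spelled out for clarity.
  const response = await fetchImpl(url, { mode: 'cors' });
  if (!response.ok) {
    throw new Error('Request failed with HTTP status ' + response.status);
  }
  return response.text(); // resolves to the page source as a string
}

// Possible usage in a browser (URL taken from the question):
// fetchRemoteHtml('http://www.mywesite.com/home')
//   .then(function (txt) { console.log(txt.length); })
//   .catch(function (err) { console.error(err); });
```

If the remote server does not send CORS headers, the promise rejects with a network error, and option 1 (a script on your own server) is the usual fallback.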

Solution 2:

Assuming the cross-origin issue is solved and the page source is already in the string txt, here is the approach I use:

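// get body part of html (assumes a bare, lowercase <body> tag)
var start = txt.indexOf('<body>');
if( start !== -1 ) txt = txt.slice( start + 6 );
var end = txt.indexOf('</body>');
if( end !== -1 ) txt = txt.slice( 0, end );

// stick body into a detached div so the browser parses it
var div = document.createElement('div');
div.innerHTML = txt;

// extract textContent from each element (or something more interesting)
// slice.call turns the NodeList into a real array so forEach is available
Array.prototype.slice.call( div.querySelectorAll('*') ).forEach( function(el) {
   if( el.textContent ) console.log( el.textContent );
});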
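
The indexOf('<body>') lookup above only matches a bare, lowercase tag, so a page with <BODY> or <body class="..."> slips through. A slightly more tolerant variant could use a regular expression, as sketched below. extractBody is a hypothetical helper name, and the pattern is still a simplification: it does not handle a ">" character inside an attribute value.

```javascript
// Hypothetical helper: returns the markup between the opening <body ...> tag
// and the last </body> tag. If either tag is missing, the input is returned unchanged.
function extractBody(html) {
  // \b[^>]*  allows attributes on the opening tag; the i flag ignores case.
  // The greedy [\s\S]* runs up to the LAST closing </body> in the string.
  var match = /<body\b[^>]*>([\s\S]*)<\/body\s*>/i.exec(html);
  return match ? match[1] : html;
}

console.log(extractBody('<html><BODY class="x"><p>Hi</p></body></html>')); // <p>Hi</p>
```

The result can be assigned to div.innerHTML in the same way as txt above.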
