How can I scrape sites that require authentication using node.js?


I've come across many tutorials explaining how to use node.js to scrape public websites that don't require authentication or login.

Can somebody explain how to scrape sites that require login using node.js?

6/22/2015 2:32:26 PM

Accepted Answer

Use Mikeal's Request library. You need to enable cookie support, like this:

// jar: true tells Request to store cookies and send them back on later requests
var request = require('request').defaults({jar: true})

First, create an account on that site manually. Then make a POST request to the site's login URL, passing your username and password as parameters. The server will respond with a session cookie, which Request will remember, so you will be able to access the pages that require you to be logged in to that site.
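The Request library has been deprecated since 2020. For comparison, here is a dependency-free sketch of the same flow using the `fetch` built into Node (Node 20 or later assumed). It does by hand what `jar: true` does for you: it keeps the cookies from the login response and sends them back with later requests. The form field names `username` and `password` and the helpers `storeCookies`, `cookieHeader`, and `loginAndFetch` are hypothetical. Inspect the real login form for its field names and action URL.

```javascript
// Sketch only: the login-then-reuse-cookie flow using Node's built-in fetch
// (Node 20+ assumed) instead of Request. The field names "username" and
// "password" are hypothetical; use the names from the real login form.

// Minimal cookie jar: remember the name=value part of each Set-Cookie header.
// (Ignores Domain, Path, Expires, etc., which a real cookie jar would honor.)
function storeCookies(jar, setCookieHeaders) {
  for (const header of setCookieHeaders) {
    const pair = header.split(';')[0]; // drop attributes such as Path or HttpOnly
    const eq = pair.indexOf('=');
    if (eq > 0) {
      jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
  }
  return jar;
}

// Build the Cookie request header from everything stored in the jar.
function cookieHeader(jar) {
  return [...jar].map(([name, value]) => `${name}=${value}`).join('; ');
}

async function loginAndFetch(loginUrl, protectedUrl, username, password) {
  const jar = new Map();

  // POST the credentials; a URLSearchParams body is sent as
  // application/x-www-form-urlencoded, like an HTML form submission.
  const loginRes = await fetch(loginUrl, {
    method: 'POST',
    body: new URLSearchParams({ username, password }),
    redirect: 'manual', // keep the Set-Cookie header of a post-login redirect
  });
  if (loginRes.status >= 400) {
    throw new Error(`Login failed with HTTP ${loginRes.status}`);
  }
  storeCookies(jar, loginRes.headers.getSetCookie());

  // Send the session cookie back so the server treats us as logged in.
  const pageRes = await fetch(protectedUrl, {
    headers: { Cookie: cookieHeader(jar) },
  });
  return pageRes.text();
}
```

Usage, with placeholder URLs and credentials: `loginAndFetch('https://example.com/login', 'https://example.com/account', 'me', 'secret').then(console.log);`. Many sites answer a bad login with a normal 200 page, so it is worth checking the returned HTML for signs that you are actually logged in.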

Note: this approach doesn't work if the login page uses something like reCAPTCHA.

4/26/2016 11:42:01 AM

Or use superagent:

var superagent = require('superagent')
var agent = superagent.agent();

agent then acts like a persistent browser: it handles getting and setting cookies, referers, etc. Just call agent.get() as usual.
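A slightly fuller superagent sketch of a login follows. It assumes superagent is installed (`npm install superagent`) and that the login form posts fields named `username` and `password`; both the field names and the `scrapeWithAgent` helper are hypothetical. The superagent module is passed in as a parameter, which keeps the function easy to exercise with a stub.

```javascript
// Sketch: log in once with a superagent agent, then reuse its cookies.
// Assumptions (hypothetical): the login form posts "username" and "password"
// fields to loginUrl; rename them to match the real form.
// Call it as scrapeWithAgent(require('superagent'), ...).
async function scrapeWithAgent(superagent, loginUrl, protectedUrl, username, password) {
  // The agent stores cookies from responses and sends them on later requests.
  const agent = superagent.agent();

  // .type('form') sends the body as application/x-www-form-urlencoded,
  // like a browser submitting an HTML login form. superagent rejects on
  // 4xx/5xx responses, so an error status here throws.
  await agent.post(loginUrl).type('form').send({ username, password });

  // The session cookie from the login response is sent along automatically.
  const res = await agent.get(protectedUrl);
  return res.text;
}
```

Usage, with placeholder URLs and credentials: `scrapeWithAgent(require('superagent'), 'https://example.com/login', 'https://example.com/account', 'me', 'secret').then(console.log);`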

Licensed under: CC-BY-SA with attribution