My source code is at https://github.com/LoveMyData/city_of_ryde and it is already working on morph.io, but I want to get multi-page scraping working.
Any assistance is appreciated.
PS: I am a hobbyist programmer, so a copy-and-paste set of code would be ideal if anyone in this community has one.
I just wanted to add to this a little because it’s a common problem when scraping gov’t sites.
Here’s an example where I’ve used POST requests to deal with paging on one of these ugly sites. It uses Node.js.
The tricky part was parsing the state data out of the form and then mimicking the behaviour of the __doPostBack function before sending the request. In my case, it also seemed to be necessary to send additional HTTP headers to make the server behave as expected. I’m not sure which ones were ultimately necessary, so I just copied the headers from Chrome DevTools and added them to each request.
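
For anyone who wants something to start from, here is a rough sketch of the idea in Node.js. It is not the code from my scraper, just a minimal illustration: pull the hidden form fields (such as __VIEWSTATE) out of the page, set __EVENTTARGET and __EVENTARGUMENT the way __doPostBack would, and POST the result back. The function names, the pager control name, the "Page$2" argument and the headers are all placeholders I made up for this example, so you would need to copy the real values from the site's pager links and from your browser's dev tools. It also assumes Node 18 or later for the built-in fetch.

```javascript
// Collect every <input type="hidden"> name/value pair from the page HTML.
// A regex keeps this sketch dependency-free; a real HTML parser would be
// more robust (and would decode HTML entities in values, which this does not).
function extractHiddenFields(html) {
  const fields = {};
  const inputTags = html.match(/<input\b[^>]*>/gi) || [];
  for (const tag of inputTags) {
    if (!/type\s*=\s*["']hidden["']/i.test(tag)) continue;
    const name = /\sname\s*=\s*["']([^"']*)["']/i.exec(tag);
    const value = /\svalue\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (name) fields[name[1]] = value ? value[1] : "";
  }
  return fields;
}

// Build the form body that the browser would submit after __doPostBack runs:
// all hidden state fields, plus the event target and argument.
function buildPostBackBody(html, eventTarget, eventArgument = "") {
  const fields = extractHiddenFields(html);
  fields.__EVENTTARGET = eventTarget;
  fields.__EVENTARGUMENT = eventArgument;
  return new URLSearchParams(fields).toString();
}

// Hypothetical helper: POST the body back to the same URL and return the
// HTML of the next page. The headers here are examples only; copy the ones
// your browser actually sends if the server misbehaves.
async function fetchNextPage(url, html, eventTarget, eventArgument) {
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      "User-Agent": "Mozilla/5.0 (placeholder, copy from your browser)",
      "Referer": url,
    },
    body: buildPostBackBody(html, eventTarget, eventArgument),
  });
  return response.text();
}
```

To page through results you would fetch the first page with a normal GET, then call fetchNextPage repeatedly, each time passing in the HTML you just received so the state fields stay in sync with the server. Some sites also need the session cookie from the first response sent back on every request.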