Getting 502 Bad Gateway error using Mechanize on a scraper

Hi there,

Just recently started playing around with Morph again as QuickCode is shutting down. I’ve had a scraper working for quite a while on QuickCode (Scraperwiki) scraping course registration data. It’s working fine in QuickCode/Scraperwiki but on Morph.io I get a 502 Bad Gateway Error:

Traceback (most recent call last):
  File "scraper.py", line 47, in <module>
    response = br.open(url, timeout=60)
  File "/app/.heroku/python/lib/python2.7/site-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/app/.heroku/python/lib/python2.7/site-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 502: Bad Gateway

On Twitter, someone from Morph.io said it looked like a bad certificate. And, indeed, when I go to this site in Chrome I get a warning that says the certificate is invalid, so that sounds right.

But I don’t really care about the security vulnerability here. I just want to scrape the data. Is there some way I could just get mechanize to ignore the security certificate and plow on anyway?

Here’s the scraper:

Thanks!

Chad

Think I figured it out: The URL of the university changed from kwantlen.ca to kpu.ca

The old kwantlen.ca course site was still live, but the certificate it served was now coming from kpu.ca, so it no longer matched the kwantlen.ca address and failed validation.

When I changed the URL in the scraper to kpu.ca (the new course site), it started working!

Chad

At the time, the answer was “Not easily; we funnel web traffic through a proxy which validates the certificates and you can’t override that”.

However, because of this issue (and a few other things that came up at the same time) we’ve disabled that proxy, so the answer now is “Yes, it should be fairly easy”. The opt-out described at https://www.python.org/dev/peps/pep-0476/#opting-out looks like it is still relevant, but I haven’t tested this.
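
For anyone landing here later, this is roughly what the PEP 476 opt-out looks like. It is only a sketch: the helper name `disable_https_verification` is made up for this post, and I am assuming (not having tested it on morph.io) that mechanize picks up Python's default HTTPS context. It turns off certificate checking for the whole process, so only use it on a scraper where you really don't care about that.

```python
import ssl


def disable_https_verification():
    """Apply the PEP 476 opt-out: make unverified HTTPS the process-wide default.

    Returns True if the default context factory was replaced, or False on
    older Pythons that have no certificate verification to switch off.
    """
    unverified = getattr(ssl, "_create_unverified_context", None)
    if unverified is None:
        # Python older than 2.7.9 does not verify certificates by default.
        return False
    # Standard-library HTTPS connections opened after this point will use
    # a context that skips certificate and hostname checks.
    ssl._create_default_https_context = unverified
    return True


# Call this once near the top of scraper.py, before opening any URLs.
disable_https_verification()
```

If the site simply moved to a new domain, as it did in this thread, updating the URL is the better fix, since it keeps verification on.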