Hi,
I can’t figure out how to avoid the following SSL errors when scraping an HTTPS website using morph.io (the exact same scraper code works on my Windows 10 PC when run from VSCode):
write EPROTO 140366154426240:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:../deps/openssl/openssl/ssl/s23_clnt.c:827:
and (when secureProtocol is set to TLSv1_method, TLSv1_1_method or TLSv1_2_method):
write EPROTO 140587175036800:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:../deps/openssl/openssl/ssl/s3_pkt.c:365:
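For anyone comparing the two messages, here is a minimal sketch that splits an OpenSSL error line into its fields. It assumes the usual OpenSSL layout of thread id, error code, library, function, reason, source file and line; `parseOpenSslError` is a hypothetical helper name, not part of Node.js or OpenSSL.

```javascript
// Split an OpenSSL error line into named fields.
// Assumption: the line follows the layout
//   <thread id>:error:<hex code>:<library>:<function>:<reason>:<file>:<line>:
// `parseOpenSslError` is a hypothetical helper, not a library function.
function parseOpenSslError(message) {
  const pattern = /(\d+):error:([0-9A-Fa-f]+):([^:]*):([^:]*):([^:]*):([^:]*):(\d+):/;
  const match = pattern.exec(message);
  if (!match) return null; // not an OpenSSL-style error line
  return {
    threadId: match[1],
    code: match[2],
    library: match[3],
    func: match[4],
    reason: match[5],
    file: match[6],
    line: Number(match[7]),
  };
}

const first = parseOpenSslError(
  'write EPROTO 140366154426240:error:140770FC:SSL routines:' +
    'SSL23_GET_SERVER_HELLO:unknown protocol:../deps/openssl/openssl/ssl/s23_clnt.c:827:'
);
console.log(first.func, '/', first.reason);
```

Read this way, both errors are raised on the client side while it is reading the server's first reply (the `GET_SERVER_HELLO` and `GET_RECORD` functions), which may mean the bytes coming back are not a TLS record in the form the client expects.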
This is my scraper:
https://morph.io/MichaelBone/city_of_burnside_sa_development_applications
And this is the web site (which I’m fairly sure is TLS 1.0, 1.1 or 1.2 because IE 11 rejects it if I don’t enable a TLS version in its options):
I’ve tried “agentOptions: { secureProtocol: "TLSv1_2_method" }” and just about every other TLS and SSL method, including SSLv23_method.
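To make that concrete, here is a minimal sketch of trying one protocol method at a time. It uses Node's built-in `https` module rather than the scraper's actual HTTP library, so it is an approximation of what was tried, not the scraper's code. `buildOptions` and `tryProtocol` are hypothetical helper names, and the URL in the usage comment is a placeholder.

```javascript
const https = require('https');

// Build https.request options that pin a single OpenSSL protocol method.
// Assumption: `buildOptions` is a hypothetical helper for illustration only.
function buildOptions(url, secureProtocol) {
  const { hostname, port, pathname, search } = new URL(url);
  return {
    hostname,
    port: Number(port) || 443, // URL.port is '' when the default port is used
    path: pathname + search,
    method: 'GET',
    secureProtocol, // e.g. 'TLSv1_2_method'
  };
}

// Attempt one request and report success or the error text instead of throwing.
function tryProtocol(url, secureProtocol) {
  return new Promise((resolve) => {
    try {
      const req = https.request(buildOptions(url, secureProtocol), (res) => {
        res.resume(); // discard the body; only the outcome matters here
        resolve({ secureProtocol, ok: true, detail: res.statusCode });
      });
      req.on('error', (err) =>
        resolve({ secureProtocol, ok: false, detail: err.message })
      );
      req.end();
    } catch (err) {
      // Some Node.js versions reject certain method names before connecting.
      resolve({ secureProtocol, ok: false, detail: err.message });
    }
  });
}

// Example (needs network access; placeholder URL):
// tryProtocol('https://example.com/', 'TLSv1_2_method').then(console.log);
```

Running `tryProtocol` for each method name in turn gives one result per protocol, which makes it easier to see whether every method fails the same way.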
I’ve tried rewriting it in Ruby and using headless Chrome (and PhantomJS): see https://morph.io/MichaelBone/city_of_burnside_south_australia_development_applications_test (based on the examples at https://morph.io/documentation/scraping_javascript_sites).
I’ve tried “--ignore-ssl-errors=yes” and “--ssl-protocol=any” when using phantomjs.
I’ve tried “process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0'”.
I’ve tried different versions of node.js (including 10.6.0).
I’ve experimented with introducing certificates using ssl-root-cas. I’ve experimented with “verify_mode: false”, “use_ssl = false” and “verify_mode = OpenSSL::SSL::VERIFY_NONE”.
In all cases I’m either getting an SSL error or a “blank” page (and by a “blank” page I mean just “<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>” from Capybara instead of the fairly extensive HTML of the actual web site).
I’m beginning to wonder if the mitmproxy is causing the issue (maybe similar to https://github.com/openaustralia/morph/issues/1135 or maybe https://github.com/openaustralia/buildstep is relevant). Perhaps somehow bypassing or turning off the mitmproxy would help. Or maybe the https site has SSL configured in an unusual way.
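One way to test the mitmproxy theory might be to open a TLS connection from inside the scraper and look at who issued the certificate that comes back. The sketch below is only that, a sketch: it assumes that mitmproxy's default CA identifies itself as "mitmproxy" in the issuer fields, and `issuedByMitmproxy` and `inspectPeer` are hypothetical helper names. If the handshake itself fails, `inspectPeer` rejects with the same kind of error as above, which is still informative.

```javascript
const tls = require('tls');

// Decide whether a peer certificate appears to have been issued by mitmproxy.
// Assumption: mitmproxy's default CA uses "mitmproxy" as its issuer CN and/or O.
function issuedByMitmproxy(cert) {
  const issuer = (cert && cert.issuer) || {};
  return [issuer.CN, issuer.O].some(
    (value) => typeof value === 'string' && value.toLowerCase().includes('mitmproxy')
  );
}

// Connect, then report the negotiated protocol and the certificate issuer.
// Verification is disabled on purpose so an intercepting CA can be inspected.
function inspectPeer(host, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      { host, port, servername: host, rejectUnauthorized: false },
      () => {
        const cert = socket.getPeerCertificate();
        resolve({
          protocol: socket.getProtocol(),
          issuer: cert.issuer,
          viaMitmproxy: issuedByMitmproxy(cert),
        });
        socket.end();
      }
    );
    socket.on('error', reject);
  });
}

// Example (needs network access; placeholder host):
// inspectPeer('example.com').then(console.log, (err) => console.error(err.message));
```

Comparing the output on morph.io with the output on a local machine would show whether the certificate, or the negotiated protocol, differs between the two environments.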
Any ideas? Can you help?
Thanks,
Michael