Chrome headless scrapers appear broken


#1

New ticket for this question.


A scraper of mine seems to have broken at the same time as this update.

It fails with:

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed
(Driver info: chromedriver=2.37.544315 (730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7),platform=Linux 4.17.17-x86_64-linode116 x86_64)
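
For anyone comparing failures across runs, here is a minimal sketch of pulling the chromedriver version and kernel string out of a "Driver info" line like the one above. The regular expression and the `parse_driver_info` helper are my own illustration, not part of Selenium or morph.io, and they assume the line keeps the exact shape shown in this error.

```python
import re

# Pattern for the "(Driver info: ...)" line shown in the error above.
# Assumption: the line always has the shape
#   chromedriver=<version> (<commit>),platform=<os> <kernel> <arch>)
DRIVER_INFO = re.compile(
    r"chromedriver=(?P<version>[\d.]+)\s+\((?P<commit>[0-9a-f]+)\),"
    r"platform=(?P<os>\S+)\s+(?P<kernel>\S+)\s+(?P<arch>[^\s)]+)\)"
)


def parse_driver_info(text):
    """Return the driver details as a dict, or None if no such line is found."""
    match = DRIVER_INFO.search(text)
    return match.groupdict() if match else None


line = (
    "(Driver info: chromedriver=2.37.544315 "
    "(730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7),"
    "platform=Linux 4.17.17-x86_64-linode116 x86_64)"
)
info = parse_driver_info(line)
print(info["version"], info["kernel"])
```

Logging these fields on each failed run makes it easier to see whether the chromedriver build or the host kernel changed between a working run and a broken one.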

It uses chromedriver from Python, as described here:
https://morph.io/documentation/scraping_javascript_sites

The examples on the docs page above also seem to have broken at the same time, e.g.:
https://morph.io/wfdd/inatsisartut-scraper/history

I just tried re-running the really basic example, and that broke in the same way:
https://morph.io/andylolz/example_ruby_chrome_headless_scraper

All of this suggests the breakage is related to this update. What can be done to get it working again? Thanks!


Morph.io Planned Maintenance - 9th Sept 2018, 12pm-7pm
#2

I’ve opened #1201 about this issue.


#3

@andylolz Just a note to let you know I’m still working on this.

Can you point me at the particular error you mentioned above? I’m seeing a different error on https://morph.io/andylolz/example_ruby_chrome_headless_scraper. Meanwhile, https://morph.io/wfdd/inatsisartut-scraper/history seems to be mostly working, although my fork of it fails with a different error:

/app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/protocol.rb:181:in `rbuf_fill': Net::ReadTimeout (Net::ReadTimeout)
 	from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/protocol.rb:157:in `readuntil'
 	from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/protocol.rb:167:in `readline'
 	from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http/response.rb:40:in `read_status_line'

I can reproduce that error in my test environment, but it’s not the error you originally reported. I want to check that I’m looking at the right problem.
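
Since two different failures are in play here, a rough way to check which one a given run hit is to search the scraper output for each message. This is only a sketch: the `SIGNATURES` table and the `classify_failure` helper are hypothetical names of mine, and the two strings are simply the messages quoted in this thread.

```python
# Map a short label to the text that identifies each failure in the log.
# Both strings are copied from the errors quoted in this thread.
SIGNATURES = {
    "chrome_crashed": "Chrome failed to start: crashed",
    "read_timeout": "Net::ReadTimeout",
}


def classify_failure(log_text):
    """Return the labels of every known failure message found in log_text."""
    return [name for name, needle in SIGNATURES.items() if needle in log_text]


selenium_log = (
    "selenium.common.exceptions.WebDriverException: Message: "
    "unknown error: Chrome failed to start: crashed"
)
ruby_log = "in `rbuf_fill': Net::ReadTimeout (Net::ReadTimeout)"

print(classify_failure(selenium_log))
print(classify_failure(ruby_log))
```

An empty list means the run failed (or passed) for some reason other than these two, which is worth knowing before assuming it is the same bug.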


#4

Hi @jamezpolley,

Thanks for your work on this!

I think the reason wfdd/inatsisartut-scraper is now (more or less) working is that it was updated to use PhantomJS (presumably because of the problem with using Chrome on morph.io).

The error on andylolz/example_ruby_chrome_headless_scraper is the one I’m having trouble with:

/app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/response.rb:69:in `assert_ok': unknown error: Chrome failed to start: crashed (Selenium::WebDriver::Error::UnknownError)
   (Driver info: chromedriver=2.37.544315 (730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7),platform=Linux 4.18.16-x86_64-linode118 x86_64)
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/response.rb:32:in `initialize'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/http/common.rb:81:in `new'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/http/common.rb:81:in `create_response'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/http/default.rb:104:in `request'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/http/common.rb:59:in `call'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/bridge.rb:164:in `execute'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/bridge.rb:97:in `create_session'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/bridge.rb:53:in `handshake'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/chrome/driver.rb:47:in `initialize'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/common/driver.rb:44:in `new'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/common/driver.rb:44:in `for'
 	from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver.rb:85:in `for'
 	from /app/vendor/bundle/ruby/2.5.0/gems/capybara-2.18.0/lib/capybara/selenium/driver.rb:23:in `browser'
 	from /app/vendor/bundle/ruby/2.5.0/gems/capybara-2.18.0/lib/capybara/selenium/driver.rb:49:in `visit'
 	from /app/vendor/bundle/ruby/2.5.0/gems/capybara-2.18.0/lib/capybara/session.rb:274:in `visit'
 	from scraper.rb:8:in `<main>'