On my local machine it’s working okay, but on Morph.io it doesn’t seem to be finding the data injected by JavaScript on the website in question. Could this be because I’m using the most recent version of PhantomJS locally and Morph isn’t? (How do I get around this - do I put the executable in the repo and tell it to use that copy somehow? Which version would I need - Linux x86?)
The legend at the top is hard-coded and gives the class IDs for where the data is, but when Morph.io scrapes the page, the table cells containing the data - which are populated by JavaScript - appear to be nonexistent or empty.
It looks like it takes a little while for the calendar to load the first time, and I’m guessing that’s why it’s failing. You can wait for it to finish loading by applying a not-so-small dose of RSpec:
require 'rspec/expectations'
require 'capybara'
require 'capybara/poltergeist'
require 'capybara/rspec/matchers'
require 'scraperwiki'

class CalendarSearch
  include RSpec::Matchers
  include Capybara::RSpecMatchers

  @@url = "http://hansardpublic.parliament.sa.gov.au/#/search/1"

  def initialize
    @session = Capybara::Session.new(:poltergeist)
    # The page throws JS errors that would otherwise abort the scrape
    @session.driver.browser.js_errors = false
  end

  # Visits the page, blocks until the calendar widget has rendered
  # (waiting up to 10 seconds), then yields the ready session
  def ready
    @session.visit(@@url)
    warn 'waiting...'
    expect(@session).to(have_css('.scheduler .yearWrapper .k-widget', wait: 10))
    warn 'all set!'
    yield(@session)
  end
end

CalendarSearch.new.ready do |session|
  # do stuff with `session`
end
Or you can just sleep for a few seconds at the beginning and hope for the best.
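If you go that route, a slightly safer variant of a blind sleep is to poll: keep retrying until the content shows up or a deadline passes. Here’s a minimal pure-Ruby sketch - the `wait_until` name and its defaults are my own invention, not part of Capybara:

```ruby
# Polling helper: re-run the block until it returns something truthy
# or the timeout expires. Name and defaults are hypothetical.
def wait_until(timeout: 10, interval: 0.5)
  deadline = Time.now + timeout
  loop do
    result = yield
    return result if result
    raise "timed out after #{timeout}s" if Time.now >= deadline
    sleep interval
  end
end

# e.g. wait_until { session.all('.k-widget').any? && session }
```

Unlike a fixed sleep, this returns as soon as the condition holds, and fails loudly instead of silently scraping an empty page.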
As an aside, when I used apt to install PhantomJS on my Ubuntu 16.* VPS, it installed fine but errored when run - something about the repo version being compiled against Unity or another graphics library. I ended up using a version compiled from source.
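On the version question from the original post: Poltergeist lets you point at a specific PhantomJS binary via its `:phantomjs` driver option, so one approach is to commit a Linux x86-64 build into the repo and resolve its path at runtime. A hedged sketch - the `bin/` layout and the helper name are my assumptions, not a Morph.io convention:

```ruby
# Hypothetical helper: prefer a phantomjs binary vendored under bin/
# in the repo, falling back to whatever is on the PATH.
def phantomjs_path(repo_root = __dir__)
  vendored = File.join(repo_root, 'bin', 'phantomjs')
  File.executable?(vendored) ? vendored : 'phantomjs'
end

# Poltergeist accepts the binary location when the driver is registered:
#   Capybara.register_driver :poltergeist do |app|
#     Capybara::Poltergeist::Driver.new(app, phantomjs: phantomjs_path)
#   end
```

Make sure the committed binary has its executable bit set, since that survives a git checkout but is easy to lose when copying files around.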