Get help with morph.io and scraping

Alternative to PhantomJS

Hello,

I’m just looking for a Pythonic alternative to PhantomJS.
By way of background, my scripts are running on a VPS, as I need them to run every hour.

So, I set up a virtualenv, installed npm, and then installed PhantomJS in the virtualenv.
However, it reports that GhostDriver does not work with port number XXXX. I’ve tried a bunch of ports and manually opened them first in the terminal, but no good.

I’m aware that GhostDriver is native to Python, but I don’t think it’s been maintained in years, and I suspect that’s the issue with me using PhantomJS.

I’d consider using Node, but my Python script is 200+ lines and I don’t want to rewrite this. Also, I’ve read that Node’s Python shell is not perfect so I don’t really want to go down that road.

So I’m at my wit’s end with this, and wondering if there’s a package that can do what PhantomJS does, without me using npm?

Ended up finding my own answer.
This is a good list it seems.

The list is actually really good, I’m going to try Spynner and then Splash and let you know how those go.

That is an awesome list, nice find! I’ll be interested to hear what you end up going with.

Good news guys, spynner works for me. It’s a pain to set up, but here are the steps:
This solution won’t help you with Morph, but if you’re running an Ubuntu server, this works.

This explanation is a ramble, but I wanted to get my thoughts down somewhere.


Set up Spynner’s dependencies - qt4 (which has its own dependencies).
If you’re using a Linux distro, follow this guide: http://www.guguncube.com/2733/python-spynner-installation-in-ubuntu
I ignored the quick install and did steps one and two of ‘step by step’.

If you get this error when installing the packages:
error: Setup script exited with error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
then you need to install python-dev:
sudo apt-get install python-dev

To get qt4 working on a VPS, you’re going to need X11 forwarding.
I found X11 the hardest part to get working, so here are some informative articles.
http://xmodulo.com/how-to-enable-x11-forwarding-using-ssh.html


Those explainers contain details about forwarding.
If you’re logging into the VPS via SSH and want to see what’s going on, you need to add these lines to the file /etc/ssh/sshd_config:

X11Forwarding yes
X11UseLocalhost yes
X11DisplayOffset 10

If you’re running your script while not logged in over SSH, you’ll need to set X11DisplayOffset to 0.

If you’re using Linux and testing a spynner script on a VPS via SSH, you need to log in with ssh -X user@ip_address. If you don’t use -X, you’ll get "cannot connect to X server" when you try to run a script that uses spynner.

To run the spynner script as a cron job when you’re not logged into the VPS, you’ll first need to give the user account that runs the job authority to open the X server.
You should make sure that Xvfb is installed.
Also, install xvfbwrapper, and refer to the basic usage example for putting your code inside the X server.
You can make your python script executable - pay attention to line endings.
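
For reference, here is a minimal sketch of what that xvfbwrapper usage can look like. The names run_headless and display_factory are ones I made up for illustration. The only xvfbwrapper feature it assumes is that Xvfb can be used as a context manager, as shown in its README. Treat it as a starting point, not a tested setup.

```python
def run_headless(task, display_factory=None):
    """Run task() while a virtual X display is active and return its result.

    display_factory is a hypothetical hook: any callable that returns a
    context manager. It defaults to xvfbwrapper's Xvfb class.
    """
    if display_factory is None:
        # Imported lazily so this module still loads where xvfbwrapper
        # is not installed (for example on a development laptop).
        from xvfbwrapper import Xvfb
        display_factory = Xvfb

    # The display is started on entry and stopped on exit, even if
    # task() raises an exception.
    with display_factory():
        return task()
```

You would put the spynner part of your script in a function and call run_headless(that_function), so the browser only ever runs while the virtual display is up.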

Here is my cron job.

29 17 * * * export DISPLAY=:0 XAUTHORITY=/home/vp_server/.Xauthority && /home/vp_server/WebWatcher/WebWatcher.py

But if I were running this job while logged in over SSH, and wanted to see it run, I’d need to have DISPLAY=:10 and the forwarding options turned on.
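
If you would rather not keep the export in the crontab line, a rough sketch of doing the same setup from inside the Python script is below. configure_display is a hypothetical helper, and the home directory in the usage line is just the one from my cron job. It only fills in values that are missing, so a DISPLAY already set by ssh -X is left alone.

```python
import os


def configure_display(env, home, default_display=":0"):
    """Fill in DISPLAY and XAUTHORITY if they are not already set.

    env is a dict-like environment (normally os.environ). Existing
    values are kept, so an SSH session with X11 forwarding keeps the
    DISPLAY that sshd assigned.
    """
    # Under cron nothing is set, so fall back to the local display.
    env.setdefault("DISPLAY", default_display)
    # Point X clients at the user's authority file, as in the cron job.
    env.setdefault("XAUTHORITY", os.path.join(home, ".Xauthority"))
    return env


if __name__ == "__main__":
    # Assumed usage: call this before anything that opens the X server.
    configure_display(os.environ, os.path.expanduser("~"))
```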

Hey, thought I’d update you guys on this.
Spynner really did not work out.
Having X11 running for every request was just not realistic.
I revisited PhantomJS, since it does not require X11.
PhantomJS now works, so that’s now the solution. It was probably the obvious way to go, but glad I tried Spynner out.
