I switched my scraper from plain Selenium to Splinter so I could scrape a site that uses JavaScript. It works locally and saves to the database, but when I run it on Morph it runs for (literally) hours if I don’t stop it, and then eventually fails.
The really weird thing is that it says it’s visiting 18 pages and scraping rcs.ncc-ccn.ca, www.youtube.com, assets.juicer.io, and 1 other domain. It should only be scraping the first one. Can anyone help me figure out what’s going on?
Not sure if this is happening for you, but very occasionally one of my scrapers says it is scraping 80 pages when it should only be pulling one. It tends to happen when the server is struggling, and it doesn’t do any harm - the next day things are fine.
Hard to say, but looking at it now, it says it has been queued for 2 days. As I understand it, if a scraper has legitimately been running for 24 hours it gets killed off, so I think that means it is stuck. If you ask @henare nicely (or badger him every few days, in my case) he will be able to kill it, and then you can see if you can get a successful run out of it. Fingers crossed…
Just had a look at your scraper, and now that you’ve got a clean run, it is failing with `ImportError: No module named splinter`.
If you want to use third-party libraries, you’ll need to add a requirements.txt file to your repository listing any dependencies your scraper needs to run. That will let Morph install those dependencies before your scraper runs, which should sort your problem. Good luck.
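For a Python scraper, requirements.txt is just one pip-installable package name per line. A minimal sketch for your case might look like this (the version pin is illustrative - pick whatever version you tested locally, or leave it unpinned; `splinter` is the only dependency I’m assuming here, and it should pull in selenium itself):

```text
# requirements.txt - one pip package per line
splinter==0.7.3
```

You can check what your scraper actually needs by running `pip freeze` in the environment where it works locally and copying the relevant lines across.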