The really weird thing is that it says it’s visiting 18 pages and that it’s scraping rcs.ncc-ccn.ca, www.youtube.com, assets.juicer.io, and 1 other domain. It should only be scraping the first one. Can anyone help me figure out what’s going on?
Not sure if this is happening for you, but I very occasionally have a scraper say it is scraping 80 pages when it should only be pulling one. It tends to happen when the server is struggling, and it doesn’t do any harm: the next day things are fine.
Scrapers reporting that they have scraped pages they haven’t is a known bug: https://github.com/openaustralia/morph/issues/1078 - happens to mine all the time.
If your scraper runs locally, the fact that it is running for hours or failing may be related to the ongoing queuing and disk space issues discussed in “My scraper stuck, what to do?” and “Scrapers failing with status code 128 and 255”, rather than to an issue with your scraper code.
Thanks! Do you think it could be the ongoing issues even though running a more basic test scraper worked during the times I was having these issues?
Hard to say, but looking at it now, it says it has been queued for 2 days. As I understand it, if a scraper has actually legitimately been running for 24 hours, it gets killed off so I think that means it is stuck. If you ask @henare nicely (or badger him every few days in my case) he will be able to kill it and you can see if you can get a successful run out of it. Fingers crossed…
@henare are you able to kill my scraper? Do you think the issue is that it’s stuck?
Just had a look at your scraper and, now that you’ve got a clean run, it is failing with:
ImportError: No module named splinter
If you want to use third-party libraries, you’ll need to add a requirements.txt file to your repository with any dependencies your scraper needs to run. That will allow morph to pull in the dependencies you need before your scraper runs and that should sort your problem. Good luck.
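In case it helps, here is a rough sketch for checking which imports are missing in a given environment, so you know what to list in requirements.txt. It assumes Python 3 and top-level module names only; `missing_modules` is just a name I made up for illustration, not anything morph provides. For this error the requirements.txt entry would most likely be a single line reading `splinter`, though the package name on PyPI does not always match the import name, so check that for other libraries.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "splinter" is the module from the error above; list whatever else
    # your scraper imports here (hypothetical example list).
    missing = missing_modules(["splinter"])
    if missing:
        print("Add these to requirements.txt:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Running this locally only tells you what is missing on your own machine, so a module that is installed there can still be missing on morph if it is not in requirements.txt.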
Thanks. It’s because I’m running my scraper elsewhere now, since I had so many problems. I previously had a requirements doc.