Why is my scraper being killed?

This scraper has just started to get killed: https://morph.io/planningalerts-scrapers/north_sydney

/start: line 24: 11 Killed setuidgid scraper $(eval echo ${command})

I’m not sure why though.

Not 100% sure, but isn’t that 11 the signal it was killed with? That would make it SEGV. Not sure how that’d be happening though. o_O

This is my (somewhat limited) understanding: on https://morph.io/planningalerts-scrapers/north_sydney it says “Last run failed about 4 hours ago with status code 137.” Exit codes that result from a signal are (128 + signal number), so the signal must have been 137 - 128 = 9, which is SIGKILL.

So, the process was killed with SIGKILL, which points pretty strongly to it being killed because there wasn’t enough memory. It was only running for a couple of minutes, so it definitely wasn’t killed for running too long.
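
To spell out the arithmetic, here’s a trivial worked example (nothing morph-specific, just the usual exit status convention):

```javascript
// Exit statuses above 128 mean the process was terminated by a signal:
// status = 128 + signal number.
var status = 137;           // what morph.io reported for this run
var signal = status - 128;  // 137 - 128 = 9
console.log(signal);        // 9, i.e. SIGKILL
```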

Interesting, thanks!

Odd that it’s just started to happen :-/

Arg! I missed the big, red error message with the actual status code in it. Yep, 137 it is…

I seem to be hitting this same issue. Is there any guidance around how much memory is available? I wouldn’t have thought my scraper was particularly resource intensive, but neither have I tried very hard to optimise it for memory use.

It’s 100 MB.
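
If you want to see how close you’re getting, logging something like this from inside your scraper will show the resident set size as it runs (I’m assuming the limit is enforced against the RSS):

```javascript
// Print current memory usage in MB so you can watch it against the 100 MB limit.
var mem = process.memoryUsage();
console.log('rss: ' + Math.round(mem.rss / 1024 / 1024) + ' MB, ' +
            'heapUsed: ' + Math.round(mem.heapUsed / 1024 / 1024) + ' MB');
```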

Node.js, which I see you’re using, seems to be particularly memory hungry. You might be interested in this pull request that was recently merged.

I’ve not used Node.js much but that PR suggests to me you might want to manually trigger garbage collection in your scraper.
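
If I understand it correctly, node needs to be started with the --expose-gc flag so that global.gc() exists, and then the scraper can call it periodically. A rough sketch of what I mean (processPage and the page list are just placeholders for whatever your scraper actually does):

```javascript
// Requires node to be run with --expose-gc, otherwise global.gc is undefined.
function processPage(page) {
  // ... fetch and parse a page, save rows to the database ...
}

var pages = [/* pages to scrape */];

pages.forEach(function (page, i) {
  processPage(page);
  // Ask V8 to collect garbage every 10 pages so memory stays under the limit.
  if (global.gc && i % 10 === 0) {
    global.gc();
  }
});
```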

You’re right, node is pretty memory heavy as a general rule. My understanding is that it’s deliberately greedy with memory, though I’m not sure of the reasoning behind that.

And that did the trick, thanks.

Awesome! Here’s @drzax’s commit for anyone else looking to fix this issue.

Hm, I’m facing the same issue on a scraper trying to parse a 13MB XML file: https://morph.io/pudo/expert-groups-scraper – any advice?

Hmm, sorry I missed your question @pudo. I think we may have email problems from the forum again :frowning:

It looks like you solved it already and your commit message mentions a streaming parser - this is what I would’ve suggested too.
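
For anyone else hitting this with a big XML file, this is roughly what I mean by a streaming parser, using the sax package as an example (I don’t know which library you actually went with, and the tag names here are made up):

```javascript
// Stream a large XML file instead of reading it all into memory at once.
// npm install sax
var fs = require('fs');
var sax = require('sax');

var parser = sax.createStream(true); // strict mode
var current = null;

parser.on('opentag', function (node) {
  if (node.name === 'record') current = { text: '' }; // 'record' is a made-up element name
});

parser.on('text', function (text) {
  if (current) current.text += text;
});

parser.on('closetag', function (name) {
  if (name === 'record' && current) {
    // save `current` to the database here, then drop the reference
    current = null;
  }
});

fs.createReadStream('data.xml').pipe(parser);
```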

Still, I wonder if you might reconsider this resource limit. I’ve been spending hours and hours trying to get scripts to run on morph now, and arbitrary kills like this make the whole thing deeply unpleasant.

At this point it would probably be much more efficient for me to just set up my own scrape server and accept that I’ll have to spend a bit of time on sysadmin every now and then.

Please don’t make morph.io as focussed on toy scrapers as ScraperWiki was; I believe that’s one of the reasons that tool never became a core part of anyone’s workflow.

Thanks for this @pudo. Could you raise this as an issue on GitHub so we can all discuss it there? It seems like a distinct issue, “Raise resource allocation” or something, separate from this thread.

The memory limit has just been increased by more than 5x - it’s now 512 MB per scraper :tada:
