As every active user of morph.io knows, we've had our fair share of stability issues for some time now. We know what a big impact this can have on your use of morph.io. After all, we're one of the biggest users ourselves - for our most popular project, PlanningAlerts.
Not only that but all the fire-fighting puts a big strain on our tiny team (we're a charity, with just 2 full-time staff). Especially when we're busy working on our other major projects. So while we would love to devote heaps of time to the project and get things humming along, we often have to make the hard choice to patch things up and move on for a while.
You might remember that several months back we were stoked to have some time and resources to pour into improving things. In this post I want to make sure that you all know where that's at and what we have planned (and don't have planned).
Recent stability work
The work that was done several months back significantly changed the way the backend scraper run queue works. This has made several aspects of how the server runs much more stable. Disk space and memory blow-outs have been rare or non-existent, which means the site has experienced very few outages. Importantly, it's also meant we haven't had to tend to these issues regularly like we did for the previous 12 months or so.
Unfortunately several new issues have been introduced that have been affecting the running of your scrapers.
Outstanding major issues
You run your scraper. It scrapes stuff. It completes successfully. But there are no changes to the database - WTF?
This has been an issue since some changes made in April. We've spent quite a while trying to work out what the problem is but it's intermittent and we can't reproduce it locally. For a while it seemed to happen every few days then last month it didn't happen at all. Now it seems to be back.
We know that restarting the queue makes things work again but we still don't know what the problem is or how to fix it. For the moment this means other issues take priority and we'll just restart things when we see a problem or you report it.
Scrapers taking ages to run
We've had backlogs before, but over the last month there have been some really huge ones. Like your scraper taking a day or more to run, or seeming to sit in the queue forever.
This issue is actually a combination of how the queue backend was changed and a bug that's been affecting run containers.
Every few days over the last month we've had to spend an hour or two manually trying to get things working again when this issue has cropped up. This fire-fighting, and the issue itself, is really affecting our work. We're going to spend a couple of days working on this from today.
What we have planned
The first thing I want to do is get all our packages and gems up to date. It's very possible that a regression in a package or gem has contributed to the issues we're having now.
Next up is changing the way that used slots are calculated. A big part of the delays users see when running scrapers is potentially avoidable: morph.io thinks all the available slots are used because a bunch of runs are having problems (but they're not actually running!). I anticipate that this will be one of those changes that isn't theoretically correct, or the right thing to do long term, but practically it could help massively and at least get us out of some pickle, preserve or other jam-related substance.
And finally, if I have time, I might try and debug the root cause a bit more. I've had a small look and I have a very bad feeling that a huge part of the problem is an unresolved Docker issue.
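To make the slot calculation idea a bit more concrete, here's a rough sketch in Python. It is not morph.io's actual code: the `Run` class, its fields, and all of the function names are made up for illustration. It only shows the general idea of counting a slot as used when a run is really running, rather than whenever the queue's bookkeeping says so.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Run:
    """Hypothetical record of a scraper run (not morph.io's real model)."""
    scraper: str
    marked_running: bool   # what the queue's bookkeeping believes
    container_alive: bool  # whether a container is actually running


def used_slots_naive(runs: List[Run]) -> int:
    # Counts every run the queue believes is running, including stuck
    # runs that have no live container behind them.
    return sum(1 for r in runs if r.marked_running)


def used_slots_practical(runs: List[Run]) -> int:
    # Counts only runs that are marked running AND really have a live
    # container, so stuck runs no longer hold on to a slot.
    return sum(1 for r in runs if r.marked_running and r.container_alive)


def free_slots(runs: List[Run], total_slots: int,
               count: Callable[[List[Run]], int] = used_slots_practical) -> int:
    # Never report a negative number of free slots.
    return max(0, total_slots - count(runs))


runs = [
    Run("scraper-a", marked_running=True, container_alive=True),
    Run("scraper-b", marked_running=True, container_alive=False),  # stuck
    Run("scraper-c", marked_running=True, container_alive=False),  # stuck
]

print(free_slots(runs, total_slots=3, count=used_slots_naive))
print(free_slots(runs, total_slots=3))
```

With the naive count, the two stuck runs make every slot look taken and nothing new gets started. With the practical count, those slots are handed back to the queue, even though the stuck runs themselves still need to be cleaned up separately.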
What can we do to help you?
Obviously fixing the root cause of these stability problems would be the best thing to help us all. However it's very unlikely we have the people available to do that for at least another couple of months.
So, in the meantime, what would help you? The idea of this post is to be as clear and transparent as possible so you know what's going on and can plan accordingly. If there are other small things we can do then we're keen to hear any ideas you have.
What can you do to help?
As you know, morph.io is an open source project run to benefit the global civic tech community. Anyone can contribute and we're always delighted to get contributions. Despite being a tiny team we try hard to give contributions the time they deserve and help get them deployed.
morph.io is an important bit of shared civic infrastructure. We'd love this to extend to the project's maintenance and support one day too. So if you rely on morph.io and want to contribute in a bigger way then let's talk.
Thank you to all the patient morph.io users that have helped by reporting problems over the last few months, including @wfdd @tmtmtmtm @achasik @LoveMyData @chris48s @lowndsy @pietrobenito @drwillems @IvanGoncharov @noelmas @JasonThomasData @masterofpun @loren @randomgitman @ErinClark @arasabbasi @woodbine @jennyj5.
And thanks as always to @equivalentideas for helping with the support and for reviewing this post.