Off-Cloud Backup for Heroku apps – a possible answer

The Heroku platform is a fantastic way for a small development company to avoid having to worry about devops. We’ve been using it at interconnect for years now, and whilst it’s not perfect, it takes away one set of headaches and does so at a reasonable cost.

All the services offer backups, and the VMs are built from scripts and are essentially read-only. So if something catastrophic happened to one of our databases, we could roll back a day and be OK. Except… let me explain my fears around data.

Trust issues with providers

In our very earliest years we used a VPS provider that ran Plesk. Everything was solid and stable until one day we got a report that a site had been hacked. Then another. It turned out that a vulnerability had left our sites exposed, and attackers had taken advantage of it. This resulted in a big old clean-up operation and restoration from backups. Except the daily backups we’d been paying for turned out to be weekly, so the backups we had were three days old. Ever since then, I’ve preferred to have a way of pulling backups separately to a server under my own control. The one exception is Kumina: I know the people there so well that I’m 100% certain they’re as paranoid as I am, and they’ve never let me down. But in the era of hustle culture bros who move fast and break things, you need a safety net.

Creeping corruption

My next fear is corruption you don’t notice immediately. I can well imagine that if all the metadata for the posts on a site before a certain date got wiped out, most people wouldn’t notice for ages. Imagine you’ve got a site with 200,000 posts, and various elements of the first 100,000 were damaged – the long tail matters to these sites and suddenly it’s all gone. Well, thank heavens for backups!

Except, of course, most cloud providers don’t provide substantial generational backups. Instead, they keep a few days or a week or so. And that’s your lot. If you need to go back months, you’d better hope a developer in the company left a dump on their laptop somewhere. Of course, very few developers keep dumps of production systems on their laptops: it’s bad practice, it only tends to happen in exceptional circumstances, and any such dump should be deleted soon after use.
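To show what I mean by generational backups, here is a minimal sketch in Python of a daily/weekly/monthly retention rule. It is an illustration only, not the code our service runs: the function names and the default counts (7 daily, 4 weekly, 12 monthly) are assumptions I have picked for the example.

```python
from datetime import date, timedelta


def newest_per_period(dates_newest_first, period_of, limit):
    """Pick the newest backup from each of the `limit` most recent periods."""
    chosen = {}
    for d in dates_newest_first:
        period = period_of(d)
        if period not in chosen:
            if len(chosen) == limit:
                break  # every older backup belongs to a period we no longer keep
            chosen[period] = d
    return set(chosen.values())


def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=12):
    """Return the backup dates a simple generational policy would retain."""
    newest_first = sorted(set(backup_dates), reverse=True)

    # Daily generation: everything taken within the last `daily` days.
    keep = {d for d in newest_first if 0 <= (today - d).days < daily}

    # Weekly generation: the newest backup in each recent ISO week.
    keep |= newest_per_period(
        newest_first, lambda d: tuple(d.isocalendar())[:2], weekly
    )

    # Monthly generation: the newest backup in each recent calendar month.
    keep |= newest_per_period(newest_first, lambda d: (d.year, d.month), monthly)
    return keep
```

Anything not in the returned set is a candidate for deletion. The point of the monthly generation is that a problem noticed months later still has a backup old enough to predate it.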

How we fix it today

In the end, I asked one of my Linux-oriented colleagues, Gianluigi, to create a service that would connect to Heroku’s API, download every database, and sync every S3 bucket. It worked, with some limitations. More recently, although he’d left the company, he remains a good friend and gave me a crash course in Linux sysadmin basics, so I was able to extend and improve some bits. The system is a service written in PHP that does all the work. I then asked another colleague internally, Jack, to extend things to cover the PostgreSQL databases we now also use, and to create a dashboard so that I could monitor the backups easily without resorting to logging into the backup servers.
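For a feel of the first two steps, here is a rough sketch in Python rather than the PHP we actually use. It lists apps through version 3 of the Heroku Platform API and builds an `aws s3 sync` command for one bucket. The function names, the bearer token handling and the destination layout are my assumptions for illustration. It also ignores API pagination and leaves out the database dump step, which depends on the database add-on in use.

```python
import json
import urllib.request

HEROKU_API = "https://api.heroku.com"


def heroku_request(path, token):
    """Build an authenticated request for version 3 of the Heroku Platform API."""
    return urllib.request.Request(
        HEROKU_API + path,
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {token}",
        },
    )


def http_get(request):
    """Perform the request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


def list_app_names(token, fetch=http_get):
    """Return the names of the apps the token can see (first page only)."""
    apps = json.loads(fetch(heroku_request("/apps", token)))
    return [app["name"] for app in apps]


def s3_sync_command(bucket, dest_root):
    """Build the AWS CLI command that mirrors one bucket into a local folder."""
    # Hypothetical layout: one sub-folder per bucket under dest_root.
    return ["aws", "s3", "sync", f"s3://{bucket}", f"{dest_root.rstrip('/')}/{bucket}"]
```

The command list can be handed to `subprocess.run(command, check=True)` on a machine with the AWS CLI installed and configured. Passing `fetch` in as a parameter keeps the API call easy to swap out when testing.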

The dashboard also doesn’t run on the backup servers. I needed to keep the backups as safe as possible – they’d be a great honeypot for a hacker, so they’re onioned away, and the backups service isn’t reachable from outside. Instead, it messages the dashboard with information about the backups taken. The dashboard also provides details on application and framework versions, for security monitoring and making sure updates have been applied appropriately, and it sends me a daily summary email showing the storage space available and what was backed up in the previous 24 hours.
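The push model is simple enough to sketch. The Python below is an assumption-laden illustration, not our PHP code: the payload fields, the bearer key and the dashboard URL in the usage note are all hypothetical. What it shows is the direction of travel: the backup server only ever makes outbound requests, so nothing on it needs to listen for connections.

```python
import json
import shutil
import urllib.request
from datetime import datetime, timezone


def build_report(app, kind, size_bytes, backup_dir, now=None):
    """Describe one finished backup, plus the free space left on the backup disk."""
    now = now or datetime.now(timezone.utc)
    return {
        "app": app,
        "kind": kind,  # e.g. "database" or "s3"
        "size_bytes": size_bytes,
        "finished_at": now.isoformat(),
        "free_bytes": shutil.disk_usage(backup_dir).free,
    }


def report_request(dashboard_url, report, api_key):
    """Build the outbound POST that carries the report to the dashboard."""
    return urllib.request.Request(
        dashboard_url,
        data=json.dumps(report).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Sending it is then a single `urllib.request.urlopen(report_request("https://dashboard.example/api/backups", report, api_key))`. The dashboard can build the daily summary from the stored reports without ever contacting the backup server.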

Here are a few screenshots of the system, with some censoring, but I hope you catch how it works from what you see.

To commercialise, or not?

And now to one of the reasons why I’ve decided to write about this. In the past, I created the first version of Search Replace DB – a quick script and algorithm I knocked up to parse a database and search and replace items in it. A fast, dangerous tool that I released as free open source code. Other people took it and commercialised it into successful products. We didn’t. The code has since been integrated into wp-cli, and most devs (myself included!) would use that in preference, except in those tricky situations where command line access isn’t possible – mostly on cheap hosts. I think we were right to release the code, but where we failed was in realising the commercial possibilities. And that’s left me a little torn.

Here’s the dilemma: it’s not easy to set up services in Linux, but once you do, these things just run and run. It’s also not going to be the easiest thing to work with, so I anticipate support costs being quite high. It’s proper server level work. And I certainly don’t feel inclined to build a SaaS that acts as a conduit for people’s backups. It’s just too risky to have a central pool of lots and lots of backups, and people find exposed backups lurking in S3 buckets all the time. So I want to put this out to the community. Is this something you’d find useful? Let us know in the comments below. If we did release it, the code would be open source, but access to the latest versions would be restricted.

I’m looking forward to hearing your thoughts!

Pros & Cons of Software as a Service

Google Docs - still Beta
I just responded on the UK Business Forums to a question about SaaS and felt that it was worth re-posting the content here:

As someone who’s worked on web delivered applications for around eight years, and on client/server and mainframe (which is actually like a very thin client), I feel reasonably qualified to point out a few pros and cons of SaaS:

Pros

  1. Deployment – rolling out local software in either corporate or SME environments can be a nightmare. MS have it pretty well sussed these days, however, while Apple don’t. Which possibly explains why a lot of key SaaS proponents are Apple OSX users.
  2. Cashflow – short term costs are easier to manage.
  3. You’re always up to date – updates are continuous and rolling.
  4. Data is stored and managed by professionals who hopefully know what they’re doing with regards to security and integrity.
  5. Harder to lose data when a laptop goes missing.

Cons

  1. Performance & Productivity – it’s definitely worse for end users although many will argue it’s fine. It’s getting better, but it’s still worse – especially for expert users. I remember how quickly expert PeopleSoft users could input data.
  2. Because of 1, designers do have to concentrate hard on slick usability, but that means flexibility has a habit of dropping.
  3. If you’re offline it’s a royal pain – methods of getting around this are improving, somewhat, and MS again appear to have some very good tech coming in Windows 7 to make working this way more feasible, but for the moment it’s still a major weakness in all but a few sophisticated cases.
  4. You can be very much locked in to a provider, far more easily than with local software, and there are very few SaaS services that are open source. Even fewer where the export/import tools are 100% seamless. Even switching from WordPress.com to self-hosted can be painful for some.
  5. You have to trust your provider and hope they don’t mess up. MS did a contacts screw up with Hotmail a while ago during an upgrade which affected a small proportion of their users – they kept e-mail addresses, but lost names and other information. And when something is ‘free’, especially, it’s very hard to kick up a stink. In fact, it’s worth noting that a lot of these services have an ‘at your own risk’ policy in their licences. To be safe you should be backing up your online data somewhere offline, which is painful.

SaaS rocks, but it’s not the be all and end all. It is hot right now (again) and will continue to be so for quite some time. However, Rich Internet Applications (RIAs) driven by the likes of Flex and Silverlight could be returning us to a Client Server paradigm, and that is probably the way forward – bringing SaaS and local processing benefits to users.

Another thing I’ll add, which I didn’t bother mentioning on the forum, is that you’ll never get the flexibility of a local application with SaaS. The power and bandwidth simply aren’t there, and never will be. My computer can transmit data internally at multiple Gigabit speeds. By the time that kind of broadband exists at a low price for mass adoption, most computers will be transmitting data internally at Terabit speeds. For the ultimate in performance you’ll always want local applications. For connected, multi-service and convenient applications you’ll want to go over to SaaS. Both will co-exist, just as Client Server failed to kill off the mainframe.