Containers are a hot topic. Most companies are adopting or evaluating the technology – Docker in particular – to speed up application deployment, drive down cost, ease management, and make application delivery more flexible overall.
As with most new architectures, this dream takes real work to become a reality. Even once you get your application componentized and packaged properly, DevOps teams still face challenges in making the shift to continuous delivery and achieving that reduction in cost and increase in speed.
When deploying our app, VCTR (sounds like Victor), we ran into a few of these challenges ourselves. Here are the top 5 challenges we hit moving VCTR to container deployment, and how we solved them:
1. Unruly scripts cost time and developers
Building out the script to deploy our complex, multi-container application was a job in itself. We had to assign a full-stack developer who understood the whole architecture, pulling him away from building actual product features. Our now “full-time, overpaid script guru” spent hours, sometimes days, updating the script every time we had a new feature to deploy. The script grew from a simple 200 lines of code to over 2,000 lines of chaos, and our productivity lagged.
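To give a feel for the problem, here is a dry-run sketch of the kind of hand-rolled deploy script we mean. The image and container names are invented for illustration; this is not our actual script.

```shell
#!/bin/sh
# Hypothetical fragment of a hand-rolled, multi-container deploy script.
# DRY_RUN defaults to on, so commands are printed rather than executed.
set -e
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run docker pull example/vctr-api:1.4
run docker stop vctr-api
run docker rm vctr-api
run docker run -d --name vctr-api example/vctr-api:1.4
# ...and again for the worker, the cache, the proxy, and the db migration
# job, each with its own flags, ports, and ordering constraints. Every new
# feature means another hand-edited stanza, which is how 200 lines
# become 2,000.
```

Every stanza encodes implicit knowledge (what depends on what, what must run first) that lives only in the head of whoever maintains the script.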
2. Upgrades are hit or miss
Upgrading our app was a shot in the dark. From small patches to new features, though we tested every permutation of functionality, we had no way to test the deployment itself. We’d get hung up on silly stuff like “we forgot to purge the db connection” or “we needed to deploy container 1 before destroying container 2”. We attempted to fail forward but ended up spending tons of time debugging instead.
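The “deploy container 1 before destroying container 2” class of bug looks roughly like this (a dry-run sketch with invented names, not our actual script):

```shell
#!/bin/sh
# Hypothetical illustration of an upgrade-ordering bug. DRY_RUN defaults
# to on, so commands are printed rather than executed.
set -e
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# What our script did: a window where no API container is running at all.
run docker rm -f vctr-api                                # old version gone...
run docker run -d --name vctr-api example/vctr-api:1.5   # ...new one not yet up

# What the upgrade actually needs: start the new version alongside the
# old, verify it is healthy, and only then retire the old container.
run docker run -d --name vctr-api-new example/vctr-api:1.5
run docker exec vctr-api-new /healthcheck                # hypothetical check
run docker rm -f vctr-api
```

The functional tests never exercise either ordering, so mistakes like the first one only surface in production.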
3. Redeploy is not a solution
In a highly available application, downtime is not an option, and the deploy-and-pray methodology didn’t do much for our nerves (or our careers) either. So our script guru did his very best to “Fail Forward” by just redeploying and hoping for a fix. That worked sometimes, but it never let us actually pinpoint and fix the underlying problems.
4. Manual Rollbacks != Fun
When the not-so-trusty redeploy failed and failing forward took too long, script guru had to manually roll back our changes. Imagine the wonderful feeling of removing what he deployed piece by piece while the team breathed down his neck asking for an estimated time for a ‘fix’. Even when he did get it working…
5. No one knew what we deployed, except our script guru
The development team makes updates based on a supposed end-state they think exists. The Ops team sends in reports of issues, but the dev team says it “works for them”. The cycle continues until someone opens up access to production to pinpoint the problem. Not very easy, flexible, or reliable, is it? Oh, and our newly minted script guru has to be around for every deploy. Probably great for his job security, but terrible for his personal life. HE CAN NEVER LEAVE.
We realized that if we couldn’t solve these problems, there was no path to true continuous deployment. We were discovering what some tech giants had already realized: they needed a deployment system. That’s why Netflix developed Spinnaker, though it’s designed for a previous generation of infrastructure. That’s why Uber developed Micro Deploy, but they haven’t shared it, so you can’t run it.
So we built Skopos, what our script guru calls “continuous deployment for the rest of us”.
We’re in the beta phase with Skopos, so you can check out the full-blown product without any registration: https://github.com/datagridsys/skopos-sample-app. It solved all of our initial deployment problems, and it will probably do the same for you.
To conclude our story, script guru (aka Pavel) is back to building features.