Configuration as a Service
Posted by Raymond Davies
Configuration as a Service: Moving Quickly
Skyscanner is a web-scale tech organisation, so it’s important that we can move quickly at scale.
While we’re always tackling this – for example, by moving ever closer to continuous deployment, or by removing barriers which can slow teams down, like infrastructure provisioning (by moving towards AWS) – sometimes it would be nice to make changes without deploying any code at all.
This is where Configuration as a service comes in – the ability to change the behavior of our software systems on the fly without the need to make code changes. Recently the squad I work in released Skyscanner’s first iteration of Configuration as a Service.
Our main motivation behind the system was to enable anyone in the business to safely make changes to our production systems while having the changes backed by A/B tests and associated metrics & reporting. Another motivation was that the system allows us to gracefully bypass a service which is experiencing an unexpected problem. Having this flexibility means we can continue to deliver the core experience which people come to Skyscanner for, even if something goes wrong behind the scenes.
Letting anyone in the business make a production change… isn’t that dangerous?
I hear you, and I guess the answer is: potentially, yes. You aren’t the first to ask. This type of question was brought up more than once when we were originally pitching the idea to other squads.
To mitigate this issue, squads have total control over which aspects of their systems they expose via Configuration. They can also require changes to go through an approval process (with a comprehensive audit trail) which gives them an opportunity to preview changes before they go live. Finally, and as yet not implemented, we’ll be adding a progression for changes from our pre-production environments through to production, which gives an additional gateway for sanity checking.
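To make the approval flow a little more concrete, here is a minimal Python sketch of the idea. It is not our actual implementation: the `ConfigStore` class, its method names and the `search.max_results` key are all hypothetical, and a real system would persist its state rather than hold it in memory. It only shows the three controls described above: a squad-owned list of exposed keys, changes held back until approved, and an audit trail of who did what.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConfigStore:
    """Hypothetical in-memory store; every name here is illustrative only."""
    exposed_keys: set                  # keys the owning squad chose to expose
    requires_approval: bool = True
    live: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)

    def _log(self, action, key, value, user):
        # Every step is recorded so changes can be traced later.
        self.audit.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action, "key": key, "value": value, "user": user,
        })

    def propose(self, key, value, user):
        if key not in self.exposed_keys:
            raise KeyError(f"{key!r} is not exposed for configuration")
        self._log("proposed", key, value, user)
        if self.requires_approval:
            self.pending[key] = value  # held back until someone approves it
        else:
            self.live[key] = value
            self._log("applied", key, value, user)

    def approve(self, key, approver):
        value = self.pending.pop(key)  # raises KeyError if nothing is pending
        self.live[key] = value
        self._log("approved", key, value, approver)


store = ConfigStore(exposed_keys={"search.max_results"})
store.propose("search.max_results", 50, user="alice")  # not live yet
store.approve("search.max_results", approver="bob")    # now live
```

In this sketch a proposed change sits in `pending`, where it could be previewed, and only reaches `live` once approved.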
It’s also worth bearing in mind that these changes are initially launched to the public behind an A/B test which has metrics and monitoring attached to it, so it’s actually quite difficult to really mess things up (at least without us noticing very quickly!).
Metrics & Monitoring
How do you know you are making changes which are positively impacting your users’ experience of your product? Or perhaps even more crucially, how do you know when something is broken?
These questions are being asked more and more of development teams as the ‘devops’ culture becomes increasingly popular. They’re important questions at any scale, but they become incredibly important when, like Skyscanner, you’re dealing with hundreds of engineers, in teams all over the world, releasing code at their own heartbeat. When you extend this ability to anyone within the organisation and remove the barrier of shipping a release, it becomes an absolute necessity.
Our Configuration service leverages Skyscanner’s internal experimentation platform ‘Dr Jekyll’, which provides us with an A/B testing framework and the automatic ramp up of successful changes.
We can then track how users in the A and B buckets behave using tools such as Mixpanel. This is one of the places we monitor how people move through our product funnels, and whether they exit to our partners or bounce at a certain point in the process. If you’ve just made a Configuration change to how our search controls work, for example, and we see that users with your change seem to be having trouble finding the flight they’re looking for, we’ll probably review the change and assess the impact it’s having on users.
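For readers who haven’t worked with A/B buckets before, the sketch below shows one common way to assign them. This is an assumption for illustration and not a description of how Dr Jekyll works internally: the function names, the experiment name and the hash-based scheme are all my own. The idea is that hashing the user and experiment together gives each user a stable slot, so they see the same variant on every visit, and ramping up a change only ever moves users from A to B.

```python
import hashlib


def bucket_for(user_id: str, experiment: str, rollout_percent: int) -> str:
    """Return "B" for roughly rollout_percent% of users, otherwise "A".

    Hypothetical helper: the slot depends only on the user and experiment,
    so raising rollout_percent never moves a user from B back to A.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    slot = int(digest[:8], 16) % 100   # stable slot in the range 0..99
    return "B" if slot < rollout_percent else "A"


def resolve(values: dict, user_id: str, experiment: str, rollout_percent: int):
    """Pick the configuration value for this user's bucket."""
    return values[bucket_for(user_id, experiment, rollout_percent)]


# Hypothetical experiment: a new search control layout for 10% of users.
layout = resolve(
    {"A": "classic", "B": "compact"},
    user_id="user-123",
    experiment="search-controls-layout",
    rollout_percent=10,
)
```

The bucket returned here is what you would attach to analytics events, so that funnel behaviour can be compared between A and B.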
Similarly, it’s a requirement within Skyscanner that systems have monitoring and alerts configured against agreed KPIs and machine-level metrics. We use systems like Seyren, Graphite and VictorOps for monitoring and alerts. This means that we can quickly identify abnormal behavior and, ideally, pull the system using Configuration while any problem is rectified.
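Pulling a system via Configuration can be as simple as a flag check in front of the call. The following is a rough sketch under my own assumptions, not our production code: the flag name `recommendations.enabled`, the `fetch_with_bypass` helper and the two stand-in functions are hypothetical.

```python
def fetch_with_bypass(config: dict, flag: str, call_service, fallback):
    """Call the service only while its flag is on; otherwise use the fallback.

    Hypothetical helper: a missing flag is treated as "enabled", so the
    service is only bypassed when someone explicitly switches it off.
    """
    if config.get(flag, True):
        return call_service()
    return fallback()


def call_recommendations():
    # Stand-in for a call to a downstream service that may be misbehaving.
    return ["hotel-1", "hotel-2"]


def no_recommendations():
    # Degraded but safe result, so the core experience keeps working.
    return []


config = {"recommendations.enabled": False}   # flipped via Configuration
results = fetch_with_bypass(
    config, "recommendations.enabled", call_recommendations, no_recommendations
)
```

Flipping the flag back on restores the normal path, with no deployment needed in either direction.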
Squads are already coming forward with some really interesting use cases for the system, including some we’d not even thought of when dreaming it up, which is awesome. One such idea would see us able to serve part of our flights funnel while removing all of the servers which currently host that part of the system. Absolutely not the use case we’d imagined using Configuration as a service for, but really interesting nonetheless.
While it’s early days for Configuration as a service at Skyscanner, I’m really excited about where it’ll take us and the interesting ideas that people, who might not otherwise have had the ability to make changes, will bring to the table.