Code Voyagers Podcast: Skyscanner Security Squad

Posted on by Richard Davidson

Stu Hirst, Skyscanner speaking at Cloud Expo. Pic by: SC Magazine

This episode we talk to Stu and Oskar from our Security team about the work they do and trends they have seen recently in internet economy companies like Skyscanner.

Topics include the OWASP Top 10 and some unusual Internet of Things hacks.

As ever, please let us know what you think and how we could make this better. What type of topics would you like to hear? We’d love to hear your views, either in the comments below or by tweeting us @CodeVoyagers.

Like what you hear? Work with us

We do things differently at Skyscanner and we’re on the lookout for more Engineering Tribe Members across our global offices.

If those vacancies have closed by the time you read this then take a look at our Skyscanner Jobs for more vacancies or sign up to our Code Voyagers mailing list for more case-studies and job roles as they come up.

Go back and listen

This and all previous episodes are also available on these platforms:

If your preferred podcast app is missing, let us know and we will add it!

Music License:

  • “Bit Quest” by Kevin MacLeod, licensed under Creative Commons: By Attribution 3.0

Sign up for email updates from the CodeVoyagers team

Code Voyagers Podcast: Microservices

Posted on by Richard Davidson

code voyagers podcast

In this episode of the podcast we speak to a member of the Micro Services Shell squad. We chat about the tooling they create, which allows Skyscanner to build new features quickly and safely at scale.

What were the challenges? What have been the benefits? Listen to find out!

As ever, please let us know what you think and how we could make this better. What type of topics would you like to hear? We’d love to hear your views, either in the comments below or by tweeting us @CodeVoyagers.



Toxic A/B tests: Why fast experiments make you slow

Posted on by Lisa Venter

This post was written for us by Hilary Roberts, one of Skyscanner’s Senior Product Managers.

In 2015, we started running our first controlled A/B tests at Skyscanner, transforming the work of the Product team and the speed and confidence with which we could test new ideas. Since then, we’ve grown our experimentation to hundreds of A/B tests per month.

Each of these tests has been an opportunity for us to iterate through the build-measure-learn loop of Eric Ries’ Lean Startup, learning more about who our end-users are, what problems are most important to them, and whether the software we’ve built is a good solution.

The more experiments we run, the more we’ve started to understand that solving big, valuable problems is about more than just being data-driven, and that sometimes, the best way to eliminate waste from your development cycle is to avoid running any experiment at all.

I originally gave this presentation at the Edinburgh Turing Festival, and then updated it to include even more practical guidance for product practitioners at Canvas Conf.

You can watch the full Science and Sensibility talk below or on their website. You can also follow along with the slide deck.

Want to hear more?

Get our posts straight off the press to your inbox plus news of our latest vacancies across our global offices. Subscribe to our Skyscanner Code Voyagers mailing list.

Want to work with us?

We’re on the lookout for a Chief Information Security Officer and Senior Technical Managers. With a range of perks like home-country working, flexi-working, a learning environment and an authentically great company culture, Skyscanner is a fantastic company to work for. Come and help us solve the complex problems that live in Travel.

About the author

Hilary cut her teeth in product working with dozens of startups from the University of Edinburgh to test their value propositions and find their first customers.

In 2013 she moved to Skyscanner, one of the world’s largest travel search sites. She is now product manager for the Flights Group, the company’s largest business vertical, with more than 50 million users per month globally.


Code Voyagers Podcast: Side Projects

Posted on by Robbie Cole

Yes, the Code Voyagers podcast is back for Season 2! In Episode 1 we chat about the projects that some of us undertake outside our working hours, and how these do (or do not) make us better people in the office. Robbie’s building a computer game using Unity3D, while Milan contributes to a variety of projects including the React JavaScript framework.

Please let us know what you think and how we could make this better. What type of topics would you like to hear? We’d love to hear your views, either in the comments below or by tweeting us @CodeVoyagers.

This and all previous episodes are also available on these platforms:

If your preferred podcast app is missing, let us know and we will add it!

Music License:

  • “Bit Quest” by Kevin MacLeod, licensed under Creative Commons: By Attribution 3.0
  • “Monkeys Spinning Monkeys” by Kevin MacLeod, licensed under Creative Commons: By Attribution 3.0


Something every great engineer should know

Posted on by Iain McDonald

Skyscanner’s core values are helpful in encouraging the right choices to maintain the positive culture we’ve built up over the years. One such value is Master, Teach, Learn. It encourages us to become subject matter experts, seek opportunities to learn new things, and ensure we pass on what we’ve learned to colleagues. An opportunity recently presented itself to share what I knew about declarative programming.


Bryan Dove and Joel Spolsky

Apologies for the potato-like quality, but this is a video still from the fireside chat Joel Spolsky (right) had with our CTO, Bryan Dove (left), in our Edinburgh office earlier this summer. We were excited to have one of the earliest engineering bloggers, and someone responsible for Stack Overflow and Trello, share his experience with us in person.

Near the end of the chat, Bryan asked a great question and Joel’s answer surprised me.

Bryan: Are there things that you believe every great engineer should know, that they commonly don’t?

Joel: The idea of a function as a first class object. That was something I was surprised at how few programmers knew.

Joel goes on to describe how this observation was the basis for a blog post back in 2006 called Can Your Programming Language Do This? This surprised me not because I thought declarative programming was a common practice – in all the interviews where I’ve asked a question requiring iteration over a collection, I’ve yet to see a single answer that doesn’t use a for loop. I was surprised because I’d written a blog post along similar lines the previous year.

Skyscanner University

To help us with the Master, Teach, Learn core value, we have an internal training platform called Skyscanner University. This allows us to attend training courses from external presenters, but crucially, it also allows anyone – any employee – to create their own course and teach it to whoever wants to attend. In my time here I’ve learned about Docker containers, coaching, leadership… even home brewing – thanks Raymond 😉

I decided to gauge interest by asking in the #engineering channel in Slack whether anyone would like to learn more about declarative programming. I was happy to find a dozen or more interested engineers, so I created a short course (with apologies to Mike, one of our resident functional gurus).

Skyscanner staff enjoying functional programming

Higher Order Functions

Ganglia showing all nodes in the EMR cluster with CPU at 100%

My own understanding of higher order functions first developed when I was a Data Scientist creating Elastic MapReduce clusters in AWS to run large Spark jobs that transform and aggregate loosely structured data – a key constituent of Machine Learning projects. I wouldn’t have known where to begin had it not been for Jon Skeet and his reimplementation of LINQ, which I worked through several years earlier.

This is one of the key takeaways about declarative programming: I learned to program in a declarative manner by using LINQ in C#, an object oriented language. Most modern, popular languages have a mechanism for first class functions, i.e. the ability to store a function reference in a variable just as you would an integer or a string. JavaScript needs no special syntax; Python has lambda expressions; C# has the Action and Func delegate types; and, since version 8, even Java has support.
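To make that concrete, here is a minimal Python sketch of functions as first class objects – a named function stored in a variable, and an anonymous lambda expression (the names are purely illustrative):

```python
# A function is a value like any other: it can be stored in a
# variable, passed as an argument, and returned from a function.
def shout(text):
    return text.upper() + "!"

greet = shout            # store a function reference in a variable
print(greet("hello"))    # HELLO!

# Anonymous functions via lambda expressions
double = lambda x: x * 2
print(double(21))        # 42
```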

Enabling parallelisation with the Spark framework is just one advantage of knowing how to code in a declarative style. Declarative code is an evaluation of expressions, whereas imperative code is a series of statements that mutate state. The reduction of dependence on state creates code that’s easier to test in an automated manner because less mocking is required; code that’s more stable to ongoing change because it has less dependence on state side effects; code with greater separation of concerns which aids readability and conceptual understanding.
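As an illustration of the two styles on a hypothetical task (summing the squares of the even numbers in a list – not from the course itself):

```python
numbers = [1, 2, 3, 4, 5, 6]

# Imperative: a series of statements that mutate state
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n

# Declarative: a single evaluation of an expression, no mutable state
total_decl = sum(n * n for n in numbers if n % 2 == 0)

assert total == total_decl == 56
```

The declarative version has no intermediate state to mock or mis-update, which is exactly the testability and stability benefit described above.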

But why is the ability to store a function in a variable so valuable? First class functions enable the use of higher order functions i.e. functions that take functions as parameters, either stored in an intermediate variable or written as an anonymous function, and these function parameters are called during the processing of the higher order function. Or as Xzibit puts it…

Yo dawg I heard you like functions

Thankfully @steveluscher came up with a simpler pictorial representation of what higher order functions, such as map, filter and reduce, are actually doing, which kicked off a lot of discussion.

Discussion on map/filter/reduce

Knowledge & Understanding versus Problem Solving

It’s not enough to know what a higher order function is, or even what a few specific ones do. The skill in declarative programming comes from practising solving problems: mastering it takes knowledge and understanding and problem solving in equal measure. The slide deck I presented in the Skyscanner University course was only half the story. The fun came from solving a small problem using only higher order functions in a stateless manner.

I share the problem here with an example imperative implementation in Python, JavaScript and C#. Can you solve it in a declarative style using map, filter, reduce and groupBy? You’ll have to do a bit more web research than those attending the course, but I’ll leave you with one piece of advice: if the first word of the function is return, you’re on the right lines.
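The course problem itself isn’t reproduced here, but as a warm-up, here is a hypothetical Python sketch using the same building blocks – filter, map, reduce and Python’s closest equivalent of groupBy, itertools.groupby (which requires its input pre-sorted by the grouping key):

```python
from functools import reduce
from itertools import groupby

words = ["map", "filter", "reduce", "groupby", "fold", "zip"]

# filter + map + reduce: total letters across words longer than 3 chars
total = reduce(lambda acc, n: acc + n,
               map(len, filter(lambda w: len(w) > 3, words)), 0)
# "filter"(6) + "reduce"(6) + "groupby"(7) + "fold"(4) = 23

# groupby only groups adjacent items, so sort by the same key first
by_len = {k: list(g) for k, g in groupby(sorted(words, key=len), key=len)}
# e.g. by_len[3] == ['map', 'zip']
```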

Try to resist if you haven’t solved it yourself, but the solutions can be found here. Best of luck! And if you haven’t already, why not Learn You A Haskell?

Declarative Programming Repo Problem – can you solve it?


PootleConf 2016

Posted on by Robbie Cole

Pootle is an open source web application that Skyscanner uses to enable our third party translation suppliers to translate our content. Most people within the business have never seen it, nor even heard its name, but all the application strings in the company have been through our instance of it. Pootle is the power behind the throne of our translation process.

This year the developers of Pootle, Translate House, led by Dwayne Bailey, invited us to their developer conference in London, to share our use cases, our needs and wants, and discuss the future of this vital piece of software. Skyscanner sent engineers Sarah Hale, Eamonn Lawlor and myself (Robbie Cole) to represent.

Day 1

It began with introductions and backgrounds. We shared what we were broadly interested in talking about and learning over the conference, then launched into deep-dives about how we actually use Pootle.

First, Eoin Nugent from Yelp discussed their translations systems, then Skyscanner internationalisation lead Sarah Hale presented ours.

The Yelp translation process turned out to be pretty similar to Skyscanner’s process. Strings are developed by teams as English source text and then submitted, where they go into Pootle to get translated, then the translations are pushed back to consuming codebases. They still rely on developers synchronising the latest strings manually, while services in Skyscanner will absorb them almost immediately from our RESTful string service, but on the whole, we’re both doing pretty much the same thing. We’re not crazy, and we’re not alone, and that’s tremendously reassuring!

In life, all things begin with post-it notes.

In the afternoon our thoughts turned to open source contributions, to how we could give our tweaks back to the community. The Translate House developers presented their unit and integration testing suites and showed us how to debug Pootle, along with their strategy for coding style and standards enforcement.

Their approach is once again reassuringly aligned with ours. Much as we will not allow code into our own deployments that does not have unit tests, nor will they; when they expressed that they were worried the demand for testing would put people off contributing, we simply shrugged because comprehensive testing is second nature to us.

Getting the test suite up and running in our Windows environments wasn’t quite so smooth, however. After a few reworded command line calls we got a 100% failure rate in 2539 tests, but luckily Eamonn was able to track down a cross-platform file path bug that was at the heart of them all – and contribute the fix back to master!

Day 2

The second day brought us demonstrations of radical new features and architectural shifts in Pootle. We were first introduced to ‘Pootle File System’, or ‘Pootle FS’, a system that will streamline how Pootle synchronises data in and out of our repositories. This is exactly what we want – built-in functionality that will allow us to remove most of our intermediate bash scripts that connect Pootle to our internal services.

Any integration concerns that Pootle FS wouldn’t cover could then be solved by the next new feature, a plug-in architecture. Rather than hacking onto our own fork of Pootle, thus making up-versioning to get the latest improvements a painful process, developers will be able to build discrete plug-ins that can be installed only in their personal configuration – safely away from danger.

Caution: developers at work!

In the afternoon we descended into hackathonning. I had a stab at writing a custom format parser plug-in – the idea being that, instead of having to convert our resource files to an intermediate format for consumption, Pootle would be able to natively understand them and all the wonderful meta-data they contain.

I didn’t quite get all the way in the time available, but did get Pootle at least absorbing the information from our custom file format. We’ll definitely be picking this up again once we’ve migrated to the latest version and can take advantage of all this new good stuff. The theory, however, is already beautiful; this is a future of which we want to be a part.

We also had a discussion about scaling and Translate House’s current work to put Pootle into Docker images. Nothing is quite ready yet, but this area is under active development, so we’re hoping that once these become available our local development and testing capabilities will be much improved – and we’ll be able to launch Pootle instances straight into the cloud.

Day 3

On the final day we discussed the future of quality checks in Pootle. This is an area of prime interest for us, as we currently have a painful feedback loop – checks are applied after translation is completed, so any failures need to be fed back to the translators for a second pass. What we really want is to use Pootle’s built-in checks that are applied on the spot; when a translator clicks ‘submit’ they will be told immediately about any problems, so they can fix them while their head is still in the zone.

The quality checking framework is up for big changes, as it is currently quite difficult to release and configure checks (they’re housed in the ‘translate toolkit’, a separate project that is still managed by Translate House but isn’t so easy to release). We had a discussion about potential architectures for how checks could be created, configured and managed, trying to find the balance between ease of development, ease of use and future-proofing.

I proposed a super-modular system to separate constraints for running checks from the code behind the checks from their configuration, but my complexity had to be reined in because, yes, Pootle is generally used by actual human beings – often in community situations where people are not as comfortable with technology as we are. There is such a thing as too much flexibility!

With that, we wound down with some blue-sky thinking about what we’d like to see appear next in Pootle. Branded swag was exchanged (with some delightful personalised messages on the swag bags from Translate House!) and proceedings were called to a close to the satisfaction of all.

Super-swish Pootle swag.


All in all, it was a fascinating and productive two-and-a-half days. We got to hear about how Pootle works in commercial environments similar to our own versus how it is used by non-profit organisations to manage crowd-sourced, community-driven translation.

We got to discuss the future of Pootle, the new features coming soon, and even shape the roadmap a bit by setting out our own use-cases. Best of all, we got to try some of it out, to experience first-hand the new world of improvements that we’ll soon be able to unlock.

It was great to meet some of the Pootle team and other engineers who use Pootle and learn about their processes. Now we’re even more excited to be upgrading our own instance of Pootle, getting to contribute our bits and pieces back to master – and we’re already lining up for PootleConf 2017!


Journey to the centre of Memcached

Posted on by David Oliveira

Journey to the centre of memcached

A few months ago our team noticed that Skippy (Skyscanner’s deep linking engine and one of the components our squad looks after) had started to log an increasing number of NotInCache exceptions. Those exceptions occur when we try to look up a trip/hotel/car hire in the cache and it can’t be found, causing a new pricing request to be issued. This makes the redirect take far longer than usual – not a desirable situation at all.

After a few hours of investigation, we found that some of the data we were sending to Elasticache/memcached was actually being rejected with a "SERVER_ERROR Out of memory storing object" error message.

We’ve always relied on memcached as our caching platform, and even though we knew there was a practical certainty we would exceed the cluster capacity, we thought memcached would deal with it, dropping the oldest items with low or no impact at all. Instead we found that we couldn’t store new data there, causing many recent items to not be found in cache and effectively degrading the customer experience.

Initial investigations bore little fruit and we quickly concluded that our answers were hidden somewhere deep in the inner workings of memcached itself, thus our low-level investigations began.

How does memcached actually work?


Memcached organises its memory in slabs, predetermined lists for objects of the same size range. This reduces the pain of memory fragmentation, since all the objects in a given slab have a similar size.

By default the memory is split into 42 slabs, with the first slab for items up to 96 bytes (on 64-bit architectures), the second for items from 97 to 120 bytes, and so on, up to the last one for items from 753.1K to 1MB (the maximum item size). The size range of each successive slab is increased by a factor of 1.25 (the default) and rounded up to the next multiple of 8.
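That geometric progression can be sketched in a few lines of Python (illustrative only – real memcached also accounts for per-item header overhead, so the exact class sizes differ slightly):

```python
def slab_sizes(base=96, factor=1.25, max_size=1024 * 1024):
    """Approximate memcached slab class sizes: grow by `factor`,
    round up to the next multiple of 8, cap at the max item size."""
    sizes = [base]
    while sizes[-1] < max_size:
        nxt = int(sizes[-1] * factor)
        nxt += (8 - nxt % 8) % 8      # round up to a multiple of 8
        sizes.append(min(nxt, max_size))
    return sizes

sizes = slab_sizes()
# sizes[0] == 96, sizes[1] == 120, sizes[2] == 152, ..., sizes[-1] == 1048576
```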


A page is a memory area of 1MB which contains as many chunks as will fit. Pages are allocated to slabs to store chunks, each chunk containing a single item. A slab can allocate as many pages as are available, subject to the -m parameter (the maximum memory to use for items).


A chunk is the minimum allocated space for a single item. For example, an item with the value “super small” will be assigned to the first slab, which holds items up to 96 bytes – yet every single item in that slab uses 96 bytes, even if its actual size is smaller. This mechanism obviously wastes some memory, but it also reduces the performance impact of value updates.

Figure 1 – A visual representation of how memcached organises its data.

How do we run out of memory ?

Once we understood exactly how memcached organises information, we could say that we would run out of memory when all the available pages are allocated to slabs.
However, memcached is designed to evict old/unused items in order to store new ones – so how does that work?

For every single slab, memcached keeps a list of the items in the corresponding slab sorted by use (get/set/update) time – the LRU (least recently used) list. So when memory is necessary to store an item in a given slab, it goes straight to the start of the LRU list of the corresponding slab and tries to remove some items to make space for the new one.

In theory it should be enough to remove a single item to make space for another. However, the item we want to delete might be locked and therefore impossible to remove – so we try the next one, and so on.

To keep response times bounded, memcached only tries to remove the first 5 items of the LRU – after that it simply gives up and answers with “SERVER_ERROR Out of memory storing object”.
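That bounded eviction walk can be sketched in Python as follows (a simplification – real memcached walks an intrusive linked list under a lock, but the give-up-after-five behaviour is the same idea):

```python
def evict_one(lru_head, locked, max_tries=5):
    """Sketch of memcached's bounded eviction walk: inspect at most
    `max_tries` items from the head of the LRU list and evict the
    first unlocked one; if they are all locked, give up."""
    for item in lru_head[:max_tries]:
        if item not in locked:
            lru_head.remove(item)   # evicted: its memory can be reused
            return item
    # All of the first `max_tries` items were locked
    raise MemoryError("SERVER_ERROR Out of memory storing object")
```

For example, with `lru = ["a", "b", "c"]` and `"a"` locked, `evict_one(lru, {"a"})` evicts and returns `"b"`; if every inspected item is locked, the call raises instead of walking further down the list.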

Figure 2 – Regardless of how the items are distributed amongst pages, they are correctly sorted by usage time on the LRU list.

So why would items get locked ?

Every item operation (get, set, update or remove) requires the item in question to be locked. Yes, even get operations require an item lock. That’s because if you’re getting an item from memcached, you want to prevent it from being modified or removed while it’s being sent over the network, otherwise the item’s data would get corrupted. The same applies to updates, increments, etc. – the lock ensures data sanity and operation atomicity. Some internal housekeeping processes can also cause items to get locked, but let’s not focus on that for now.

Testing it

In order to prove that we could run out of memory just by locking 5 items, we ran the following test:

  1. Launch a memcached instance
  2. Store 5 items of 1–96 bytes in length (that’s important, because they map to the first slab)
  3. Request the 5 items we just stored but don’t read() their values (get X\r\n, where X is each of the keys used in the previous step) – that should keep the 5 items locked
  4. Store thousands of 1–96 byte items on memcached in order to fill up its available memory and force it to try to evict data on slab #1

On the last step of the test we started to see "SERVER_ERROR Out of memory storing object" once the 5 oldest items (the ones at the top of the LRU) were all locked, making it impossible for memcached to release memory for the new items.
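For reference, the memcached text-protocol framing used in those steps looks like this (a sketch that only builds the command bytes; actually holding the locks, as in step 3, means sending the get over a real socket and deliberately not reading the response):

```python
def set_cmd(key, value, flags=0, exptime=0):
    """Frame a memcached text-protocol `set` command.
    Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"""
    data = value.encode()
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode()
    return header + data + b"\r\n"

def get_cmd(key):
    """Frame a `get`; the lock trick relies on sending this and then
    NOT reading the reply off the socket."""
    return f"get {key}\r\n".encode()

assert set_cmd("k1", "super small") == b"set k1 0 0 11\r\nsuper small\r\n"
```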

Measuring the pressure

To better understand the patterns of data we’re storing in memcached, and to visualise which slabs are under the most pressure to evict data, we can use memcached-tool, a script that ships with the memcached source code. It gives an overview of how your data is distributed amongst slabs, plus a few other quite important metrics. We’ve improved memcached-tool by making it print slab and global memory efficiency. You can check it out here: memcached-tool-ng

The stats for one of the nodes of our cluster

From the screenshot above we can see that two groups of slabs are definitely under a lot of pressure: slab #2, for items between 97 and 120 bytes, and slabs #13, #14 and #15, for items between 1.2KB and 2.3KB. The slabs around #13 and #15 are also under considerable pressure. We can see this from the number of items (~53 million across these four slabs) and the evictions – if we have to evict items, it means we’re already running out of memory. Those slabs also use a considerable amount of space (~46GB altogether).

Other important values are OOM (out of memory), which tells us the number of times we weren’t able to evict data, and Max_age, which is the age of the oldest item in the given slab.

So is it possible to avoid the locks?

Our platform uses memcached quite intensively – we set roughly 2 items on memcached for nearly every single commercial flight combination in the world, multiplied by the number of airlines and agencies selling that flight, plus a few hundred thousand items of other business data. That can go over 500K memcached sets per second per cluster at peak times.

We also run our platform on about a thousand AWS instances, amounting to almost 100k concurrent connections to memcached retrieving and storing data simultaneously.
This combination of circumstances makes it really easy to run into this kind of concurrency problem.

So in order to avoid being unable to store data on memcached due to locked items, we had to tweak a few settings:

  • lru_crawler: Enables a background thread that checks for expired items and removes them – a good thing if you want to keep your slabs clean and avoid evictions;
  • lru_maintainer: Splits each LRU into 3 sub-LRUs (HOT, WARM and COLD). New items are put in the HOT LRU; items flow from HOT/WARM into COLD; and a background thread shuffles items within the sub-LRUs as limits are reached – this avoids always having the same items at the top of the LRU;

Other settings you might want to check:

  • -f (chunk size growth factor): Defines the growth factor between slabs (as mentioned in the Slabs section); it defaults to 1.25. By changing it to a lower value you might end up with more slabs (spreading the pressure of evictions), but be aware that you can’t have more than 63 slabs – so, for instance, if you change it to 1.10, you will have 63 small slabs but the last one will contain all the items between 42K and 1MB (1MB being the maximum item size), which means every one of those items will take 1MB, causing really bad memory efficiency;
  • -I: Defines the maximum item size. For instance, if you don’t store items larger than 500KB, you might want to tweak this setting along with -f and slab_chunk_max so you can spread your data amongst more slabs;
  • expirezero_does_not_evict: Defines whether an item with expire time=0 is evictable or not. If it’s on, you will get OOM errors as the limits are reached; we didn’t have to tweak this, as the default is off;
  • slab_reassign + slab_automove: If you have a long-running memcached instance and your usage pattern (key/item size) has changed over time, you might be incurring many evictions. This happens because once memcached reaches its memory limits it won’t allocate more pages to slabs that need them, so the pattern of allocated pages per slab is set forever. These 2 parameters make memcached take pages from slabs with no evictions when another slab is seen having the highest eviction count 3 times, 10 seconds apart.


Memcached is a widely used distributed caching platform with a relatively small learning curve. It barely requires any configuration before you’re able to use it, and its API is extremely simple and straightforward.

Despite its simplicity, the way it stores data and manages its memory has some caveats that can eventually lead to unexpected behaviour. Understanding what your data looks like, how memcached organises it, and how your cluster is performing will bring you one step closer to controlling your whole platform and knowing its limits, avoiding bad surprises in the future.


From 20 to 2 million releases a year, part 3

Posted on by Alistair Hann

As Skyscanner scaled from an engineering team of 30 with one website and three services to a team of 100 engineers, release frequency halved. This is the story of the turnaround as the company went on to grow to 400 engineers, with over 100 services, releasing at thousands of times the previous rate. This series of three blog posts will share that story of releasing thousands of times more frequently, and the implications of how our goal of 10,000 releases per day informs our tooling, processes, and how we think about writing software.

Part III – From 1000 to 30,000 releases per year, and beyond

(aka Reversing the death spiral and turning it into a flywheel)

At the end of Part II I explained that we had got to the point of doing hundreds of deployments every month, thanks to moving to continuous delivery. Services were deployed at their own heartbeat and we had moved to a devops model.

The question then became: how fast should we be going? There is an annual publication from Puppet Labs called ‘The State of DevOps’ – a summary of the learnings from a survey of more than 4,600 technical professionals from organisations of all sizes around the world. It identifies a link between ‘high performing’ software organisations and higher market capitalisation growth in publicly listed companies, and improved profitability and market share in private companies. Compared to other software organisations, the high performing software organisations report:

  • 200x more frequent deployments
  • 24x faster recovery from failures
  • 3x lower change failure rate
  • 2,555x shorter lead times

There are further insights in the 2015 report: in the high performing group of companies, as the number of developers in the organisation increases, the number of deployments per developer per day increases exponentially, versus flat levels for medium performers and falling deployments per day for low performers. This is a shift we have seen in our own company as we made the journey to continuous delivery – initially, adding more engineers reduced our frequency; then we managed to keep it flat; but to make it go exponential we needed to work out what was actually happening.

So why does this happen? In Part I of this series, I explained how our release frequency started plummeting as a negative cycle had started – every release made the next release slower. Adding more people to the organisation just compounded those effects. In the case of organisations that are releasing more often, I can see the reverse happening: a flywheel where changes keep reinforcing each other positively. I have drawn this out here:

As smaller changes are made, there is less risk and reverting is easier – hence the faster recovery rate and reduced incidents due to change. The smaller changes are also better for our users, as changes happen more gradually and they get new features earlier. From a process perspective – higher release frequency forces greater automation, and better instrumentation. This leads to fewer errors and greater confidence in the software. This all means happier developers and that means more code gets shipped. The 2016 ‘State of Devops’ report even measured the impact on the team – developers in the high performing organisations were more than twice as likely to recommend their organisation to a friend as a great place to work.

How fast should we go?

Given there is this positive flywheel, how fast should we be looking to make it turn? The number we came up with was 10,000 releases / day. The rationale is that we are moving to many micro services. Today there are around 100; in the future there will be 500–1,000 between primitives and orchestration services. In that world, a typical engineer’s code change might touch two to four services (call it an average of three). So if an engineer makes one change per hour across an eight-hour day, touching three services each time, that is 24 changes per day per developer – and with a team of 400, that’s 9,600 changes per day.
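
That arithmetic can be written down as a quick sanity check. This is a back-of-the-envelope sketch, not real tooling; the eight-hour working day is an assumption needed to make the 24-changes-per-developer figure work:

```python
# Back-of-the-envelope model of the 10,000 releases/day target.
# All inputs are the illustrative assumptions from the text, not live data.

def releases_per_day(engineers, changes_per_hour=1.0,
                     services_touched=3, hours_per_day=8):
    """Each change to a service counts as one release of that service."""
    changes_per_engineer = changes_per_hour * hours_per_day * services_touched
    return int(engineers * changes_per_engineer)

print(releases_per_day(400))  # 400 engineers x 24 changes/day = 9600
```

With 400 engineers this lands just short of the 10,000 target, which is why the number is a direction-setter rather than a precise goal.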

Now, I don’t actually mind whether Skyscanner hits 10,000 or 5,000 or 20,000 releases per day. The great thing about a target like this is that it forces us to think differently, and brings various decisions into focus. For example – we are moving from colo to AWS, and there was some discussion about what our integration environments were going to look like when we move to AWS (they are a total pain to maintain, but some teams are highly dependent on them as part of the Continuous Delivery pattern we originally advocated). If there are 10,000 releases going out a day, the idea of a ‘stable’ integration environment to test in ceases to make sense. So there will be no integration environment in AWS, and teams will need to integrate against either the production versions of the APIs they use or mocks.

Another example is that if an engineer is deploying on every change, there cannot be any manual steps between that commit and production, it just isn’t efficient – so all the manual checks in the continuous delivery pipeline shown in Part II have to disappear.

One pioneering team on this journey was working in our Joint Venture with Yahoo! in Japan, and they built two services to help them ship more quickly: one called Stevedore that loaded and unloaded Containers from production (for those who haven’t seen Season Two of HBO’s The Wire, Stevedores load and unload ships, and ships have containers…), and Cyan for Blue/Green testing the new deployments (Cyan being the colour you get by adding Blue and Green light). The new system dictated a particular workflow:

The new deployment philosophy of Skyscanner

The impact of this system was very positive and it has now been generalised into a system that is used across Skyscanner – we internally call that system ‘Slingshot’. The following diagram shows the original continuous deployment system at the top, and the ‘Slingshot’ continuous delivery system below:

Skyscanner "Slingshot" release process

The first thing to notice is that there are no manual steps between a commit and the software running across production. Every pull request automatically triggers a Drone CI build that runs tests within a Docker container; after code review, the tests are run again and that Docker image is the artefact that will be deployed into production. A hook in GitLab means the successful pull request triggers the Slingshot deployment:

  • The image is retrieved and deployed to a new cluster
  • Automatically, a pre-defined fraction of production traffic goes to the new cluster
  • Rules applied to production metrics determine success or failure
  • There is automatic roll-out to all of production or automatic roll-back
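
The four automated steps above can be sketched roughly as follows. This is a minimal illustration, not Slingshot’s real interface: the traffic fraction, the single error-rate rule, and all function names are invented for the example:

```python
import random

# Hypothetical sketch of a Slingshot-style canary decision. The threshold
# and traffic fraction below are illustrative assumptions, not real values.

CANARY_TRAFFIC_FRACTION = 0.05   # pre-defined slice of production traffic
ERROR_RATE_THRESHOLD = 0.01      # a rule applied to production metrics

def route_request(canary_fraction):
    """Send a pre-defined fraction of traffic to the new cluster."""
    return "canary" if random.random() < canary_fraction else "stable"

def evaluate_canary(error_rate):
    """Rules over production metrics decide roll-out vs roll-back."""
    return "roll_out" if error_rate <= ERROR_RATE_THRESHOLD else "roll_back"

print(evaluate_canary(0.002))  # healthy canary -> "roll_out"
print(evaluate_canary(0.05))   # breached rule  -> "roll_back"
```

The key property is that the decision is entirely rule-driven: no human inspects the canary, so there is nowhere for a manual step to creep back in.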

This way of working forces more good behaviour: there is no opportunity for any sneaky manual testing as part of deployment – it cannot happen; and teams have to better understand and instrument operational metrics, which improves overall availability. We can only provide this tooling for free if people use the default toolsets and patterns – so we get greater convergence, and that also makes us more efficient. You can see those forces on the flywheel beginning to turn and make us accelerate.

Where have we got to?

It turns out to be surprisingly hard to find out how many releases we are doing – with many systems and pipelines, there isn’t a single place to go for that number. I recently took a poll of teams, pulled some data from Slingshot, and found we did 2,400 releases in the last month. While that is only around 120 releases per working day, it is an order of magnitude more than when we first implemented continuous delivery, and three orders of magnitude better than at the peak of our challenges. To show this progress on a graph, I’ve had to use a logarithmic scale:


Skyscanner releases over time

The dotted line extrapolates the trend and implies we may reach 10,000 releases per day at the end of next year. While we have gone from 1,400 to 2,500 releases per month over the last six months, there are some impediments to reaching that rate. In the teams that have adopted continuous delivery, where every commit is released to production, a team of 15 engineers is releasing 15–20 times per day – so one or two releases per engineer per day. The only way to ship more often is to commit in smaller chunks, so we need to change that part of how we work. Other teams are still working with the ‘legacy’ continuous delivery model, and until they are able to migrate to the new standard toolset they cannot benefit from Slingshot. In the meantime, the release frequency of these teams is typically daily, or even weekly in a few cases.

There is also a wider consideration for the whole business – as we ship changes more frequently, we need to be able to make faster decisions. In a one week ‘sprint’ for a Scrum team, there will be multiple experiments and hundreds of deployments, each generating lots of data on which to base product and business decisions. Thus the product owner and the rest of the organisation have to be ready to make faster decisions to maximise the benefits of faster feedback.

Whether Skyscanner reaches 10,000 releases per day at the end of 2017 or not, I am convinced the trend will continue to be more frequent releases, happier customers, and happier developers.

Sign up for email updates from the CodeVoyagers team

Building products for large scale experimentation

Posted on by Dave Pier

At Skyscanner we have been running hundreds of AB tests to learn how to improve the site for our users. To build our experiments faster, we have developed an in-house system that separates code changes from experiment variants, and in so doing provides a massive increase in flexibility. In essence we have turned our whole site into a set of Lego blocks that can be combined in an almost infinite number of ways, which anyone in the company can control from anywhere in the world.

If we step back a few months, we built our AB tests in the standard fashion. We used our experiment platform, Dr Jekyll, to assign users to a particular variant of an experiment. Each variant of the experiment was then directly linked to a section of code. If a given user was in the control group they experienced the standard site; if they were in one of the variants they received an altered site, and we could track the difference in behaviour. While this works well for well-bounded areas of investigation, it is quite inflexible for new areas where we will have multiple rounds of iteration, with each round building on the learning of the previous one.

To allow AB experimentation to scale while maintaining our lean/agile culture, we have built an extra layer of flexibility into Dr Jekyll. We can now tie our code segments to configurables. A config can be thought of as a link between the main body of the code and the parcel of data it contains. This parcel might be a whole module of code needed in an experiment, or it might simply be a boolean value or a string of text. We initially built these configs so we could change strings and values throughout the product for different markets and different situations. However, tying code segments to configs, and tying multiple configs to a single experiment variant, allows an order of magnitude more flexibility in how we build for experimentation.
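
A config of this kind can be modelled as a simple lookup. This is a toy sketch only – the key names, the override shape, and the function are invented for illustration and are not the real Dr Jekyll API:

```python
# Illustrative model of a "config": a named parcel of data that the main
# body of the code reads at runtime. All names here are hypothetical.

DEFAULTS = {
    "booking_panel.show_star_ratings": False,
    "booking_panel.title": "Book your flight",
}

# An experiment variant is expressed purely as a set of config overrides,
# so no variant-specific branching is needed in the feature code itself.
VARIANT_OVERRIDES = {
    "variant_b": {"booking_panel.show_star_ratings": True},
}

def resolve_config(key, variant=None):
    """Return the variant's value if it overrides the key, else the default."""
    overrides = VARIANT_OVERRIDES.get(variant, {}) if variant else {}
    return overrides.get(key, DEFAULTS[key])

print(resolve_config("booking_panel.show_star_ratings"))               # False
print(resolve_config("booking_panel.show_star_ratings", "variant_b"))  # True
```

The feature code only ever asks for a config value; which variant the user is in is resolved behind that single call.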

multiple small independent code segments to be combined into a single experiment variant

In this diagram we can see that configs allow multiple small independent code segments to be combined into a single experiment variant.

If we now modularise our code such that each change we might want to make in an experiment is independent of any other, we create the Lego blocks we need to build experiments. Let’s look at an example of where this becomes useful. We wanted to redesign our booking panel from a price-centric layout to one that prioritised information and alternative booking options. There were a number of changes we felt we needed to make:

  1. Collapse the itinerary information
  2. Allow the provider list to show our new star ratings
  3. Move the itinerary information to the top of the panel
  4. Expand the previously closed provider list

In the traditional approach to building AB experiments it is tempting to build the single preferred option and compare only one variant with control. If it improves metrics then great: give yourself a pat on the back and ship it. But if metrics go down, what happens? There is no way to know which of the changes had the effect. Do you start stripping back the changes to one controlled change at a time, or make more changes until something works? In the new system we can build each of these changes as a separate config and combine them in a single experiment, controlling for each of the changes (taking the appropriate statistical considerations for multiple tests). In this particular example we had wanted to check four variations, but we could have tested twelve given the possible combinations. As it turned out, when we saw the final version in the browser we decided to test a variant we had not intended to build but that was possible to create with no additional development effort due to the available combinations – and this was the one that was eventually shipped to production.
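
Treating each change as an independent block makes a variant nothing more than a chosen subset of changes. The sketch below is illustrative: the change names paraphrase the numbered list above, and the particular combinations are invented, not the actual experiment design:

```python
# Each booking-panel change is an independent "Lego block".
CHANGES = {
    "collapse_itinerary",
    "star_ratings_in_provider_list",
    "itinerary_at_top",
    "expand_provider_list",
}

# A variant is just a subset of the independent changes, so a new variant
# costs no extra development - only a new combination of existing configs.
VARIANTS = {
    "control": set(),
    "v1": {"collapse_itinerary", "itinerary_at_top"},
    "v2": set(CHANGES),  # all four changes together
}

def is_enabled(change, variant):
    """Is this code segment switched on for this variant?"""
    return change in VARIANTS[variant]

print(is_enabled("itinerary_at_top", "v1"))      # True
print(is_enabled("expand_provider_list", "v1"))  # False
```

Adding the unplanned variant mentioned above amounts to adding one more entry to `VARIANTS` – no new code segments are needed.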

Variants from available combinations.

Since implementing the config layer we have found numerous use cases. MVP experiments that are inherently risky can be de-risked by starting broad but shallow, then building additional functionality in further layers of configs as the data from each round of experimentation allows us to refine our ideas. We can also use configs as feature flags by turning the features on but disconnecting them from the underlying experiment. This allows market-by-market flexibility that can be controlled independently of the core site.

An additional benefit of the modular config approach is that it abstracts the complexity of experiment design away from the development of features. Developers can now build and test modules independently, without needing to worry about which five changes need to hang together for a given variant. If we want to extend the experiment in the future, we simply add another config, until we have the feature creating the user benefit we had hoped for in the first place.

Similarities with multivariate testing

This approach is similar to multivariate testing, but deliberately limited to specific combinations of code segments/changes. Multivariate testing runs ALL combinations of changes together in order to determine which combination produces the optimal effect. An example would be changing a button’s placement, string and colour: if there are three versions of each, that is 3 × 3 × 3 = 27 combinations to test. The system we are describing here allows us to run a multivariate test if we wish, BUT it also allows the more modular AB testing described above. The primary purpose is not to throw every possible combination at the wall and see what sticks, but rather to reduce the time and cost between learning from one experiment and implementing the next iteration with a directed hypothesis.
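
The contrast can be checked directly with Python’s standard library. The button values below are invented for illustration:

```python
from itertools import product

# Full multivariate testing enumerates every combination of changes;
# the config approach instead hand-picks the combinations worth testing.
placements = ["top", "middle", "bottom"]
strings = ["Book now", "Continue", "Select"]
colours = ["blue", "green", "orange"]

all_combinations = list(product(placements, strings, colours))
print(len(all_combinations))  # 3 x 3 x 3 = 27

# A directed AB test only runs a chosen subset of those combinations:
directed_subset = [
    ("top", "Book now", "blue"),
    ("top", "Continue", "green"),
]
print(len(directed_subset))  # 2
```

The combinatorial blow-up is why running everything is rarely practical: each extra independent change multiplies the number of cells to fill with statistically significant traffic.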

More on experimentation at Skyscanner:



From 20 to 2 million releases a year, part 2

Posted on by Alistair Hann

As Skyscanner scaled from an engineering team of 30 with one website and three services to a team of 100 engineers, release frequency halved. This is the story of the turnaround as the company went on to grow to 400 engineers, with over 100 services, releasing at thousands of times the previous rate. This series of three blog posts will share that story of releasing thousands of times more frequently, and the implications of how our goal of 10,000 releases per day informs our tooling, processes, and how we think about writing software.

Part II – From 10 to 1000 releases per year – Microservices and Continuous Delivery

In Part I of this series, I explained that our release frequency was plummeting because there was a spiralling negative cycle, where every release was making the next release take longer. Our team was growing but the value we were delivering to our customers was grinding to a halt, and we had to change things.

We made three major changes, which were inextricably linked:

Moving to micro services

At the peak of the crisis, the Skyscanner product (excluding native apps) consisted of two deployable units that were released in lock-step. One was ‘The Website’ and the other was the data services that powered it and the mobile apps. The reason I refer to the data services as one unit is that, while there were strictly speaking three services, two were tightly coupled and had to run on the same server. A huge effort went into splitting the code bases into much smaller services – effectively forking the code and progressively cutting out the dead wood that wasn’t needed for each of the sub-services. These micro services could be deployed independently. After the first move there were around fifty such services, and now there are around one hundred.

Teams Owning Services

We entirely changed the way we organised product engineering. Previously we had ‘development’ and ‘operations’ teams. Instead we adopted a structure partly inspired by Spotify’s 2012 paper describing Squads and Tribes. This is a ‘you write it you run it’ environment, with a single team taking on full responsibility for development, deployment and operation of the services it owns. The only remaining ‘ops’ functions moved to teams who built tooling to enable other teams to deliver faster, a very small infrastructure team responsible for networks, CDN, load balancers, virtualisation etc., and a very small team providing ‘follow the sun’ front line support (e.g. for data centre outages) and incident management.

We only considered a team to have completed the transition when they could show they had completed an extensive checklist. The list included that the team:

  • Can independently release services into production
  • Understands and monitors metrics and KPIs
  • Can run an A/B test in production
  • Sets its own objectives and key results

Continuous Delivery

Independent deployment of services was driven by a move to continuous delivery. The diagram below shows the ‘best practice’ pattern that all teams were encouraged to adopt.

There was a continuous integration environment with very frequent change, validating the software after every commit. If that was successful, then after a human ‘go ahead’ the code would move to integration review – an environment with less frequent change, where the service integration would be validated by running a full integration test suite (including all service dependencies) and a full acceptance test suite. Again, after a manual gate, code could move to pre-production. This was relatively stable and contained probable release candidates, where integration and acceptance tests would be run.

A manual gate between pre-production and production allowed for exploratory, performance and load testing in a production-like environment. The production release would then go to a canary – a stable mixture of production and the validated release candidates. Smoke testing was combined with monitoring of service KPIs and business metrics to ensure the release was good. After a final manual check, the package could be released to all nodes in production, with ongoing monitoring and alerting of the service KPIs and business metrics.
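
The promotion flow just described can be captured as data. This is an illustrative sketch: the stage and gate names follow the text, but the structure itself is an assumption, not a real Skyscanner artefact:

```python
# The 'best practice' continuous delivery pipeline, as an ordered list of
# environments, the gate that admits code into each, and the checks run there.
PIPELINE = [
    {"env": "continuous_integration", "gate": "automatic",
     "checks": ["per-commit validation"]},
    {"env": "integration_review", "gate": "manual",
     "checks": ["integration tests", "acceptance tests"]},
    {"env": "pre_production", "gate": "manual",
     "checks": ["integration tests", "acceptance tests"]},
    {"env": "canary", "gate": "manual",
     "checks": ["smoke tests", "KPI monitoring"]},
    {"env": "production", "gate": "manual",
     "checks": ["KPI monitoring", "alerting"]},
]

manual_gates = sum(1 for stage in PIPELINE if stage["gate"] == "manual")
print(manual_gates)  # 4 - the human steps Part III's Slingshot removes
```

Counting the manual gates makes the later point concrete: every one of those four human ‘go ahead’ steps caps how often a team can release.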

Continuous delivery in action, with metrics around business KPIs

Making the changes to micro services, teams owning services, and continuous delivery took several months, with different teams completing it at different times – depending on how easy it was to decouple their service, the level of additional automated testing that needed to be added, and so on. One question is how this change was sold to the business – there were pioneers who deserve a lot of credit for driving it, but there was also a recognition that things couldn’t just carry on as they were. The team had doubled in size and we were going slower. Thus the entire executive team and board were behind the change – it also helped to be able to refer to examples like Kevin Scott halting deployments at LinkedIn for two months in order to move to Continuous Deployment.

Many challenges emerged during this switch – moving to a ‘you write it you run it’ devops model meant that teams had to learn the discipline of deploying and operating production software, having previously been insulated from it. There was a steep learning curve in terms of which metrics were critical to gate deployment and trigger alerting for the individual components, what an appropriate time to release a change is, and so on.

Another challenge was that services were subject to a lot more change going on around them. There was a case of a small data change in one service bringing the entire website down, because of a cascade of errors triggered by the invalid data and the subsequent abnormal responses. The left side of the figure below shows the old model at Skyscanner – Component A is deployed into production as part of a single release with Component B, so it can be fully tested against the version of Component B that it will run alongside in production. If Component B is unavailable, A will usually also be unavailable, as they were often deployed on the same hardware or were otherwise tightly coupled. In this example we didn’t need to think too much about what happens to A if B changes or stops working – any issues would (hopefully) be picked up in the integration environment and regression testing.

After moving to independently deployed micro services, the world was much closer to the right-hand side of the figure. We now have a situation where Component A has been deployed and isn’t changing, while Component B is being deployed independently. A single release of Component A has to handle new releases of Component B – including unproven canaries, Component B being slow, and many other possible types of change.

Types of A/B testing allowed from components

As a result, teams had to start coding a lot more defensively – hardening how they responded in a world where other services suffered from latency, invalid responses, unavailability and so on, and testing for those scenarios. The end result was more resilient software.
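
A minimal sketch of that defensive pattern, using only the standard library: treat a dependency as something that can be slow, unavailable, or return invalid data, and degrade gracefully instead of cascading the failure. The ratings endpoint and all names here are hypothetical, not a real Skyscanner service:

```python
import urllib.error
import urllib.request

FALLBACK_RATING = None  # render the page without a rating rather than fail

def parse_rating(body):
    """Reject abnormal responses instead of passing them downstream."""
    try:
        rating = float(body)
    except (TypeError, ValueError):
        return FALLBACK_RATING
    return rating if 0.0 <= rating <= 5.0 else FALLBACK_RATING

def fetch_star_rating(provider_id, timeout_seconds=0.5):
    """Fetch a provider rating, tolerating latency and unavailability."""
    url = "https://ratings.invalid/%s" % provider_id  # hypothetical endpoint
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return parse_rating(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, UnicodeDecodeError):
        # Timeout, connection failure or garbled payload: fall back.
        return FALLBACK_RATING

print(parse_rating("4.5"))           # 4.5
print(parse_rating("not-a-number"))  # None (invalid response rejected)
```

The tight timeout is the important design choice: without it, a slow Component B drags down every caller, which is exactly the cascade described above.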

Slowly we learned the operational discipline, improved instrumentation, and increased the resilience of individual services. Overall availability dropped but then recovered. Teams started delivering software again and the very last release train in February 2014 was Hammer Bro, the Super Mario character (we had run out of Muppets and moved on to video game characters).

So where had this all got us? At the end of this change we were releasing hundreds of times per month. The confidence of teams was growing, as was confidence in the teams. In Part I, I talked about the annual feature freeze in December (because we were uncomfortable about change while people were away, and about the ability to handle the new traffic surge in January); we no longer force a change freeze, and haven’t done so for the past two years. What we do say is that squads must have support available for two days following any deploys (we are in the process of adapting our support model – something for a later post).

As teams’ confidence grew, some changes were made to the default continuous delivery pattern – some squads would deploy to the integration review and pre-production environments simultaneously, and a minority moved on to automatically rolling out canary releases across production if the metrics were healthy. These were signposts of the direction we would ultimately end up moving in. In Part III I will describe how continuous delivery increased our release frequency by another order of magnitude.

Now continue to Part III of the series, “From 1,000 to 30,000 releases per year, and beyond”.
