Since we’ve just released our 3-in-1 update for the Skyscanner app, it’s a great time to share some info with you about how we release iOS and Android apps at Skyscanner. In this post we would like to give you a brief overview of our app release process, how we adopted the de facto industry standards and extended them to our needs. Beside this we also would like to give you a couple of ideas where we are planning to improve in the future.
- Separating code drops from feature releases
- Release trains
- Feature flags
- Faster release cycles
- Future ideas and improvements
Decoupling releasing features from releasing the binary
Imagine a company where multiple teams are working on the same app. The teams own different functions, or different screens of the app. They would like to work and deliver new features independently. So if team A has some kind of delay in releasing a new feature, it shouldn’t block team B to release their own feature on time. The company should be able to release a new version of the app even if not all new features are ready for it.
It is also very important to be confident that the new feature is delivering value to our users. Firstly we release only to a small number of users, and depending on their reaction we continue the rollout to all.
To achieve these goals we had to decouple releasing a feature from releasing the app binary. We put the new features behind a feature flag and only turn them on if the feature is ready to be released in production. Meanwhile we release the new binaries in a fixed schedule, so teams don’t need to synchronize their feature delivery, the schedule is available and they can plan ahead easily.
Release train – shipping the new binary on schedule
Releasing the binary follows a fixed schedule, let’s call it a release train. So if you finished your new feature on time, you can release it with the upcoming release train. If you are not ready on time and you missed the train and then you will be able to release the feature with a next train.
On both iOS and Android we have a 2-week release cycle, which mean we ship a new binary bi-weekly. The releases on the two platforms are held on alternate weeks, so each week either a new iOS or Android release train starts.
Feature flags – the tool for actually releasing features
By using feature flags, we are able to specify which features the users should see when they run the application. The feature flags can be modified remotely, so we are able to turn on and off features without releasing a new binary.
Feature flags provide a lot of value at four different stages of development:
- Feature is under development: If the feature is still under development and it is not ready to be released to any users, the flag is only turned on for developers – so they can iterate on the feature and commit changes without causing any user disruption.
- Rolling out a new feature: If the feature is ready to be released, first we usually turn it on for only a small amount of users (for instance 1%), so we can measure the reaction of the users and the quality of the feature (by checking analytics data – key business metrics, errors, crashes). If everything seems to be ok, we increase the rollout, sometimes in multiple steps, for instance 10%, 100%.
- A/B testing: This scenario is similar to point 2 but with variations of what we release. We would like to experiment with a new feature before releasing it to all of our users – so sometimes we can ship multiple variants of the feature and measure which one performs the best.
- Kill switch: If we have a feature which has already gone live and something goes wrong (i.e. the app starts crashing because of it), we can turn it off remotely.
To release the 3-in-1 update we also used feature flags. Early versions of the feature have been presented in the app binary since last December. At its early stages it was only turned on only in our development builds, then in our test builds. After we finished the development we released to 1%, 5%, 20%, 50% and finally to 100% of our users using the feature flags. Meanwhile we continuously monitored the performance of the feature. After the 100% rollout we also kept the feature flag as a kill switch for a while, so if something really bad had happened we could have still rolled it back.
Releasing a new binary frequently
Increasing the frequency of releases is a key part of keeping the pace in delivery. Frequent releases help us to experiment and iterate on our features faster. We can get valuable data about user behaviour faster so we can react on them faster. As of now we have a 2-week release cycle on both android and iOS and we are planning to further increase the frequency.
Each Wednesday morning we have a code freeze on either iOS or Android. After the code freeze we start a ~1 week long stability period. During this period our main goal is to get confident in the stability of our new release, so we are continuously testing the app and working on fixing critical/major bugs.
After we tested and fixed all critical bugs in the new release, we still cannot be confident that the app will work well for all of our users. There can be some issues which can be detected only in production. That’s the reason why on Android at least, we use the staged rollout feature of Google Play. During staged rollout the new release is rolled out to the users gradually (for only 1% first, then 10%, 100%). We are continuously checking key business metrics (like search rate, conversion rate), and also error rate, crash rate, etc. In case the metrics look good we increase the rollout, but in case we detect any major issue, we try to disable the feature using feature flags and ship an update with the next train. If it isn’t possible we release a hotfix to fix the problem. The staged rollout usually takes about one week.
After we validated the new version in production as well (using staged rollout) we release the app globally. But the story doesn’t end here. We keep monitoring the app metrics, user feedback/reviews, crashes and react if necessary.
What’s next? – There is a lot what we can improve
Shipping a new app version to our users in every two weeks is a good thing. It is good because we can quickly iterate and release new fetaures. But is it good enough? Let’s do some maths on how long it takes for one new feature to get to the users.
Let’s say development of this feature takes two weeks. If you sum up the length of the release process , you can see it takes an extra two weeks to get that feature shipped to our users. Beside this – if the feature isn’t finished on time and it misses the release train, that adds an extra 2 weeks again.
So shipping a new feature can take from 4 to 6 weeks to get to 100% rollout. If we need to iterate multiple times on the feature and make several experiments or adjustments, it is even longer. This is way too much compared to a web environment with a continuous delivery flow, where you can release the feature almost immediately. Can we achieve the same for apps? We believe so, in time.
Beside the development areas in our processes, we are facing several constraints coming from the nature of apps. One of these was the review process in App Store, which in the past took at least one full week. Fortunately Apple has worked hard to bring this down – as little as 1-2 days now, so it should not act as a blocker.
Another constraint is that apps are shipped to the users in big packages and app updates are still a big thing – and even more so if we ever had to perform a rollback of something that couldn’t quickly be hidden with feature flags.
However we can see some promising improvements in the industry which might result in making app releases a less heavy thing. Google’s Instant Apps feature is the best example and worth checking out.
So applying full continuous delivery for apps is not something which is possible for apps at this moment in time. But we can still apply best practices from web and use them to increase our release frequency to get that bit closer.
Hopefully the industry will also move in a direction which supports our goals, so if our processes and tooling are sound, we should be able to capitalise on any changes without major lifting.
Tamas Chrenoczy-Nagy is a product engineer working on improving release and development processes, tools for apps.
Posted on by Alistair Hann
A question I am often asked about Skyscanner is what technologies we use. It’s the kind of question that I have asked of other businesses in the past – to validate my own choices, as much as anything else. My, slightly flippant, answer is “All the Technologies”, but we are moving away from that position.
Back in 2010, Skyscanner was largely a .net shop – a bunch of ASP.net, lots of SQL server and one Python service. In the following years, three changes led to a Cambrian explosion in the range of technologies we use:
- We acquired businesses with different tech stacks to us – PHP, Postgres, MySQL, etc.
- We made a strategic choice that we would no longer build services on .net and ultimately move to entirely using Free and Open Source Software (FOSS)
- We switched to a model with a high level of autonomy for teams
The case against diversity
The last point in that list is a tricky one – at Skyscanner we have a model that borrows ideas from Spotify’s Squads and Tribes model. The idea is a collection of autonomous start-up like teams, each with complete ownership of one or more services, able to independently deploy those services, and setting its own roadmap and goals. The model of a collection of autonomous teams is powerful because the teams can execute unencumbered, independently shipping code and delivering value to customers.
A challenge occurs when there is a feature that cannot be shipped without changes to services owned by a different team. Clever shaping of teams and feature teams can help reduce this, but there will always be some feature that requires changes outside of the originating team’s services. One way of handling that situation is that the first team takes a dependency on another team building what they want them to. Unfortunately that breaks the idea autonomous teams delivering value to customers at their own heartbeat. The first team is now delayed by the second, and the second now needs to implement a feature that
may not have been in their roadmap, so they also lose their independence. Another way of handling that is the first team makes a change to the second team’s codebase, they make a series of pull requests and deliver the feature independently. That works well if there is an efficient internal open source model. If teams are all using the same technologies and tooling, that model is a lot more efficient.
When you move to a micro-service architecture with lots of independent services, there is a risk of solving the same problems many times. At Skyscanner we invest heavily in producing tooling to avoid these situations – so engineers can focus on writing new, valuable software rather than solving the same problems that everyone else has solved. Building and maintaining that tooling is difficult when there are dozens of platforms to support. Similarly, our event logging platform team may want to build SDKs to speed up adoption, and ideally they wouldn’t have to write six.
Finally, at Skyscanner we want people to have a variety of challenges. We encourage engineers to rotate between teams and take opportunities to work on different services, and as our products evolve we need to mould our organization to the oncoming work. It is a lot more efficient to move between teams if they are using a familiar tech stack and tooling.
Thus there are many savings to be made if we narrow the number of technologies being used. That doesn’t mean only having one technology stack – there are cases where it is advantageous to have a dynamic language for rapid scripting, or high performance from an interpreted language. For reference, outside of native mobile app development, our default platforms are now Java, Python, Node and React. The reason for Node are the advantages of more rapid development when there is a language consistency between client side and server side.
How do we get there?
In terms of how we get to that position, the stance we have always taken has been not to rewrite systems for the sake of it. There is no customer value in making a change like that. We are setting a direction though – all new services should use the ‘default’ technology set. Then whenever we change things or break services into smaller components, we err on using the default technology set where it means little incremental work.
One way to encourage the shift is through the free tooling teams get for embracing the standard tools. There is a very compelling reason to use what is standard. We are also part way through migrating from co-location to AWS and again we default to using the AWS native services wherever possible, which increases convergence as well as speeding up delivery.
We are not alone in this approach. At Google there are a limited number of languages that are supported for use in production (C++, Go, Java and Python) and something like Ruby is not supported. The practical implementation of that is a list of all the things that need to be available for a language in product (HTTP server, bindings to talk to production infrastructure etc.).
What about that autonomy thing?
The key thing about the model of distributed agile teams is that it is aligned autonomy. The teams are independent to execute, but they share the same purpose and goal – all our teams are working in travel, none are working in selling pet food (for example). That alignment has to happen for technology as well.
Getting the Benefits
We can already see benefits of narrowing our technology set. We are building much richer tooling for our engineers – I was speaking to an engineer earlier today and he was saying how he and two other engineers had created a new micro service from scratch and got it up and running in multi-region AWS serving production traffic in 45 minutes. One enabler of that was ‘Slingshot’ our zero-click-to-production deployment system – every commit is shipped to production, with automated blue/green deployment and rollback. Another was our micro-service shell support for Java that provides the basic event logging, operational monitoring, etc. in order that engineers only need to write new code that is unique to their service. There is a lot more we want to do with the shell, slingshot, and other tools. We can develop that tooling more quickly if we are only doing so for a limited number of platforms.
Getting the complete benefit will take more time – it will be years before we only run on the supported technology stack. That means there will continue to be pain when making changes to some other teams’ codebases that are not in the supported stack, but that pain will be constantly reducing as we converge on a more consistent platform.
Posted on by David Low
From our friends at the Skyscanner Travel Podcast, hosts Sam, James and Hugh talk in fascinating detail with our CEO Gareth Williams about how solving the problem of booking future journeys, created a whole new one.
Everything is on the table – from the original days dreaming of Daft Punk, how personalities and goals evolved over time, and how the important thing is to enjoy that journey…
Posted on by David Low
Many people quote the well-worn phrase, that ‘software is eating the world’.
But have you ever stopped to think about the impact of product engineering, or put more simply ‘code’, and how it affects our everyday lives?
With modern smartphones carrying over 600 times the computing power of a good desktop PC from barely 20 years ago – combined with the fact almost everyone will be carrying one – the power of code and computing to change our world has never been greater.
In this video taken at Tech Talent Week in London, Our SVP Engineering, Bryan Dove talks about the impact of code on society and particularly our own world of travel – and how we as product engineers can really make an impact on how we live.
Posted on by Annette Wilson
At Skyscanner we make use of many of Amazon’s Web Services (AWS). I work in the Travel Content Platform Squad, who are responsible for making sure that re-usable content, like photographs of travel destinations, descriptive snippets of text and whole articles can be used throughout Skyscanner. That might be on the Skyscanner website, in the mobile app, in our newsletter or in whatever cool ideas our colleagues might come up with. Recently we’ve been evaluating Amazon’s DynamoDB to see if it might be appropriate as the primary means of storing data for a new service. If we use DynamoDB as a primary store, and not just a cache for something else, we’ll need to keep backups. But it wasn’t clear how best to take these.
After investigating the options and trying them out I wrote a summary for my colleagues in an internal blog. This is a lightly adapted version of that summary. I’ll warn you now, it’s quite long!
Posted on by Zoltan Kolozsvari
Have you ever seen a seismometer-application running on your smartphone? It’s incredible. The refinement and responsiveness means that it picks up on even the smallest vibrations. So, what if you could use these vibrations in the way we use touch screens: to control your phone?
This was the question posed by a small team in our Budapest office, who launched ‘Project Woodpecker’ to create a knock (vibration) recognition component embedded within an application. Our aim was to see how viable and useful ‘knocking gestures’ could really be.
Posted on by Colin McFarland
If you’re Picasso, don’t A/B test, but for the rest of us it’s humbling to evaluate our ideas…
Posted on by Richard Greene
Atomic design is a pattern pioneered by Brad Frost that advocates building up the visual elements of a webpage from small components that combine to make ever more useful and complex forms. We at Skyscanner have also found it to be a useful analogy for how small components of code functionality can be used to build up functional, integrated web applications. Here’s our experience of using atomic-based code design.
Posted on by Pim Van Oerle
A few weeks ago we launched the Skyscanner Facebook Messenger bot. Designed to give simple answers to complicated travel questions inside Facebook Messenger, the bot forms part of a wider move we’re seeing towards Conversations as a channel, alongside (and overlapping with) web and app.
This post will go over some of our experiences in building the bot – how we set ourselves up, what sort of technology we used to allow us to move quickly, and how we moved from concept to production in just 42 days.
Posted on by Dani Ametller
…and other related stories
So, we got ourselves this fairly new service called ECS (early adopter alert!). ECS is a multi-layer acronym for (Elastic Compute Cloud) Container Service.
The idea is that you simply define your tasks as Docker images, you give them a cluster to run on and the ECS does the rest for you. You can just say how many of your tasks you want to run, and ECS will try to figure out where to place them, when to start and when to kill. Great.
The notion of ‘when to start and when to kill’ might sound alarm bells in your head: if so, here’s our experience of cooperative scaling on AWS ECS.