Not “all the technologies”
Posted by Alistair Hann
A question I am often asked about Skyscanner is what technologies we use. It’s the kind of question that I have asked of other businesses in the past – to validate my own choices, as much as anything else. My slightly flippant answer is “All the Technologies”, but we are moving away from that position.
Back in 2010, Skyscanner was largely a .NET shop – a bunch of ASP.NET, lots of SQL Server and one Python service. In the following years, three changes led to a Cambrian explosion in the range of technologies we use:
- We acquired businesses with different tech stacks to us – PHP, Postgres, MySQL, etc.
- We made a strategic choice that we would no longer build services on .NET and would ultimately move to using only Free and Open Source Software (FOSS)
- We switched to a model with a high level of autonomy for teams
The case against diversity
The last point in that list is a tricky one – at Skyscanner we have a model that borrows ideas from Spotify’s Squads and Tribes model. The idea is a collection of autonomous start-up like teams, each with complete ownership of one or more services, able to independently deploy those services, and setting its own roadmap and goals. The model of a collection of autonomous teams is powerful because the teams can execute unencumbered, independently shipping code and delivering value to customers.
A challenge occurs when there is a feature that cannot be shipped without changes to services owned by a different team. Clever shaping of teams and feature teams can help reduce this, but there will always be some feature that requires changes outside of the originating team’s services. One way of handling that situation is for the first team to take a dependency on the second team building what it needs. Unfortunately, that breaks the idea of autonomous teams delivering value to customers at their own heartbeat. The first team is now delayed by the second, and the second now needs to implement a feature that may not have been in its roadmap, so it also loses its independence. Another way of handling it is for the first team to change the second team’s codebase directly: they make a series of pull requests and deliver the feature independently. That works well if there is an efficient internal open source model, and that model is a lot more efficient if teams are all using the same technologies and tooling.
When you move to a micro-service architecture with lots of independent services, there is a risk of solving the same problems many times. At Skyscanner we invest heavily in producing tooling to avoid these situations – so engineers can focus on writing new, valuable software rather than solving the same problems that everyone else has solved. Building and maintaining that tooling is difficult when there are dozens of platforms to support. Similarly, our event logging platform team may want to build SDKs to speed up adoption, and ideally they wouldn’t have to write six.
Finally, at Skyscanner we want people to have a variety of challenges. We encourage engineers to rotate between teams and take opportunities to work on different services, and as our products evolve we need to mould our organization to the oncoming work. It is a lot more efficient to move between teams if they are using a familiar tech stack and tooling.
Thus there are many savings to be made if we narrow the number of technologies being used. That doesn’t mean only having one technology stack – there are cases where it is advantageous to have a dynamic language for rapid scripting, or high performance from a compiled language. For reference, outside of native mobile app development, our default platforms are now Java, Python, Node and React. The reason for Node is the more rapid development that comes from using the same language on the client side and the server side.
How do we get there?
In terms of how we get to that position, the stance we have always taken has been not to rewrite systems for the sake of it. There is no customer value in making a change like that. We are setting a direction though – all new services should use the ‘default’ technology set. Then whenever we change things or break services into smaller components, we err on the side of using the default technology set where it means little incremental work.
One way to encourage the shift is through the free tooling teams get for embracing the standard tools. There is a very compelling reason to use what is standard. We are also part way through migrating from co-location to AWS and again we default to using the AWS native services wherever possible, which increases convergence as well as speeding up delivery.
We are not alone in this approach. At Google, only a limited number of languages are supported for use in production (C++, Go, Java and Python), and something like Ruby is not supported. In practice, that is implemented as a list of all the things that need to be available for a language in production (HTTP server, bindings to talk to production infrastructure, etc.).
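As a rough illustration of how such a list might be checked, here is a minimal sketch in Python. The capability names and the `LanguageSupport` class are hypothetical: only the HTTP server and the bindings to production infrastructure come from the list described above, and the other two entries are placeholders for whatever else an organisation decides to require.

```python
from dataclasses import dataclass, field

# Hypothetical checklist. Only the first two entries are named in the text;
# the other two are illustrative assumptions.
REQUIRED_CAPABILITIES = frozenset({
    "http_server",
    "infrastructure_bindings",
    "event_logging",
    "operational_monitoring",
})


@dataclass
class LanguageSupport:
    """Records which required capabilities exist for one language."""

    name: str
    capabilities: set = field(default_factory=set)

    def missing(self) -> list:
        # Capabilities still to be built before the language is supported.
        return sorted(REQUIRED_CAPABILITIES - self.capabilities)

    def is_production_ready(self) -> bool:
        # Supported only when nothing on the checklist is missing.
        return not self.missing()
```

The point of the sketch is that "supported" becomes a concrete, checkable property of a language rather than a matter of opinion.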
What about that autonomy thing?
The key thing about the model of distributed agile teams is that it is aligned autonomy. The teams are independent to execute, but they share the same purpose and goal – all our teams are working in travel, none are working in selling pet food (for example). That alignment has to happen for technology as well.
Getting the Benefits
We can already see benefits of narrowing our technology set. We are building much richer tooling for our engineers – I was speaking to an engineer earlier today and he was saying how he and two other engineers had created a new micro-service from scratch and got it up and running in multi-region AWS serving production traffic in 45 minutes. One enabler of that was ‘Slingshot’, our zero-click-to-production deployment system – every commit is shipped to production, with automated blue/green deployment and rollback. Another was our micro-service shell support for Java, which provides the basic event logging, operational monitoring, etc., so that engineers only need to write the new code that is unique to their service. There is a lot more we want to do with the shell, Slingshot, and other tools. We can develop that tooling more quickly if we are only doing so for a limited number of platforms.
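For readers unfamiliar with blue/green deployment and automated rollback, the general pattern can be sketched in a few lines of Python. This is not how Slingshot is implemented; `BlueGreenRouter`, `deploy` and the `health_check` callback are hypothetical names used only to illustrate the idea.

```python
from typing import Callable, Dict, Optional


class BlueGreenRouter:
    """Tracks which of two identical environments receives production traffic."""

    def __init__(self, live: str = "blue") -> None:
        self.live = live
        self.versions: Dict[str, Optional[str]] = {"blue": None, "green": None}

    @property
    def idle(self) -> str:
        # The environment that is not currently serving traffic.
        return "green" if self.live == "blue" else "blue"


def deploy(router: BlueGreenRouter, version: str,
           health_check: Callable[[str], bool]) -> bool:
    """Release `version` to the idle environment, then switch traffic to it.

    `health_check` is a hypothetical callback that returns True when the
    named environment is healthy.
    """
    target = router.idle
    previous_live = router.live

    # 1. Install the new version where no customer traffic can reach it.
    router.versions[target] = version

    # 2. Verify before switching; on failure the live side is untouched.
    if not health_check(target):
        return False

    # 3. Switch production traffic to the new version.
    router.live = target

    # 4. Verify again under traffic; roll back by switching traffic back.
    if not health_check(target):
        router.live = previous_live
        return False
    return True
```

Because the previous version keeps running in the other environment, rolling back is just a matter of pointing traffic at it again, which is what makes it safe to ship every commit automatically.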
Getting the complete benefit will take more time – it will be years before we only run on the supported technology stack. That means there will continue to be pain when making changes to some other teams’ codebases that are not in the supported stack, but that pain will be constantly reducing as we converge on a more consistent platform.