Logging at Skyscanner: building and using a real self-serve data platform at scale

Posted on by Scott Krueger

I recently had the privilege of presenting a talk to an international big data audience in beautiful Budapest, Hungary. My CrunchConf talk, titled ‘Logging@Skyscanner – a dreamer’s guide to building and using a real self-service data platform at scale’, explored the motivations, research process, cultural impact and implementation details that got us to operating an infinitely scalable data platform.

It’s important we all share our experiences of implementing distributed data platforms.

What follows here is a summary of the steps we took at Skyscanner to do this, and the outcomes that followed.


How does one know to change direction?

For us this was pretty easy. We started with a single monolithic RDBMS that performed all kinds of tasks. The logging database was a denormalized 4-table DB that handled all reads and writes. By tuning the hell out of it we managed to get a lot of mileage from it. However, the warning signs were getting pretty obvious by the end.

The signs

  • Writing OR reading doesn’t perform
  • Writing AND reading don’t agree with each other, a.k.a. contention (which in relational models can happen at various levels internally – make sure you have someone who understands the internals to hand)
  • Writing OR reading can’t be done in a timely fashion anymore, probably due to contention

Trap 1:

Solution: Just scale out the write nodes, and batch back periodically to a read node.

Problem: If your business is growing, you have ever more event volume to stuff into a table in the same time frame. You are prolonging the inevitable; you’re back at the beginning.

Trap 2:

Solution: Denormalise everything. This is really effective and really does get you away from most of the problems above. But it has some undesirable side effects that I hope you’ll avoid by reading this.

Problems: You need to do some serious data engineering to make that data readable. ETL pipelines suddenly go from two steps to 20. And with more steps comes more confusion, and less ability to read the data in the first place. WTF is going on in ETL step 17? I don’t know, and no one else seems to either. Plus, if your business is growing, you have more event volume, so you are prolonging the inevitable. You’ve still not solved the problem, and now it is even harder to diagnose.

No more traps

We needed a logging platform that would:

  • Have a much longer best-before date (i.e. horizontal scale)
  • Give the data back to the people

So where did we start?

Everyone says it’s the journey that’s important, not the destination. ‘They’ might be right, but in this case the journey AND the destination are equally important. Create a vision, do your research and make a plan.

Destination = Scale-Out-and-Give-the-data-back-to-the-people land

Journey = This process, captured forever here

It’s confusing out there in the real world and here’s why.

Check out these ‘Big Data’ headlines:

“The $11 Trillion Internet Of Things, Big Data and Pattern Of Life (POL) Analytics” {POL? Tell-your-mom speak: machines are watching and predicting your next movements}

“Amazon Web Services gets serious about big data analytics with bevy of new services. The tech titan’s cloud arm, Amazon Web Services, is beefing up its suite of streaming data and analytics services.”

“Enterprise Software’s Trillion Dollar Opportunity”


What now?

Identify your business need.

Definitions

Business Need: the sensible pairing of a technical want with a legitimate business reason for doing it.

Technical want: engineers love solving problems, and we also like to overdo it. It’s part of our fabric. So what we think we need might not match the business reason.

Business reason: the destination.

Research

What are others doing? Build or Buy? How does it impact my existing systems and processes?

Take the time to read up and see what your peers and idols are up to.

My inspiration came from Jay Kreps’s deservedly wordy “The Log: What every software engineer should know about real-time data’s unifying abstraction”.

“One pipeline to rule them all”. That’s it – I’m sold. Point all of your data producers to one place and send your data consumers to that same place. Sounds simple right? It is. And that’s why it’s so powerful. Value will appear in places you don’t expect it to.

At the time of our research, we needed to plan for both data centre and cloud capabilities. For this, and for many other compelling reasons stated in Jay Kreps’s article, we chose Apache Kafka – which was (and still is) one of the best data technology decisions we have ever made.

I can’t speak highly enough about Kafka. I’ll resist and defer more praise to a future article.

With Kafka comes ZooKeeper – your trusty friend for managing the coordination of a Kafka cluster. Provided you dedicate your ZooKeeper ensemble to Kafka and follow the operational guidance, it just sits there and does what it’s good at.
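To make the “one pipeline” idea concrete, here is a minimal sketch of a producer writing events to a topic, using the kafka-python client (broker addresses, topic name and event fields are invented for illustration – this is not our production code):

```python
import json
from kafka import KafkaProducer

# Connect to the Kafka cluster; broker addresses are illustrative.
producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092"],
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# Any service becomes a data producer by logging its events to a topic.
producer.send("web-logs", {"event": "search", "route": "EDI-BUD", "elapsed_ms": 132})
producer.flush()  # block until the broker acknowledges the event
```

Consumers subscribe to that same topic independently, each at their own pace, which is exactly what lets value appear in places you don’t expect.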


What would a data platform be like without doing something with the data?

Enter Stream Processing, Data Transport, Analysis, Visualization and Slower Data.

Connectivity and inter-operability are getting really good these days, so you have a lot of choice here. Choose something you know will work with your stack with as little effort as possible. If you were starting from scratch, and were hosting entirely in the cloud, you’d do no wrong by looking at the cloud providers’ offerings in this territory – they are waking up pretty quickly.

Stream Processing: we chose Apache Samza because it connects very well with Kafka and has some really brilliant capabilities that are easily recognizable to any functional programmer and/or SQL developer. Why not Spark Streaming? At the time it was getting kind of hot, but has since fizzled out. There are some other good options out there too – and it should come as no surprise that a few major cloud providers are hot on their heels.

Data Transportation: Logstash. Config-driven data in -> data out. E.g. application log -> Kafka topic; Kafka topic -> your favourite DB for analysis.

Analysis: as keen Elastic users (formerly Elasticsearch, for you oldies), we found it good enough for doing some analysis and seeing the data. We hit it hard from the get-go.

Visualization: Elasticsearch Logstash Kibana (“ELK”) – don’t leave out the “K”; you’re already two-thirds of the way there. Fire up :9334 and click away. An hour spent learning the basic tricks and you’re good.

Slower Data: Archive all of your data to cloud storage (like S3). No question about it. The “what will we do with it” debates you had 10 years ago are gone. Wake up – storage is cheap and if you don’t have an army of talented data people to tell you what to do with it, go and find them. And when you have them, listen carefully to what they say. Your business’ future relies on great data that is turned into great information by these aforementioned great people.
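As a sketch of what the slower data path can look like, here is a toy consumer that drains a topic and archives batches to S3 (kafka-python plus boto3; the bucket, topic and batch size are made up):

```python
import boto3
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "web-logs",
    bootstrap_servers=["kafka-1:9092"],
    group_id="s3-archiver",  # consumer groups let the archiver scale out
)
s3 = boto3.client("s3")

batch, batch_no = [], 0
for message in consumer:
    batch.append(message.value)  # raw bytes, one event per message
    if len(batch) >= 10_000:     # flush in chunks, not per event
        s3.put_object(
            Bucket="my-event-archive",
            Key=f"web-logs/batch-{batch_no:08d}.jsonl",
            Body=b"\n".join(batch),
        )
        batch, batch_no = [], batch_no + 1
```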

We make use of Spark, EMR, Tableau, Pandas and SciPy, to name a few. Keep connectivity and inter-operability in mind for your platform outputs (which isn’t too difficult now, thanks to increased openness from traditionally closed-source vendors).



Hosting Considerations

Are you in Data Centres? Are you in Cloud? Are you in both? Do your applications need data locality? Can you afford(!) some latency? Compression? Encryption? Geography? There’s a lot to think about here – so take the time to think it through.

What about my existing logging platform?

Follow these two weird tricks to keep your business running:

  1. You just have to do this in parallel. We looked for various solutions to expedite a migration but came right back to the simplest option – run old and new in parallel, so your data consumers and business have confidence in the quality of what you are doing.
  2. Since you’re going the parallel route, figure out the easiest way to piggyback on top of your existing logging infrastructure. We listened in on an rsyslog stream of our load balancer web logs.

Schemas

It’s really really important to (re)define schemas for your events.

It’s therapeutic. Involve lots of people – particularly those who understand the data: the event producers and the analysts who work with this data every day. They know what things mean. They know what is missing, or what isn’t in use anymore. Prepare yourself to go through the rounds many, many times as producers and consumers continue to use the schemas.

It’s really important that you document the fields correctly, and maintain a good history of your changes and the reasons why (no squashing!).

Schemas are your first port of call for data quality. They help preserve the structure and integrity of the data. It’s really important you have additional quality guidelines in place though: any old value can still ‘fit’ a schema. Bad data means bad decisions.

Design a schema workflow. We use a standard Git flow with a central schema repository. The same group of domain experts should review merge requests before new versions become available. All versions must stay backwards compatible (a consumer must always be able to read a message, even if the structure of that data changes). Schemas are for life.

Schemas and Serialization

Choose a schema technology that’s right for your stack – particularly for your consumers (since, when all goes to plan, you’ll have far more of them than producers). The trade-offs generally boil down to message size vs serialization speed. There are a number of benchmarks out there.

We use Protocol Buffers, since it meets all of our needs and is actively developed.
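To illustrate what that buys you in code, here is roughly how producing and consuming through a protobuf schema looks in Python. The `ClickEvent` message and its fields are hypothetical, and `click_event_pb2` would be generated by `protoc` from the corresponding .proto file:

```python
# click_event_pb2 is generated by protoc from a hypothetical click_event.proto.
from click_event_pb2 import ClickEvent

event = ClickEvent(user_id="u-123", page="/flights", timestamp_ms=1445000000000)
payload = event.SerializeToString()  # compact, schema-checked binary

# A consumer built against an *older* version of the schema still parses this:
# protobuf skips fields it doesn't know about, which is what keeps versions
# backwards compatible -- provided you never reuse or renumber field tags.
decoded = ClickEvent()
decoded.ParseFromString(payload)
```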

Research Complete?

You’ve done all your homework and you’re ready to go, right? Right! Stick to your decisions and go with it. Just remember these 21st-century rules:

  1. By the time you begin development, something new will have come along.
  2. By the time you are fully operational in prod, something new will have come along.

Things are moving quickly and a lot of great innovation + simplification is heading our way. What you have today will become easier to use AND more cost effective tomorrow.


Pitch It

This is the point where you’re going to have to convince whoever writes the cheques that it’s a good idea to go with the new plan.

Keep things simple and high level – they’re not interested in why write order doesn’t matter and how this improves throughput and latency!

Here’s a diagram we used showing the simple approach being proposed:


Explain the value of doing this and how it meets the business needs (by identifying what the drivers are and how the technical solution meets them).

This diagram was accompanied by some easy-to-understand dialogue. It went something like this:

“All of our services capture events, like click activity, or application logs. We can call these services Data Producers, who ‘log’ their data to a single place that can handle A LOT of writes and reads. This central log is highly available and can easily scale out when needed. Consumers pick up these same data events to create new data sets. These data sets can then be turned into any data product we need: reports, archives, application data, data for exploratory analysis, data for machine learning, and so on.”

Development: Proof of Concept (POC) vs Minimum Viable Product (MVP)

At this point you’re ready to develop something and put all of your good research to test. Right from the start we chose a high-volume event stream that would provide a lot of value for us. It wasn’t an experiment or a quick prototype to see if the platform would work for us. The technologies had already been proven a number of times over.

We were committed and treated it as a product that we would continue to iterate on and improve upon.

We did push to production-like environments quickly so we could start load testing and estimating cluster sizes, which also gave us the advantage of seeing production value early on.

Demonstrating Value

As soon as you can, start playing and showing others what kind of cool stuff you can do. Here’s a Kibana dashboard I created in real time to look at some flight pricing trends:


Trap:

A-ha! You are building a brand new data visualization tool!

Ummm, not quite. We’re building a unified logging platform that anyone can use to do great things with data. It took us a couple of months to realize perceptions were different!

Tip:

When you are demonstrating something, make sure you clearly explain that this is something consumers themselves can do as a result of the platform you are building. It is not ‘the platform’ you are building.


Taking it to Prod

Hopefully, in this day and age, you too are doing everything with the best development and operational practices in place: continuous integration and delivery; lots of unit, functional and regression tests; well-instrumented code; monitoring and alerting switched on; incremental builds and quick deploys pushing to prod smoothly; and so on…

All of this adds up and takes time – plan for it and try your hardest not to cut any corners. If you do, come back ASAP and pay off that tech debt. It quickly piles up in distributed data platforms, which by nature have more moving parts than your standard application.


If you are also deploying to a cloud hosting provider, there are more things you need to do.

MVP Quality

How do you know things are as they should be? Provided you trust your old logging platform’s outputs, you can track the same metrics side by side, because you’re taking a parallel migration approach.

We could confidently show our existing data consumers that the new platform is up to scratch – the stream-processed numbers look the same.

On-boarding

You’re getting value out of the MVP so now is the time to encourage your producers and consumers to make the move.

Here are a few tips to ease the on-boarding process:

  • Not all your events have to migrate – figure out what is actually being used. Somebody knows, so find them and ask them
  • Documentation: how do I write? How do I read? We use readthedocs with Git.
  • Ask for feedback, good and bad, and use it to improve your documentation (the aim being self-service)
  • Your teams that are already logging mightn’t have the same passions and incentives as you have. Remind and educate continuously
  • Promote adopting the platform where work is prioritized
  • Hold a two-day, distraction-free workshop to focus on schema authoring, writing and reading. Stuff gets done very quickly when everyone focuses on just three things and the right people are at hand (we had a group of four representing domain expert, producer, platform and consumer)

Results

Did we meet our business need? Yes, and more value appears as more and more users adopt the platform. A snowball effect starts and great things start to appear.

Indicative prices, as a result of stream processing

New events made available for all via a unified logging platform

What’s next?

We continue to evolve our platform and actively keep up-to-date on emerging technologies and techniques.

As we are big supporters of open-source code and process, you might find our Kafka ‘topic enforcer’ tool useful. It comes in handy when you’re managing a lot of topics across several clusters.

We strongly believe stream processing on top of a robust unified logging platform is the best way of handling an organization’s most important asset – its data.

 

 




What worked yesterday, is painful today, is broken tomorrow

Posted on by Richard Lennox

Over recent years we have seen British Cycling dominate the sport. They maxed out the medal table at both the Beijing and London Olympics, and two of their cyclists have won three of the last four Tours de France. Ten years ago this wouldn’t have happened. How? It has been achieved primarily through the adoption of the mind-set instilled by David Brailsford: one of improving absolutely everything by as little as 1%.

Looking closer to home, it is our job as engineers and product managers, as squads and teams, to focus on ‘winning’, and winning means continually getting better at what we do.

As Skyscanner continues to grow, the shape and complexity of our engineering and technical challenges change, and how we approach them must adapt at the same rate. Ultimately we find that at scale, continuing to do what we did yesterday is not sufficient. The alternative is getting stuck in a broken cycle that ultimately means we cannot achieve our goals. We can be left with the feeling of wading through quicksand, slowing until eventually we sink. To continue growing at scale, we need to apply a steady stream of Continuous Improvement to everything we do.

However, in a world where we are striving to maximise the delivery of value to our users and partners, stopping to improve everything is not an option – our time is equally precious. How do we make sure we are working on the right things? Skyscanner has successfully used the Theory of Constraints (as one of many tools) to direct our continuous improvement actions and ensure they have the maximum possible impact.

The Theory of Constraints

Eliyahu M. Goldratt introduced the ‘Theory of Constraints’ (TOC) in his book The Goal, as a way to apply Lean principles while focussing on the most impactful improvements possible.

It is a methodology for identifying the most important limiting factor (i.e. constraint) that stands in the way of achieving a goal and then systematically improving that constraint until it is no longer the limiting factor. In manufacturing, the constraint is often referred to as a ‘bottleneck’.


The path of software (from idea, to IDE, to production experiment, to full roll-out and maximised value) flows through a pipeline: a set of (mostly) automated steps. While we at Skyscanner favour people over processes, there is no hiding from the fact that there are processes involved in everything we do. As with any manufacturing flow, these processes are subject to the Law of the Minimum: there is always a limiting factor. After all, we do not have infinite capacity!

TOC helps us systematically reduce and remove the bottlenecks within our processes. By concentrating on the current limiting factor and working to remove or reduce it, we can be sure we are directing our efforts where they will make the biggest gains.

Have you ever completed a retrospective action only to realise that you have made little or no impact on the next sprint, and been left scratching your head? Perhaps it is because you optimised something that was not the root cause or the constraint on the process. TOC helps us focus on getting our actions right.

Focussing on the bottleneck

This is how the most basic tenets of TOC can be applied to processes.

An action to improve capacity upstream of the bottleneck increases the pressure and produces more unfinished work (waste).


An action to improve capacity downstream of the bottleneck does nothing to reduce the pressure on the bottleneck and is wasted improvement effort.


Only by maximising the throughput through the bottleneck (exploiting it), and putting everything on the cadence of the bottleneck (subordinating to it)…


…then increasing capacity at the bottleneck (elevating it) do we get significant, impactful improvements.


Often the ‘exploit’ and ‘subordinate’ actions alone can have significant impact. After every improvement there will always be a new bottleneck, so we need to continually restart the process of identifying bottlenecks.

Application and results

While this simplified introduction to the Five Focussing Steps of TOC highlights how it can be used to maximise outcomes, it is through clear results that TOC has become a mainstay in Skyscanner’s toolbox. Here are some examples of the successes we have had:

  • In our Hotels Back-end Squad we have reduced bug turnaround time to < 1 week
  • Improved delivery predictability through guided process changes with zero effort
  • In our translations processes we have reduced turnaround time from upwards of 2 weeks to < 2 days and are now looking at < 24 hours
  • Tripled the velocity in areas of our B2B tribe

Overall, TOC has aided our ability to deliver continuous improvement and to adapt to the pain we feel when, due to our continued growth, our processes start to struggle. Some squads now operate with every other retrospective tightly focussed on working their constraints to maximise throughput, and they’re really seeing the benefits.




I’ve seen the future of mobile

Posted on by Balint Orosz


“We’ve just built a website but I haven’t really checked how it performs yet.”

This sentence came from one of the founders of Carousell – a Singaporean start-up I visited, which not too long ago raised $6 million from Sequoia Capital. Carousell is one of the region’s most used shopping applications, and the statement above clearly indicates how the web in most APAC markets is just an ‘afterthought’. Why? The primary interface is apps.

Talking to these guys and other start-ups during a trip to Asia proved to be an incredibly valuable experience. While we mostly know what Samsung, LG, and other big companies are doing through the large tech portals, we see much less about the new, hyper-growth startups which very effectively build on local habits and legal environments, and scale at an exponential pace.

It’s clear that these companies expect exponential growth, user loyalty and ‘easy, effective payments’ through mobile; more specifically, apps. While from market research we would still assume mobile is too immature, in lots of Asian markets that’s no longer true.

(Local) Apps are everywhere

People use apps all the time, everywhere. When I arrived at the airport in Korea, the first thing I noticed was the weird yellow screen on everyone’s phone – the KakaoTalk app. Over 90% of Korea’s population uses KakaoTalk, and from what I saw they use it as their primary way of communicating. I didn’t see people using iMessage, or even SMS – it’s all KakaoTalk. Interestingly, when I asked people why they don’t use Facebook Messenger, they responded with a question: “But can Facebook Messenger do group messages?”

One local startup was founded 1.5 years ago, and uses a model very similar to Hotels Tonight. It has 10 million downloads in Korea. That’s roughly 20% of Korea’s population. Insane.

On billboards you see ads not for products or companies but for apps – it might be for food ordering, yet another Uber clone, or whatever else. And in most cases it’s likely those outside the country have never heard of these apps. They’re mainly local apps, developed by local companies, serving local needs, in a local fashion. That might mean supporting the most popular local online/mobile payment methods, or having a design and UX that appeals to local users. In most cases people can distinguish local from global apps – and often have a preference for local ones.

That being said, many of these companies struggle not when going global, but when expanding outside their home country at any level. They build rapidly with a focus on execution and local needs – in lots of cases ignoring planning for scale – and only when successful locally do they start to think seriously about expansion. At this point they face several difficulties, from software and architectural problems (technical debt) to cultural differences.

The first real personal ‘Travel Assistant’ will come from APAC

For users in APAC, mobile is what the *Personal* Computer – the PC as we know it – always wanted to be but never could. As users can use their mobile devices for nearly everything – from communication to shopping, food ordering, transportation and so on – it becomes an extremely personal item. In this region ‘Mobile is Social’ and ‘Social is Mobile’. Considering that ‘Travel is Social’, it’s quite simple to see how ‘Travel is also Mobile’.

Together with the trend of deeply coupled services – instead of the decoupling of services we see in most Western markets – it’s clear that people expect integrated solutions which solve all of their complex needs, not just stand-alone apps with a single simple function.

They expect to do everything in one app – browse, gain inspiration, book, and be guided through their travel – all of this in a very smart way.

The future is here

APAC is not a country but a region. While some characteristics are very similar across most of the countries, many others are very different, meaning there is no ‘one solution fits all’ for this wide-ranging and diverse market.

That said, there is plenty of opportunity in APAC. Local product development methods are not more advanced, and neither is the engineering. However, the pace of execution is extremely, admirably fast. The key to their success is a focus on local needs, embedded deep in the product. Our bus-booking feature in India is an example of the way Skyscanner is looking to approach the region: tailored and localised in approach, combined with the strength of our global, comprehensive flight coverage.

One of our challenges in APAC is how to modify our technical and product architecture in a way that allows ‘deep localization’ – i.e. altering some features completely for given markets – and, of course, having product and engineering teams focus exclusively on these markets.

What is really clear is that mobile and apps are far further ahead in maturity in APAC than I had previously imagined. It shows us the future, and in that future mobile gets higher user numbers, higher frequency, and generates more revenue than desktop. This future will arrive in Western markets as well, and anything we do in APAC can only help us better understand and prepare for the future in other regions too.

 




Common Pitfalls in Experimentation

Posted on by Colin McFarland


Through experiments we expose our ideas to empirical evaluation. Naturally, uncertainty follows, but the organisational mind-set one develops around this uncertainty is crucial.

More often than not, the things we believe will improve metrics simply don’t. When exposed to real users in controlled experiments, those no-brainer ideas we assume to be obvious often fail to improve the metrics designed for them. Far from the case studies of agencies selling their services, research from the Internet Economy organisations we trust reveals the failure statistics of experimentation to be humbling.

In his book Uncontrolled, Jim Manzi revealed that at Google only about 10% of experiments led to business changes. It’s an arrestingly small number, but it’s a similar story at Microsoft.

In Online Experimentation at Microsoft (PDF), Ron Kohavi confides that of their experiments, only about one third were successful in improving the key metric.

In Do It Wrong Quickly we learn how ‘Netflix considers 90% of what they try to be wrong’. Dan McKinley at Etsy has contributed too:

“It’s been humbling to realize how rare it is for [features] to succeed on the first attempt”. 

From thousands of experiments during my time leading Experimentation at Shop Direct, intuition was wrong more often than not. At Skyscanner, early evidence suggests that towards 80% of experiments fail to improve the metrics predicted for them.

It’s easy to see how this could be industry-wide and poorly recognized. If any Internet Economy business is to seriously compete, it needs to begin by abandoning assumptions and moving from a culture of deployment to a culture of learning. As we start out, we believe these features and designs to be valuable – we’re investing time and effort building them – so what’s going wrong? Why do so many fail, and what can we do about it?

Apart from statistics around failure, little has been said about what we can do to tackle it. We can do better. If we understand where experiments go wrong, we can work to improve things. Understanding common pitfalls can help us determine why some experiments failed.

Presented here are common pitfalls in experimentation, loosely in order of impact.

 

Pitfall 1: Not Sharing or Prioritising Evidence

Another common problem I see across businesses operating at scale is that the learning from experiments isn’t shared widely beyond the team running the experiment. Clearly, winning experiments should be promoted across the organisation. When we fail more often than not, it’s easier to see how our success rate can improve if we understand winning experiments in one domain so that we can repeat them in another.

When experiments fail it isn’t as straightforward. Hindsight can lead us to realise some experiments were executed poorly. The data from that experiment could have limited or no value to others. Perhaps it’s invalid. Far more important is sharing surprising failures — those ideas that many people would think will be successful. If your users have rejected a feature or idea and it surprised you, it will surprise others too. Sharing these surprising results is a wonderful opportunity.

You can save others repeating the same experiment; they may adapt their method based on your data, or validate your experiment and offer new insights along the way. You’ll evolve your understanding of cause and effect as an organisation this way.

Prioritisation is hard, and sometimes you won’t even spot your misses, because our own ideas often seem a better choice than competing ideas simply because they are our ideas. We shouldn’t be surprised: Behavioural economists have demonstrated the IKEA effect [PDF]; when we construct products ourselves we overvalue them.

You’ll likely have many data sources available to you, and should use them: data is data. Evidence from other experiments, qualitative, and quantitative feedback, should challenge priority at every opportunity. Experiments fail more often than not; taking evidence from others’ winning experiments gives us an opportunity to validate it at a wider scale. Evidence from experiments should challenge our execution of experiment details too.

 

Pitfall 2: Poor Hypothesis and No Overall Evaluation Criterion 

Hypotheses are not beliefs. They are predictions. Often we deploy an experiment and then try to find the data to prove ourselves right. This is a problem. Look for anything and you’ll find something, but this bias in analysis can lead you to miss a wider detrimental impact, and to many false positives.

A good place to start is by defining an Overall Evaluation Criterion (OEC), a small set of metrics agreed across your organisation. Over time the OEC can evolve. When you experiment against an OEC, most feature ideas are now simply hypotheses to improve it and we can move faster by proving ourselves wrong quickly.

Pitfall 3: Poor Understanding of Significance

Significance is widely misunderstood; even a book by the co-founder and CEO of one of the leading A/B testing platforms gets this wrong. ‘Once the test reaches statistical significance, you’ll have your answer’ and ‘When the test has reached a statistically significant conclusion, the tester sends a follow-up with results and the key takeaways’ are incorrect procedures that will lead to finding many more false positives than expected. The correct procedure is to determine the duration upfront, conduct power calculations to understand the traffic you need, and calculate the significance of your data only once the test has run for the full duration.

Another misconception I often see is the expectation that significance comes neatly wrapped with the experiment as proof. Significance concerns itself only with the differences in the numbers. We are doing null hypothesis significance testing (NHST) – the null is the inverse of our prediction. With NHST the starting position for most experiments is that nothing interesting is happening, and it offers a form of evidence – expressed as a p-value – that something interesting might be happening. The lower the p-value, the more you can trust the ‘interestingness’ of your data and draw conclusions accordingly.
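As a concrete (invented) illustration of the procedure, here is what a two-proportion z-test on a finished experiment might look like in Python with statsmodels – run once, after the pre-agreed duration, not repeatedly as the data trickles in:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results after the planned two-week run: control vs variant.
conversions = [1450, 1530]     # successes in each arm
visitors = [100_000, 100_000]  # users exposed in each arm

# Null hypothesis: the two conversion rates are equal.
z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value is evidence against the null ("nothing interesting is
# happening"); it is not proof that the variant caused the difference.
```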

 

Pitfall 4: Running Inconclusive Experiments Longer

By its nature, running experiments causes waste. Many designs or features won’t survive; a lot of your work, no matter how much you want it to, just won’t land with your users. That can be hard to take, but it is part of our process. We are all passionate about our products, but we need to be careful not to fall in love with our own ideas and focus on validating ourselves.

This can lead to running inconclusive experiments longer in the hope users will get used to things. While it is possible that novelty effects could cause deltas to change significantly, this is rare in our setting. Avoid running your experiment beyond the duration your power calculation calls for simply because you aren’t getting the data you want.

There’s another problem with this approach. Let’s say you’re aiming to measure a 1% uplift and you already have the power to demonstrate it, but your experiment hasn’t – looking for significance beyond that point only increases your chances of false positives.

Ryan Singer said it best: “You can’t improve […] when you’re emotionally attached to previous decisions. Improvements come from flexibility and openness.” We’ll modify and repeat to learn further, but we should be deferential enough to change tack if users tell us through their clicks that our ideas are off track.

Pitfall 5: Ineffective Ramp Up Strategy 

In an uncontrolled setting, a feature would ship to 100% of users and it would take days for user feedback and data to show that something bad had happened. When you run an experiment, you have the ability to detect and abort these bad changes quickly. Failure is inevitable in experiments, but expensive failure is not – if we design effective ramp-up procedures.

Typically, the first stage will be within a 5% range of users, to minimise the blast radius and the risk of releasing a bad change. The second stage will be within a 50% range of users, to measure the change, and a third stage (if your experiment is successful) will ramp the feature to all users (or to within 95%, if you keep a hold-out group to measure impact over time).

It’s crucial your ramp-up strategy doesn’t slow you down unnecessarily. An effective ramp-up strategy has clear goals for each stage, and moves forward to the next stage unless results are unsatisfactory. For example, stage one may look to validate that there is no significant detrimental impact, while ensuring engineering and platform resources are performing reliably, all within 24 hours.
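One common way to implement staged ramp-ups is deterministic bucketing, so a user’s assignment is stable as the ramp widens. A rough sketch (illustrative only, not Skyscanner’s actual assignment code):

```python
import hashlib

def in_ramp(user_id: str, experiment: str, ramp_pct: int) -> bool:
    """Hash (experiment, user) to a stable bucket in [0, 100).

    The same user stays in or out of the treatment as the ramp
    moves 5% -> 50% -> 100%, so the stages remain comparable.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < ramp_pct

# Stage one: roughly 5% of users see the change.
print(in_ramp("user-42", "new-ranking", 5))
```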

 

Pitfall 6: Method That Doesn’t Match the Expected Uplift

On one hand, we can make so many changes at once that our experiment levels out; on the other, we can make changes so incremental that we can’t realistically measure them in a meaningful timeframe. Consider a hill-climbing metaphor. We’re using experiments to climb, but we’re doing it blindfolded. We can’t see upfront whether we’re at the top of a hill (a local maximum) with a much bigger mountain further ahead (the global maximum) that will need a leap to reach, or whether we’re far from the peak of this hill and can make big gains with small steps.

Experiments help us assess the terrain and design our leaps accordingly. Power calculations are our guide: for example, they might tell us that if we want to learn the impact of our change in two weeks with the traffic we have, we can only identify a meaningful lift of 5%. We can therefore determine how to assess the terrain. Example: we have two weeks – will this background change give us the 5% uplift we need? Experiment power is not an arbitrary decision like other parts of the process; this statistical analysis is required to design effective experiments. Dan McKinley created an intuitive online calculator that we have adopted as standard.
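For example, a power calculation along these lines (a sketch using statsmodels; the baseline and lift numbers are invented) tells you the sample size your method must be able to reach:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Baseline conversion 2.0%; we want to detect a relative 5% lift
# (2.0% -> 2.1%) at alpha = 0.05 with 80% power.
effect = proportion_effectsize(0.021, 0.020)
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, ratio=1.0
)
print(f"~{n_per_arm:,.0f} users needed per arm")
# If two weeks of traffic can't reach this, either test a bolder change
# or accept that the experiment cannot answer the question in time.
```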

 

Pitfall 7: Failing Ideas Not Experiments

You need to make an important distinction: a failed experiment is not a failed idea; it fails only the concept in its current condition. Failed experiments are not necessarily dead ends. You can learn a lot about your users’ behaviour even from experiments that didn’t produce a positive change to the metrics. A modification could turn your idea around. Small changes matter unless you are at a local maximum. Explore negative and neutral experiment data to help inform further iterations of the concept, or new hypotheses to be trialled. With segmentation there is a risk of false positives, so validation with new experiments is important.

If conditions change, so too could the outcome. Consider that in heavily experimented products conditions change often: as you experiment, your product constantly evolves. The implication is that experiments you accept now could change the outcome of experiments you rejected in the past. Similarly, Booking.com recognises that some ideas could simply be before their time.

What’s most important is you don’t take a winning experiment implementation and make it a sacred cow never to be explored or challenged further.

 

Prove Yourself Wrong

Understanding the common pitfalls in experimentation will help you get closer to determining whether your experiments failed in execution or because your intuition about the concept was wrong. If your experiment failed in execution, you can quickly iterate, taking what you learned the first time to improve your approach and better assess the terrain next time.

It’s important to constantly increase your iterative capital by making experiments cheaper. If you determine it’s your intuition that’s wrong, this can be humbling, but it should be celebrated. Through this “prove yourself wrong” culture, new discoveries can be made and innovation can be accelerated.

Want to read more? Check out ‘Design Like You’re Right, Test Like You’re Wrong’, a direct response to the pitfalls above.

Thanks to the many people across Skyscanner and externally who provided feedback for this article. Among the latter, special appreciation to Ron Kohavi (Microsoft), Ya Xu (LinkedIn) and Ben Dressler (Spotify).




Introducing the Skyscanner Tech Engineering Blog

Posted on by Bryan Dove

Welcome to Skyscanner’s engineering blog. Thanks for taking the time to stop by.

It is early days, but you can see from our first few blogs that you should expect a wide range of articles, from technology insights to the theory behind product decisions to thought leadership. We’ve been inspired by the other tech company blogs that we as engineers love – Etsy and Netflix to name just two.

At Skyscanner we put emphasis on sharing ideas, both internally and externally. This blog is an extension of that, and so too are our community support initiatives and Technical Thought Leader Series – more on these below.

But back to our engineering blog. You might be asking – why open up this information, whether it be through world-class speakers or internal Skyscanner insights and learnings? It goes back to Skyscanner’s roots and our culture. We’re a team of entrepreneurs, and we want to share what we’ve learned along the way with the broader community, to support and inspire the next generation of entrepreneurs. We believe this is the way the tech community should work.

Our hope is that you find real insight and inspiration within this blog and within our community programme. We aim to be a unique voice amongst the cacophony of tech companies sharing what they are working on and how they think.

Here’s how:

  • We will be as open as possible, talking about both our successes and our failures. Internally we capture so much learning through our mistakes, and we want to disseminate this more broadly than just our organization
  • In addition to talking about technology, we will also share our successes and failures when it comes to scaling our company. There is a ton of information about how to scale your technology stack, but there is a dearth of information about how to scale your business. You can’t have one without the other.

We believe these two dimensions deliver a unique perspective.

Why might that interest you? Well, we’re a global travel search company with over 50m MAUs across the world. We’re solving complex problems every single day – the challenges that existed in the sector when we were founded 10 years ago still exist. We’re on a technology journey; for example, we have a goal to transition 100% to AWS in under two years. It’s ambitious: Netflix did it in seven. We have our work cut out, and we’ll regularly post updates and lessons, such as an upcoming article on how we make trade-offs between building solutions on EC2 vs. using AWS Lambda.

You’ll also see blogs about how we are setting up the dev environment to drive ideal productivity with Docker, ECS, and automated, progressive production deployments. On an organizational level, we’ll share with you how we’re managing the mind-set change where architecture and cost are now directly related. Blogs will be published at least once a week, so I urge you to keep coming back for fresh content and ideas. And of course, we’d love to hear your feedback and questions too.

However, our outreach doesn’t start and stop with this blog. Here are other ways we’ll be sharing insights and knowledge.

 

Open Source Contributions

We’ve recently opened an engineering hub in London, where we’re very excited to expand our community initiatives both locally and across the world. These include our regular Open Source contributions (check us out at Github.com/Skyscanner).

Our philosophy on Open Source is that we want to be open by default, as much as is possible. We will continue to publish on Github, and expect to accelerate our pace of publishing over the coming year. At present, we’d love your comments on Dixie, our Open Source iOS testing framework.

I mentioned our migration to AWS earlier. We expect to encounter a number of new technologies that we’ll have to create or contribute back to enable Skyscanner, as a globally and technologically diversified company, to successfully migrate 100% of its operations to AWS.

 

Community Meetups and Sponsorship

We’re keen supporters of the local engineering community surrounding our nine other offices across the world and we’ll continue this approach as we integrate into the tech community in London. This may take many forms, and includes sponsoring local meetups with a combination of funding and hosting in our office space. This blog will be the prime area to get visibility of these – we’ll give as much advance warning as possible, as we know you’re a busy lot. With that in mind, we’ll also post recordings of those talks, when appropriate, so a wider audience may access them.

Increasing diversity in technology is critical. We want to do everything we can to increase diversity across the industry. To this end, we’re actively sponsoring programmes focussed on increasing the number of women working in technology, like the brilliant RailsGirls. Our support doesn’t stop there, so if there are other organizations with similar goals looking for partners, please get in contact with us.

 

Technical Thought Leaders Series

We’re also launching, initially in London, our Technical Thought Leader Series (or the TTL Series for short). This is a series of talks focused on bringing top tier speakers to share knowledge with our active start-up communities. I hope to meet many of our neighbours and fellow community members at these events.

The start-up scene in Europe has exploded in recent years and London is one of the epicentres of this movement. The TTL Series will bring a unique set of presenters to London to share their content, for free. Topics will be diverse, from growing and scaling a start-up to technology to engineering leadership to business leadership. There’s no focus on any specific format; we’ll move between panel discussions, open Q&A and presentations as appropriate.

We’ll also record and publish the TTL sessions here to ensure that this knowledge is preserved and available to the wider community across the world.

Our first is on November 23rd, where Sir Michael Moritz, chairman of Sequoia Capital and board member at Skyscanner, will speak.

Thank you for taking the time to read this, and I hope that you’re as excited as we are at our forthcoming activities. This is the first of many blogs that will provide insight into what we’re doing and why. The Skyscanner engineering team hope that by providing a unique level of transparency we are adding value to the community and that others can learn from our successes and failures to help accelerate their own journey.

We are always looking for feedback on ways to improve and content to include. Please reach out at codevoyagersblog@skyscanner.com.

 

-Bryan

 




Dixie: turning chaos to your advantage

Posted on by Balint Orosz

What do you do if:
… your app crashes on unreliable networks?
… you want to make sure your app can withstand edge cases?

We’ve got a simple formula: create a chaos generator to simulate worst-case scenarios, attempt to break your app and try to cause it to fail. And if you’re successful? Congratulations: modify your code, increase your app’s fault tolerance, and repeat.

We call it Dixie. It’s an open-source project to help developers find an effective answer to stability issues and worst-case scenarios, and we’ve shared it on GitHub.

Interested? Here’s how Dixie came to life.


The Problem

We all know that today’s development teams have to create increasingly complex software in reduced time-frames. It’s no different here at Skyscanner, where we create mobile apps across multiple platforms. We, like many of you, believe that being able to react immediately to constantly changing requirements without sacrificing the perfect user experience is a key element in the product development cycle.

As our mobile app team continues to expand, it’s ever more difficult for one developer to understand the entire codebase and visualise the impact a new modification could have. With this comes the risk of unexpected side effects and mysterious crashes. We recognized that what we really needed was to build apps that can handle any unexpected situation, even those developers don’t tend to anticipate during the initial creation process.

We came up with a few possible solutions. A high level of unit test coverage (approaching 100%) on a code base wasn’t a bad shout, but with the side effect of greatly reduced reaction time there was too much potential damage to the development cycle (plus, 100% code coverage can, in some cases, be almost impossible to achieve). We also considered identifying the most critical parts of the code and testing these incredibly thoroughly, which has the upside of adding huge value to development efforts. However, the final solution came from Peter Adam Wiesner (lead iOS developer at Skyscanner), who was inspired by an article about something called Chaos Monkey, created by Netflix’s backend developers.

Not familiar with Chaos Monkey?

Chaos Monkey is a service which identifies groups of systems and randomly terminates one of the systems in a group.

Basically, Chaos Monkey allows developers to attack their own system to an extent which helps highlight potential weaknesses. Knowing about, and reacting to, these weaknesses helps increase long-term quality and builds confidence in the stability of the system.

Dixie: the Solution

We thought a tool like this could be just as useful for our existing projects. However, rather than a system of servers, our tool targets code components and modifies their behaviour.

Consider first that an application written in an object-oriented language can be visualized as similar to a network of servers communicating with each other. Just as a server can go down, a component of code can also start behaving incorrectly, which can then affect all the other components which are reliant upon it. This component could have all kinds of responsibilities, such as handling network communication, providing location information, managing user data or loading files from the file system.

A generally acceptable result in this scenario would be for the system to degrade gracefully, and recover from the error while minimizing the amount of harm to user experience. Ideally, the system would not continue to propagate errors, and should certainly not crash completely.

This is where Dixie comes in. Like Chaos Monkey, it can be thought of as a chaos generator, which can cause specified components to function differently and help simulate worst-case scenarios. A developer using this tool can deliberately attempt to break the app and cause it to fail. If they are successful, and the app does not handle the breakage gracefully, then it is a clear sign to the developer that the code requires modifications to increase its fault tolerance.

The idea of changing an object’s behaviour is not new; developers already use mocking libraries in unit tests. These libraries help to gain control over the dependencies of the tested components. Most of the libraries focus on mocking instances, and therefore require the target component to use its dependencies as injected objects (i.e. provide an interface where they can be set, or be ready to be used with IoC libraries). A well-designed architecture supports all of the above, although testing applications of higher complexity can still be a problem. Writing higher abstractions of unit tests (integration, system) requires more and more work to assemble the correct environment.

Instead, Dixie takes a different approach, allowing changes to the behaviour of interior components. By applying some chaos to the methods of some objects, the program flow can be changed to present edge cases, distributed across multiple components, testing their robustness. For the concrete implementation we chose the Objective-C language, where replacing behaviours is easier thanks to its runtime. Instead of using an NSProxy object (which would also require injectable dependencies), we chose to work with the technique of method swizzling.

Method swizzling is based on calling the correct runtime APIs to replace a method’s implementation with a new one. Working with this API requires the developer to be familiar with low-level C functions and object representations, and to provide the correct method environment information. Dixie takes care of all of this and hides the low-level logic, so developers can focus on creating new configurations.

The developer specifies the objects and methods that should be changed, and chooses how they wish them to be changed. This creates the ‘profile’. The profile can then be applied in a single line of code, which causes Dixie to rewire the internal structure of the application. Changes can be reverted at any time, which gives developers control over how and where they choose to apply Dixie.
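Dixie itself is Objective-C, but the underlying idea – swap a method’s implementation at runtime, then restore it – translates to most dynamic languages. Here is a rough Python analogy using monkey-patching rather than true swizzling (all names are invented for illustration):

```python
import random

class LocationService:
    def current_coordinates(self):
        # Imagine this wraps the platform's real GPS API.
        return (55.95, -3.19)

# The "profile": remember the original implementation, then inject chaos.
_original = LocationService.current_coordinates

def chaotic_coordinates(self):
    # Half the time, return garbage to see how callers cope.
    if random.random() < 0.5:
        return (float("nan"), float("nan"))
    return _original(self)

LocationService.current_coordinates = chaotic_coordinates  # apply the profile
# ... exercise the features that depend on location ...
LocationService.current_coordinates = _original  # revert at any time
```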

The possibilities are limitless; Dixie allows you to create your own tools, from the simplest custom profiles to complex patterns of behaviour. We’ve created an example app to demonstrate how easily a developer might implement chaos:
• Altering the GPS coordinates returned by the operating system (Location example app)
• Altering dates or times returned by the operating system (Date example app)
• Changing network responses (Network example app)

Or, why not use Dixie to:

• Replace localization strings to test label truncation without polluting production code
• Simulate properties in your data objects
• Change operating system information such as battery level and device hardware details

What now?

The first version of Dixie was implemented back in October 2014, with a second version released this summer by Skyscanner’s Budapest team (Peter Adam Wiesner, Zsolt Varnai, Phillip Wheatley, Tamas Flamich, Zsombor Fuszenecker, Csaba Szabo). What’s different? Well, we’ve cut the unstable proof-of-concept parts from the codebase, in addition to going through every source file to refactor and clean it, making the tool more usable for the community.

As we focused on the essentials, Dixie currently only supports the replacement of methods that take objects as parameters and return either an object or void. In the future we want to add support for primitive types too. There is plenty to implement, both horizontally (extending the current tool) and vertically (implementing new tools), in the long term.

Here’s what might be next:
• implementing a unit test tool that can fuzz a specific method of a class (or all of them), detecting input assertions and creating unit tests for the failed cases
• undertaking code analysis to find weak spots in methods and automatically suggest behaviour changes
• detecting the application’s dependency graph at runtime and using this information to create more efficient chaos

We hope that Dixie can help you solve complex issues in a much more productive and effective way. You can find Dixie here – let us know what you think in the comments below, and get involved over at GitHub.




The bots are coming:​ Conversational UI and introducing the Skyscanner Telegram Bot

Posted on by Richard Keen

What’s the theme running across Siri, Cortana, Alexa/Echo, Slackbot, Native, Operator and Facebook’s M? With varying degrees of success, these are all attempts to introduce “conversational user interfaces” as a new and distinct interaction method, primarily but not exclusively on mobile devices.

Some of these services are purely algorithmic whilst others are human-powered (at least for now). Most, whether algorithmic or not, attempt to fulfil the role of a personal assistant. Conversational interactions are familiar to all users from messaging, and of course in APAC there has been a trend of messaging apps such as WeChat developing into platforms with rich and diverse functionality, such as booking a restaurant table or a cinema ticket.

(Image from Dan Grover’s blog)

The esteemed mobile analyst Benedict Evans believes messaging is the new app platform: “[In WeChat, in China] You can send money, order a cab, book a restaurant or track and manage an ecommerce order, all within one social app. So, like the web, you don’t need to install new apps to access these services, but, unlike the web, they can also use push and messaging and social to spread.”

“Old: all software expands until it includes messaging. New: all messaging expands until it includes software.”
— Benedict Evans (@BenedictEvans) March 13, 2015

Similarly Nir Eyal recently wrote about his experience with conversational assistants and why ‘Assistant-As-App’ Might Be the Next Big Tech Trend.

Looking back, there are of course precursors to these conversational interfaces: IRC bots and the much-maligned Office Clippy of the 90s, SMS keyword-response services in the early 2000s, and a fad for Twitter bots around 2007. So what’s changed since the days of Clippy and texting a code to get Crazy Frog?

I believe a confluence of technological progress and user behaviour changes point toward messaging and conversational UI being the next big wave:

  • Our mobile devices can infer and learn significant amounts of contextual information from platform APIs and sensors (location, activity, calendar, even mood).
  • Voice input has matured and is widely accepted and used.
  • Natural language parsing, particularly term extraction, has made significant leaps.
  • Messaging, both person-to-person and within groups, is the de facto mobile experience.
  • Sharing of rich content and collaboration on tasks (for example researching and booking a holiday together) still has a high degree of friction. Messaging conversations feel like the right place to solve this.
  • Nuggets of content in the form of “cards” are ubiquitous and fit naturally in conversational responses.
  • Users are increasingly adept at tuning their “natural” input when they realise they are “talking to a machine”.

There are however significant and interesting challenges in how one handles user input and presents useful information to users in a conversational form.

Introducing a Telegram Bot for Skyscanner

As a first step to help explore and understand the conversational medium, I have hacked together a Skyscanner “bot” for the Telegram messaging service. The bot currently provides a rudimentary hotels search based on our B2B API.

Telegram has over 60 million monthly active users on their platform and has been growing rapidly in recent months. They recently introduced a bots API enabling developers to create conversational UIs for both one-to-one and group chats. The API has some interesting and innovative capabilities such as enabling custom response keyboards for specific conversation states (for example ‘Yes’ and ‘No’ buttons).
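To make the mechanics concrete, the request/response loop against Telegram’s Bot API looks roughly like this (a simplified Python sketch using the requests library; the token and the hard-coded replies are placeholders, not the actual Skyscanner bot code):

```python
import requests

TOKEN = "123456:ABC-your-bot-token"  # placeholder; issued by Telegram's BotFather
API = f"https://api.telegram.org/bot{TOKEN}"

def poll(offset=None):
    # Long-poll Telegram for new messages sent to the bot.
    resp = requests.get(f"{API}/getUpdates", params={"timeout": 30, "offset": offset})
    return resp.json()["result"]

def reply(chat_id, text):
    # A custom response keyboard turns free text into tappable buttons.
    keyboard = {"keyboard": [["Yes", "No"]], "one_time_keyboard": True}
    requests.post(
        f"{API}/sendMessage",
        json={"chat_id": chat_id, "text": text, "reply_markup": keyboard},
    )

offset = None
while True:
    for update in poll(offset):
        offset = update["update_id"] + 1
        message = update.get("message", {})
        if message.get("text", "").startswith("/hotels"):
            reply(message["chat"]["id"], "Where would you like to stay?")
```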

Here’s an example conversation with the Skyscanner bot. You can view a slower non-GIF version here.


Whilst the functionality and usefulness of the bot are very limited at this stage, it’s interesting to consider a few advantages messaging provides: extremely low data usage, nothing to install (assuming you use a supported messaging service), excellent handling of poor network connectivity, cross-platform by default, push notification of updates, and group communication.

What can the bot do?
While far from all-encompassing, the bot can:

  • Search for available hotels at a particular location, on a specific day
  • Start a search with /hotels (a slash indicates an action to Telegram bots)
  • Allow you to change and refine your search by providing a new location or date after the initial results are sent
  • Understand some natural-language date forms in addition to full dates, for example ‘next weekend’, ‘next Friday’, ‘September’ or ‘January 1st 2016’
  • Accept feedback via /feedback

How to try the bot and give feedback

You can give the bot a try by following this link, or by starting a Telegram conversation with the Skyscanner_Bot user (the one with a Skyscanner icon).

Telegram is available for iOS, Android, Windows Phone, Mac, Windows, Linux and Web at: https://telegram.org.

Please note, all messages you send to the bot will be logged and stored so that we can learn from people’s usage.

Feedback is very much welcomed and appreciated: the bot accepts feedback with the /feedback command.

