Are you a Polyglot Technologist?

Posted by Richard Lennox


Originally posted on Medium on Feb 26, 2015: https://medium.com/@richardlennox/are-you-a-polyglot-technologist-fccd767bd421#.2jmjyoece

As software engineers and architects, we solve problems; we don’t just write code. The problems we solve improve the product we offer our users. Effective system architecture means finding the right balance between those users’ objectives for the system, the technology applied to deliver the solution, and the people building and operating the system. The necessary design decisions require a degree of detachment from the technology we use. To that end we must be less blinkered about our technology choices and focus on the right approach to delivering the most compelling systems.

It is always interesting to hear general debates on technology or programming language choice flowing through wider communities, both here within Skyscanner and beyond in the industry as a whole. Some engineers are nervous, and raise questions about what our continually evolving technology may mean for them as specialists. In my experience this conversation recurs several times over a career, and generally it is healthy.

This debate, though, is sometimes, if not often, framed as a competition between cool and uncool, modern versus old. That premise is wrong. The discussion shouldn’t be wasted on the technology or specific language of choice — these are simply the tools we use in the application of our solutions. For an engineer, the discussion should primarily be about the fundamental engineering skills, with the technology/language/framework relegated to a secondary consideration. In fact, most of us have been polyglot technologists for longer than we can remember; we simply don’t think about it that way. We do all kinds of things with C#, Python or Java, with RabbitMQ, MSMQ, Kafka or Couchbase, with SQL Server or MySQL, with Selenium, load balancing or CDN configurations. The wide range of technologies we are exposed to and master is never just one specific technology or language. Recognising that this takes a particular collection of skills means recognising that we are already polyglot technologists, and we should use that to our advantage to enable the effective delivery of our solutions.

From Monoglot to Polyglot

Prior to many years of focus on .Net technologies, and C# particularly, I was a Java developer — doing part-time support for a small application that was not doing an awful lot for anybody. I found a start-up web development company that was moving forward with the .Net 1.1 technology stack and had built a SaaS CMS on it. They were partnering with another start-up, a hotel gift voucher company, to build a SaaS e-commerce application for them and their hotel clients. Sharing the costs, the two companies were looking for a ‘cheap’ graduate to progress the development of that application. The opportunity the role brought — the chance to have real influence as an early employee in not one but two internet-focussed start-ups — was exciting (and, many years later, the same decision process led me to join Skyscanner). The Microsoft stack was the technical direction of that company. I had only a few months of limited professional Java experience, so at the time the decision to re-train into .Net was a relatively easy one. I was also a recent graduate, with a graduate’s bulletproof confidence and blissful unawareness of my complete lack of abilities, so I never got the new-language fear you might expect.

I bought a book (it was 10 years ago!) to get up to speed with the basics of ASP.Net before the interview process. I remember the interview exercise I had to do — a small application with a contact address book of suppliers and some sales. I built it in Java with a Spring-based UI. It was simple enough, and while I did consider trying it in C# and ASP.Net, I didn’t want to risk it. Luckily it was good enough. While training was laid on — videos, exercises, books etc. — I hit the application code to see if I could start to make sense of what was there, and didn’t really look at the support materials again. The switch to C# from Java turned out to be a few minor syntax differences. I hadn’t intended to do anything serious, but in that first week I found myself doing a bunch of small bug fixes. These went to production manually inside the first month and things snowballed from there. In that first month, I learned one of the most important software engineering and internet economy lessons, one that I try to apply every day:

This Internet Economy moves so fast, it’s impossible to keep up to date on everything. You are doing very well if you just keep moving forward. And it is forward motion that will lead to making a significant impact.

This first software engineering job was also my first foray into mastering a wide range of skills and being what I now see as a polyglot technologist. I was programming mainly in C#, but equally I was designing databases and developing complex SQL queries and stored procedures for sales reporting and forecasting. I was doing overall system design and architecture. I was coding JavaScript, HTML*, CSS, XML-based build scripts and DOS deployment scripts, and doing web server administration, database server administration, traffic prediction and capacity planning. I started unit testing and wrote some basic automated integration tests, while designing the manual regression test scripts as well as basic deployment automation. All of which added to my baseline software development skill set.

It is these fundamental skills, and the ability to keep moving them forward, that I believe are at the core of any good software engineer. Good software creation follows tried and tested paradigms openly but not blindly. Object-orientation or functional programming, tiered architectures, SOLID principles, baked-in quality, simple use of design patterns and more recent approaches like Continuous Delivery — these fundamentals have been around a long time. You can find them referenced in seminal software engineering books such as Code Complete, Clean Code and The Pragmatic Programmer, and none of them are technology specific. New patterns appear over time, but those that cross technology boundaries are the key foundations on which to base our decisions, not the technology stack. Understanding that these are the key transferable skills between languages should give you the confidence to at least consider the step beyond your current platform of speciality.

While having the fundamental skills is the first key to effective polyglotism, having the right working environment is also key. Working in an environment with multiple technologies allows you to constantly try something new. It gives you the freedom to leverage those skills and learn something you didn’t know yesterday. A software company that is focussed on Ruby on Rails and nothing else is blinkered, and always produces a Rails-oriented solution. The team never gets the chance to explore the alternatives, despite the chance that something more suitable exists. A mind-set that supports adopting the right tool for the job — irrespective of underlying technology — enables better design and better solutions. The environment to make that choice has to exist. It can’t and shouldn’t be a free-for-all — for reasons of ongoing support, maintenance and accidental costs — but a freedom to experiment should be the norm.

The language itself is always a secondary consideration. If you get the foundations right, you should have every confidence in your ability to pick up any new language or technology, understand its core principles and move on with it. At times it may be a different paradigm or a new approach (functional or parallel programming, say), but at its core the fundamentals stand us in good stead for whatever is next. If as engineers we get this at the fundamental level, we can approach problem solving without constraints.

I am still most familiar with the .Net technology stack, the result of a decade in that environment and language and of seeing it change over time. It can become habitual. I am lucky enough that today there is a growing variety of choices I can make. I am more likely to try out a new technology or language; most recently Go, while Objective-C is next on my list.


In this internet-based economy every software engineer is a polyglot by necessity. We’re required to switch technologies efficiently, relying on our fundamental engineering skill-set to progress. That’s not to say we should all be generalists; specialists are also needed, but the core skills of a specialist can be applied across many technologies. Since joining Skyscanner I continue to work with many different technology stacks. I have enhanced my Python skills. I have moved my JavaScript skills forward. I understand more about web applications at scale — load balancing, CDN technology, cloud and so on. I am constantly adapting and building on the core skill set. I am not just a .Net engineer. I never really have been. I have always been a polyglot technologist. Haven’t you?

* I can hear the scoffs at seeing HTML in this list. Clean, semantic HTML underpins web applications, both for usability when combined with high-quality CSS and JS, and for accessibility, SEO and performance. It takes more than a little craftsmanship too. If you would build a definition list from a bunch of <div> tags, perhaps you should consider it further.




Reaching altitude by being mobile first

Posted by David Low

At a recent conference I presented on the topic ‘Reaching Altitude’: Skyscanner’s journey from being a mostly desktop business to being ‘mobile first’, and ultimately starting to embed our services in other products.

Essentially, though, the talk was about making good, flexible preparations for the future; in the modern digital age, change is sometimes hours around the corner. Businesses simply can’t afford to be rebuilding and chasing all the time.

Skyscanner and Mobile First

A while ago now, we asserted ourselves as a product-led, mobile-first business. That’s not a strategy in itself, but a set of basic assumptions and accelerators that underpin day-to-day and longer-term development.

From making our internal networks available as simulated 2G and 3G experiences, to ensuring all staff have access to a wide array of devices, we try very hard to make sure everyone is up to date.


It’s important to understand what we mean by ‘mobile first’ though. For us it’s partly a behaviour shift to think differently about our users and how they want to use digital products like Skyscanner.

But it’s also about architecture, engineering and product development — and how to prepare yourself for a new world every few months.

In 2014 alone, Skyscanner saw a 77% growth in usage of our products on mobile devices — not a particularly new trend, but one we spotted early enough to act upon — and one that isn’t going away.

But in many of the markets we operate in, mobile-first behaviour is almost the default. Indeed, some markets are already bordering on mobile-only, with consumer desktop usage and fixed-line connections almost unheard of in some emerging economies.

In Korea for example, 74% of users prefer to use the Internet on a mobile device. They may use it on other devices but the preference is clearly there.

Overlay on that a high percentage of app usage, and within that a high percentage of ‘platform’ products like KakaoTalk — where you’re almost talking about using apps within apps — and desktop is struggling to get a look-in.

Indian e-retailer Myntra dumped its desktop site last May in favour of being ‘mobile only’. Most observers expected a drop in revenue as a result, given there was still a chunk of desktop spending going on — but in fact, early indications are of a rise.

The mobile product had gone through various enhancements to make it more mobile friendly, tempting those users into longer sessions and more conversions to buy — and ultimately this enhanced experience tempted desktop users to migrate and stay.

And yet in that same market, a large proportion of new internet users are on 2G or similar connections, to the extent that Internet giants such as Facebook and Google are adjusting their products to suit. Facebook Lite, a thinned-down Android app, and Google Web Light, a web optimiser, play directly into this expanding market. Without these products the giants would not reach the sort of scale they’re used to elsewhere.

What is Mobile?

Having looked at all those things, what gave us most to ponder in going ‘mobile first’ was trying to define what we meant by it. What is mobile?

Take three scenarios — imagine three people on a train. The first is sitting looking at a phone in their hand — almost the dictionary definition of ‘mobile’, and the obvious example.

Next to them is someone using a laptop. Clearly they’re not on a ‘mobile’, but they will be suffering from limited, poor bandwidth, tunnels and other blackspots. Their experience will be as impaired as anyone else’s.

Next to them is someone without any obvious devices. But who are we to say they don’t have a wearable on their wrist, a phone in their pocket buzzing away, or that they’re waiting on a notification of some kind? ‘Mobile’ doesn’t just mean active use.

Even if we did narrow it down to phone, what do we mean by that? Is it someone simply browsing mobile web? Or using an app? Are they deep inside a messaging app like WeChat — something almost a default behaviour in the Far East?

As application developers or content publishers, should we even care?

In truth the answer to ‘what is mobile’ is: ‘all of the above’. But in most of the examples above, we’re not talking about devices or platforms or applications, we’re talking about context. And that theory around contexts has been defined as the ‘mobile mind shift’.

In a very good book that dives into this topic, The Mobile Mind Shift, Ted Schadler, Josh Bernoff and Julie Ask describe (in example-led detail) what that mind shift is all about and how to address it. They define it as follows:

“The mobile mind shift is ‘the expectation that I can get what I want in my immediate context in my moment of need’.”

What they’re trying to explain is the ‘mobile moment’ — that point in time where provision and need just happen to meet, by matching together some good data points and contexts rather than pot luck.

To use an example closer to home — Skyscanner Google Now cards provide flight price changes to users, when Google believes they’re most likely to be actioned.

Google’s knowledge of the user gives us deep context we can’t know ourselves; the benefit to us is learning about those conversion times, how valid an alert is at a given point in time, and what types of messaging work best for each user.

Thinking about the end points that suit those different ‘mobile moments’ is critical.

Degrade to Desktop

Companies that simply take their web experience and replicate it on phone, even if they use accepted techniques like responsive design, aren’t quite getting that principle of context. Making a big thing smaller doesn’t cut it any more.

Adaptive development — only showing blocks of content relevant to the device or context — goes a long way towards fixing that, but as we’ll come to later on, it doesn’t answer some of the micro-contexts and non-visual ways people are starting to get used to. Nor does it necessarily answer being an ‘app within an app’ such as Kakao or other Asian experiences depend on.

If you think of a modern smartphone and its mixture of touch, geography and movement, it’s both the hardest platform to get right and the most rewarding. From there, you effectively drop capabilities as you head back towards desktop-sized experiences — and the visual design isn’t really that important if the user doesn’t get what they wanted along the way.

It’s almost progressive enhancement in reverse, and something we’ve been calling ‘degrade to desktop’.

And it’s not an easy thing to get right, but by thinking about small blocks relevant to particular contexts, and only building and serving those blocks at the right time, you give yourself an efficient way to meet these emerging challenges.

A principle I call ‘leaning on the platform’ means you’re only building the bits you actually need for your own purpose and context, built on a solid underlying platform and data structure. There aren’t that many new problems to solve, so why solve them again?


Being featured in an app store is largely dependent on this methodology — it’s efficient as a development and quality-assurance process, and it showcases the power of the core platform, which is crucial exposure for handset/OS manufacturers.

Growing big by thinking small

From a flights perspective, maybe someone doesn’t want a flight price; maybe they just want to know ‘who flies’ to somewhere, or whether they can fly somewhere at all.

These are small queries that can be delivered by microservices to fragmented clients. Ideal building blocks for modern low-friction products — voice, messaging, wearables or IoT. And all delivered from a platform that still serves legacy and existing needs without the need for monolithic systems.
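To make that concrete, here is a minimal sketch of such a single-question microservice. Python and Flask are my choice for brevity; the route, port and carrier data are invented for illustration, not our actual API:

    # A minimal sketch of a single-question microservice: "who flies this
    # route?". Everything below (route, port, data) is illustrative.
    from flask import Flask, jsonify

    app = Flask(__name__)

    # Toy data; a real service would query a route database.
    CARRIERS = {("EDI", "BCN"): ["Ryanair", "easyJet", "Vueling"]}

    @app.route("/carriers/<origin>/<destination>")
    def who_flies(origin, destination):
        """Answer one small question and nothing more."""
        routes = CARRIERS.get((origin.upper(), destination.upper()), [])
        return jsonify(origin=origin, destination=destination, carriers=routes)

    if __name__ == "__main__":
        app.run(port=8080)

A service this small can be embedded behind a voice assistant, a messaging bot or a watch face just as easily as behind a web page.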

Having got this far, you should be able to take your concept of a mobile moment, and deliver it almost anywhere — it’s now about the nitty gritty.

The good people at Forrester Research describe how to enable all this through a ‘four-tier engagement platform’, which puts it very well:

[Diagram: Forrester’s four-tier engagement platform]

If you think about it that way, previous responses to mobile such as m.dot sites, responsive and adaptive, are lucky if they deliver one or two of those tiers.

You need to work your way back from the mobile moment and that fragment of expectation, all the way down your application structure — to the point you can deliver that moment’s worth of information quickly and efficiently.

By starting small, you’ll find you can scale to the billions of devices and endpoints you can address.

A world of possibilities

This attitude towards development also opens up those other platforms mentioned earlier — voice, wearables and IoT — and also embedding yourself in the wider Internet.

Opening up these systems, platforms and fragments for people to adopt into other products and services allows you to take those good foundations and scale way beyond what you might achieve on your own.

At Skyscanner we’ve adapted our platform for everything from presenting price changes within Google Now to the more complicated integration of our flights API into Amazon’s Alexa. We’ve created an Apple Watch app that directs you back to your hotel in a strange place and, at the bigger end of the scale, full-blown travel search for MSN users.


Lots of variations, big and small, all based on the same underlying platform and a variation on that four-tier architecture. Hopefully there will be a lot more to come from us with those foundations in place, and through Skyscanner for Business we’re using those foundations to do some exciting things. Watch this space.

That, amongst many other things, is part of how we plan to ‘reach altitude’ at Skyscanner.




The case for open access to APIs and tools to drive tech progress

Posted by Filip Filipov

Every company, in fact every great company, started small. These great companies started with an idea from a founder or two, in a garage, with sleepless nights and a notion of building something a couple of bright individuals truly believed in. Often, such notions initially seem like outrageous ideas with slim prospects of success.

There are hundreds, if not thousands, of examples where that was the case and where, contrary to popular belief, the tenacity of the founders and the team led to something great — be it Apple, Facebook, Microsoft or Google — companies that not only touch the lives of billions of people, but also make them more productive and successful.

For some companies, the chances of success are directly proportional to market fit, the hours they put into building their products, and eventually the marketing to acquire customers. Many of the aforementioned companies do not depend on external sources; they are more or less self-sufficient. Their products’ existence is not decided by a source of incoming information that is typically controlled by monopoly players, or that requires juggling hundreds of relationships to ensure all information is readily available, accurate, and cheap (if not free) to acquire.

While tech progress has been outstanding in the last decade, there are still a number of industries that haven’t reached even a fraction of their potential due to a lack of available information or restricted access. Some spring immediately to mind: travel, utilities, music and healthcare. Simply put, a developer with an outstanding idea for a product or feature in one of these areas is often unable to execute on it, because they don’t have the resources to access the vital information they need to make their product work.

These are large industries with huge markets, lagging in innovation and using methods and approaches that are deeply outdated. They are all sectors with either high fragmentation (which makes it hard to collect the data to ensure coverage) or big gatekeepers unwilling to share their information with new up-and-comers — either because they are not important (too small) or because they are a threat (too innovative).

Even if there is a way to access the information, it is so price-prohibitive that a bright-eyed university or college student can barely afford a day of access, never mind the months or years needed to test, validate and improve on their ideas and start generating revenue.

Imagine if Amazon had to pay the publishing houses for every book search users ran; that would have made any minimum viable product a theoretical exercise rather than a practical one. Google is a similar case: if every link to a site was either restricted or cost a tenth of a cent per search, then at 3.5 billion searches per day that’s $3.5 million a day just for data. Even at today’s valuations and influx of cash, that would be an impossible cost for any start-up.

At Skyscanner, our CEO has a favourite quote: ‘travel is the field of broken entrepreneurial dreams.’ The point is not that engineers lack great ideas for making things better, but that to get to product-market fit at scale you first need access to data, which in travel is either heavily restricted or incredibly expensive.

At Skyscanner for Business, we have taken the stance that we will make our data available, and in the cases where it is not economically feasible to do so at scale, we will at least provide enough of it to allow any start-up to get going, learn, iterate, and eventually build something sustainable*. We hope that other companies will do the same; there are some great examples out there already. We just need more of them.

The next iteration and step-change will come from the new, from the different, from the ones coding in their dorm room (Facebook) or assembling machines in a garage (Apple). But instead of leaving that to pure chance, we need to open up the gates and support these efforts, to encourage innovation from new entrants.

With that, we can encourage new ideas, an ecosystem of companies that can grow with it, and at least a bit of innovation. Maybe the two guys building that app from their living room have the revolutionary design and interaction that redefines how the experience should be, ultimately benefitting customers and our businesses by showing us the way.

As Ed Catmull quotes from Pixar’s Ratatouille: “[T]he world is often unkind to new talent, new creations. The new needs friends.”

*Interested? Visit Skyscanner for Business, reach out to us through our LinkedIn page or find us on Twitter @Skyscannertools




Logging at Skyscanner: building and using a real self-serve data platform at scale

Posted by Scott Krueger

I recently had the privilege of presenting to an international big data audience in beautiful Budapest, Hungary. My CrunchConf talk, ‘Logging@Skyscanner – a dreamer’s guide to building and using a real self-service data platform at scale’, explored the motivations, research process, cultural impact and implementation details that got us to operating an infinitely scalable data platform.

It’s important we all share our experiences of implementing distributed data platforms.

What follows here is a summary of the steps we took at Skyscanner to do this, and the outcomes that followed.


How does one know to change direction?

For us this was pretty easy. We started with a single monolithic RDBMS that performed all kinds of tasks. The logging database was a denormalized four-table DB that handled all reads and writes. By tuning the hell out of it we managed to get a lot of mileage from it. However, the warning signs were getting pretty obvious by the end.

The signs

  • Writing OR reading doesn’t perform
  • Writing AND reading don’t agree with each other, a.k.a. contention (which in relational models can happen at various levels internally – make sure you have someone to hand who understands the internals)
  • Writing OR reading can’t be done in a timely fashion anymore, probably due to contention

Trap 1:

Solution: Just scale out the write nodes, and batch back periodically to a read node.

Problem: If your business is growing, you have ever more event volume to stuff into a table in the same time frame. You are prolonging the inevitable. You’re back at the beginning.

Trap 2:

Solution: Denormalise everything. This is really effective and really does get you away from most of the problems associated with the above. BUT, it has some undesirable side effects that I hope you’ll avoid by reading this.

Problems: You need to do some serious data engineering to make that data readable. ETL pipelines suddenly go from two to 20 steps, and with more steps comes more confusion and less ability to read the data in the first place. WTF is going on in ETL step 17? I don’t know, and no one else seems to either. Plus, if your business is growing, you have more event volume, so you are prolonging the inevitable. You’ve still not solved the problem, and now it is even harder to diagnose.

No more traps

We needed a logging platform that would:

  • Have a much longer best-before date (i.e. horizontal scale)
  • Give the data back to the people

So where did we start?

Everyone says it’s the journey that’s important, not the destination. ‘They’ might be right, but in this case the journey AND the destination are equally important. Create a vision, do your research and make a plan.

Destination = Scale-Out-and-Give-the-data-back-to-the-people land

Journey = This process, captured forever here

It’s confusing out there in the real world and here’s why.

Check out these ‘Big Data’ headlines:

“The $11 Trillion Internet Of Things, Big Data and Pattern Of Life (POL) Analytics” (POL? Tell-your-mom speak: machines are watching and predicting your next movements.)

“Amazon Web Services gets serious about big data analytics with bevy of new services. The tech titan’s cloud arm, Amazon Web Services, is beefing up its suite of streaming data and analytics services.”

“Enterprise-software’s Trillion Dollar Opportunity”


What now?

Identify your business need.

Definitions

Business Need: The sensible pairing of technical want with a legitimate business reason to do this.

Technical want: Engineers love solving problems and we also like to overdo it. It’s part of our fabric. So what we think we need might not match the business reason.

Business reason: The destination.

Research

What are others doing? Build or Buy? How does it impact my existing systems and processes?

Take the time to read up and see what your peers and idols are up to.

My inspiration came from Jay Kreps’s deservedly wordy “The Log: What every software engineer should know about real-time data’s unifying abstraction”.

“One pipeline to rule them all”. That’s it – I’m sold. Point all of your data producers to one place and send your data consumers to that same place. Sounds simple right? It is. And that’s why it’s so powerful. Value will appear in places you don’t expect it to.

At the time of our research, we needed to plan for both data centre and cloud capabilities. For this, and for many other compelling reasons as stated in Jay Kreps’s article, we chose Apache Kafka, which was (and still is) one of the best data technology decisions we have ever made.

I can’t speak highly enough about Kafka. I’ll resist and defer more praise to a future article.

With Kafka comes Zookeeper – your trusty friend to manage the distribution of a Kafka Cluster. Provided you dedicate your Zookeeper to Kafka and follow the operational guidance, it just sits there and does what it’s good at.
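To give a flavour of the ‘one pipeline’ idea in practice, here is a minimal sketch using the open-source kafka-python client; the broker address, topic name and event shape are invented for illustration:

    # A sketch of 'one pipeline to rule them all' with kafka-python.
    # Broker, topic and event shape below are illustrative assumptions.
    import json

    from kafka import KafkaConsumer, KafkaProducer

    # Any service can produce events to the unified log...
    producer = KafkaProducer(
        bootstrap_servers="kafka:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("web-logs", {"event": "search", "route": "EDI-BCN"})
    producer.flush()

    # ...and any number of consumers can read that same stream independently.
    consumer = KafkaConsumer(
        "web-logs",
        bootstrap_servers="kafka:9092",
        group_id="pricing-analysis",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)

Every new producer or consumer is just another client of the same log, which is exactly where the unexpected value comes from.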


What would a data platform be like without doing something with the data?

Enter Stream Processing, Data Transport, Analysis, Visualization and Slower Data.

Connectivity and interoperability are getting really good these days. You have a lot of choice here. Choose something you know will work with your stack with as little effort as possible. If you were starting from scratch, and hosting entirely in the cloud, you’d do no wrong by looking at some of the cloud providers’ offerings in this territory; they are waking up pretty quickly.

Stream Processing: we chose Apache Samza because it connects very well with Kafka and has some really brilliant capabilities that are easily recognizable to any functional programmer and/or SQL developer. Why not Spark Streaming? At the time it was getting kind of hot, but it has since fizzled out. There are some other good options out there too – and it should come as no surprise that a few major cloud providers are also hot on their heels.

Data Transportation: Logstash. Config-driven: data in -> data out. e.g. application log -> Kafka topic; Kafka topic -> your favourite DB for analysis.
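As an illustration of that config-driven style, a Logstash pipeline that tails an application log and ships it to a Kafka topic looks roughly like the sketch below; the path, broker and topic are made up, and option names vary between Logstash/Kafka plugin versions:

    # Sketch of a Logstash pipeline: file in -> Kafka out.
    # Path, broker and topic are illustrative assumptions.
    input {
      file {
        path => "/var/log/myapp/app.log"
      }
    }
    output {
      kafka {
        bootstrap_servers => "kafka:9092"
        topic_id => "app-logs"
      }
    }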

Analysis: As keen Elastic users (formerly Elasticsearch, for you oldies), this was good enough for the purpose of doing some analysis and seeing the data. We hit it hard from the get-go.

Visualization: Elasticsearch, Logstash, Kibana (“ELK”) – don’t leave out the “K” when you’re already two-thirds of the way there. Fire up :9334 and click away. An hour learning the basic tricks and you’re good.

Slower Data: Archive all of your data to cloud storage (like S3). No question about it. The “what will we do with it” debates you had 10 years ago are gone. Wake up – storage is cheap, and if you don’t have an army of talented data people to tell you what to do with it, go and find them. And when you have them, listen carefully to what they say. Your business’ future relies on great data being turned into great information by these great people.
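As a sketch of that ‘slower data’ path, assuming the kafka-python and boto3 clients (the bucket, topic and key layout are invented):

    # Sketch: drain a topic and archive coarse batches to S3.
    # Bucket, topic and key layout are illustrative assumptions.
    import time

    import boto3
    from kafka import KafkaConsumer

    s3 = boto3.client("s3")
    consumer = KafkaConsumer("web-logs", bootstrap_servers="kafka:9092",
                             group_id="s3-archiver")

    batch = []
    for message in consumer:
        batch.append(message.value)          # raw bytes, one event per line
        if len(batch) >= 10000:              # S3 prefers fewer, bigger objects
            key = "web-logs/%d.ndjson" % int(time.time())
            s3.put_object(Bucket="event-archive", Key=key,
                          Body=b"\n".join(batch))
            batch = []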

We make use of Spark, EMR, Tableau, Pandas and SciPy, to name a few. Keep connectivity and interoperability in mind for your platform outputs (which isn’t too difficult now, thanks to increased openness from traditionally closed-source vendors).



Hosting Considerations

Are you in Data Centres? Are you in Cloud? Are you in both? Do your applications need data locality? Can you afford(!) some latency? Compression? Encryption? Geography? There’s a lot to think about here – so take the time to think it through.

What about my existing logging platform?

Follow these two weird tricks to keep your business running:

  1. You just have to do this in parallel. We looked for various solutions to expedite a migration but came right back to the simplest option – do it in parallel so your data consumers and business have confidence in the quality of what you are doing.
  2. Since you’re going the parallel route, figure out the easiest way to piggyback on top of your existing logging infrastructure. We listened in on an rsyslog stream of our load balancer web logs.

Schemas

It’s really, really important to (re)define schemas for your events.

It’s therapeutic. Involve lots of people – particularly those who understand the data. These will be the event producers and the analysts who work with the data every day. They know what things mean. They know what is missing, or isn’t in use anymore. Prepare yourself to go through the rounds many, many times as producers and consumers continue to use the schemas.

It’s really important that you document the fields correctly, and maintain a good history of your changes and the reasons why (no squashing!).

Schemas are your first port of call for data quality. They help preserve the structure and integrity of the data.  It’s really important you have additional quality guidelines in place though. Any old value can still ‘fit’ in a schema. Bad data means bad decisions.

Design a schema workflow. We use a standard Git flow with a central schema repository. The same group of domain experts should review merge requests before new versions become available. All versions must stay backwards compatible (a consumer must always be able to read a message, even if the structure of that data changes). Schemas are for life.

Schemas and Serialization

Choose a schema tech that’s right for your stack – particularly your consumers (since you’ll have way more of them than you will producers when all goes to plan). The trade-offs generally boil down to message compaction vs serialization speeds. There are a number of benchmarks out there.

We use protocol buffers, since they meet all of our needs and are actively developed.
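For illustration, a small, backwards-compatible event schema in protocol buffers might look like the sketch below; the message and fields are invented, not one of our actual schemas:

    // Sketch of a versioned event schema (invented fields).
    // Compatibility rule: never reuse or renumber an existing field tag;
    // only append new optional fields.
    syntax = "proto2";

    message SearchEvent {
      required string session_id = 1;
      required int64  timestamp  = 2;  // epoch milliseconds
      optional string route      = 3;  // e.g. "EDI-BCN"
      // Added later: consumers on the old schema simply ignore it.
      optional string currency   = 4;
    }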

Research Complete?

You’ve done all your homework and you’re ready to go, right? Right! Stick to your decisions and go with it. Just remember these 21st-century rules:

  1. By the time you begin development, something new will have come along.
  2. By the time you are fully operational in prod, something new will have come along.

Things are moving quickly and a lot of great innovation + simplification is heading our way. What you have today will become easier to use AND more cost effective tomorrow.


Pitch It

This is the point where you’re going to have to convince whoever writes the cheques that it’s a good idea to go with the new plan.

Keep things simple and high-level – they’re not interested in why write order doesn’t matter and how this improves throughput and latency!

Here’s a diagram we used showing the simple approach being proposed:

[Diagram: data producers writing to a central, unified log; consumers reading from the same log]

Explain the value of doing this and how it meets the business needs (by identifying what the drivers are and how the technical solution meets them).

This diagram was accompanied by some easy-to-understand dialogue. It went something like this:

“All of our services capture events, like click activity, or application logs. We can call these services Data Producers, who ‘log’ their data to a single place that can handle A LOT of writes and reads. This central log is highly available and can easily scale out when needed. Consumers pick up these same data events to create new data sets. These data sets can then be turned into any data product we need: reports, archives, application data, data for exploratory analysis, data for machine learning, and so on.”

Development: Proof of Concept (POC) vs Minimum Viable Product (MVP)

At this point you’re ready to develop something and put all of your good research to test. Right from the start we chose a high-volume event stream that would provide a lot of value for us. It wasn’t an experiment or a quick prototype to see if the platform would work for us. The technologies had already been proven a number of times over.

We were committed and treated it as a product that we would continue to iterate on and improve upon.

We did push to production-like environments quickly so we could start load testing and estimating cluster sizes, which also gave us the advantage of seeing production value early on.

Demonstrating Value

As soon as you can, start playing and showing others what kind of cool stuff you can do. Here’s a Kibana dashboard I created in real time to look at some flight pricing trends:

[Screenshot: Kibana dashboard of flight pricing trends]

Trap:

A-ha! You are building a brand new data visualization tool!

Ummm, not quite. We’re building a unified logging platform that anyone can use to do great things with data. It took us a couple of months to realize perceptions were different!

Tip:

When you are demonstrating something, make sure you clearly explain that this is something consumers themselves can do as a result of the platform you are building. It is not ‘the platform’ you are building.


Taking it to Prod

Hopefully in this day and age you too are doing everything with the best development and operational practices in place. Continuous integration and delivery is going on; there are lots of unit / functional / regression tests running; code is well instrumented; monitoring and alerting is on; incremental builds and quick deploys push to prod smoothly; and so on…

All of this adds up and takes time – plan for it and try your hardest not to cut any corners. If you do, come back ASAP and pay off that tech debt. It quickly piles up in distributed data platforms, which by nature have more moving parts than your standard application.


If you are also deploying to a cloud hosting provider, there are more things you need to do.

MVP Quality

How do you know things are as they should be? Provided you trust your old logging platform outputs, you can track the same metrics side by side because you’re taking a parallel migration approach.

We could confidently show our existing data consumers that the new platform is up to scratch – the stream-processed numbers look the same.

On-boarding

You’re getting value out of the MVP so now is the time to encourage your producers and consumers to make the move.

Here are a few tips to ease the on-boarding process:

  • Not all your events have to migrate – figure out what is actually being used. Somebody knows, so find them and ask them
  • Documentation: how do I write? How do I read? We use Read the Docs with Git.
  • Ask for feedback, good and bad, and use it to improve your documentation (the aim being self-service)
  • Your teams that are already logging mightn’t have the same passions and incentives as you have. Remind and educate continuously
  • Promote adopting the platform where work is prioritized
  • Hold a two-day, distraction-free workshop to focus on schema authoring, writing and reading. Stuff gets done very quickly when everyone focuses on just three things and the right people are at hand (we had a group of four representing domain expert, producer, platform and consumer)

Results

Did we meet our business need? Yes, and more value appears as more and more users adopt the platform. A snowball effect starts and great things start to appear.


Indicative prices, as a result of stream processing


New events made available for all via a unified logging platform

What’s next?

We continue to evolve our platform and actively keep up-to-date on emerging technologies and techniques.

As we are big supporters of open-source code and process, you might find our Kafka ‘topic enforcer’ tool useful. It comes in handy when you’re managing a lot of topics across several clusters.

We strongly believe stream processing on top of a robust unified logging platform is the best way of handling an organization’s most important asset – its data.





What worked yesterday, is painful today, is broken tomorrow

Posted by Richard Lennox

Over recent years we have seen British Cycling dominate the sport. They maxed out the medal table at both the Beijing and London Olympics, and two of their cyclists have won three of the last four Tours de France. Ten years ago this wouldn’t have happened. How? Primarily through the adoption of the mind-set instilled by David Brailsford: improving absolutely everything by as little as 1%.

Looking closer to home, it is our job as engineers and product managers, as squads and teams, to focus on ‘winning’, and winning means continually getting better at what we do.

As Skyscanner continues to grow, the shape and complexity of our engineering and technical challenges change, and how we approach them must adapt at the same rate. Ultimately we find that at scale, continuing to do what we did yesterday is not sufficient. The alternative is getting stuck in a broken cycle that ultimately means we cannot achieve our goals. We can be left with the feeling of wading through quicksand, slowing until eventually we sink. To continue growing at scale, we must apply a steady stream of continuous improvement to everything we do.

However, in a world where we are striving to maximise the delivery of value to our users and partners, stopping to improve everything is not an option – our time is equally precious. How do we make sure we are working on the right things? Skyscanner has successfully used the Theory of Constraints (as one of many tools) to direct our continuous improvement actions and ensure they have the maximum possible impact.

The Theory of Constraints

Eliyahu M. Goldratt introduced the ‘Theory of Constraints’ (TOC) in his book The Goal as a way to apply Lean principles while focussing on the most impactful improvements possible.

It is a methodology for identifying the most important limiting factor (i.e. constraint) that stands in the way of achieving a goal and then systematically improving that constraint until it is no longer the limiting factor. In manufacturing, the constraint is often referred to as a ‘bottleneck’.


The path of software (from Idea to IDE to Production experiment to full roll-out and maximised value) flows through a pipeline, a set of (mostly) automated steps. While we at Skyscanner favour people over processes, there is no hiding from the fact there are processes involved in everything we do. As with any manufacturing flow, these processes are subject to the Law of the Minimum, and there is always a limiting factor. After all, we do not have infinite capacity!

TOC helps us systematically reduce and remove the bottlenecks within our processes. It lets us focus on the current limiting factor of our process; by working on removing or reducing that bottleneck, we can be sure we are directing our efforts where they will make the biggest gains.

Have you ever completed a retrospective action only to realise that you have made little or no impact to the next sprint, and been left scratching your head? Perhaps it is because you optimised something that was not the root cause or constraint on the process. TOC helps us focus on getting our actions right.

Focussing on the bottleneck

This is how one can apply the most basic tenets of TOC to processes.

An action to improve capacity upstream of the bottleneck increases the pressure and produces more unfinished work (waste).


An action to improve capacity downstream of the bottleneck does nothing to reduce the pressure on the bottleneck and is wasted improvement effort.


Only by maximising the throughput through the bottleneck (exploiting it), putting everything on the cadence of the bottleneck (subordinating to it), and then increasing capacity at the bottleneck (elevating it) do we get significant, impactful improvements.

Often the ‘exploit’ and ‘subordinate’ actions alone can have significant impact. For every improvement we do we will always have a bottleneck, so we need to continually restart the process in identifying bottlenecks.
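To see why in miniature, here is a toy sketch in Python; the stage names and capacities are made up, but the arithmetic is the whole point:

    # A toy model of a delivery pipeline subject to the Law of the Minimum:
    # overall throughput is whatever the slowest stage can handle.
    # Stage names and capacities (units per day) are invented.

    def throughput(stages):
        """The pipeline finishes no more work than its tightest stage allows."""
        return min(stages.values())

    stages = {"build": 10, "review": 4, "test": 7, "deploy": 9}
    print(throughput(stages))    # 4 -> 'review' is the bottleneck

    stages["build"] = 15         # improving upstream of the bottleneck...
    print(throughput(stages))    # ...still 4: it only piles up unfinished work

    stages["review"] = 8         # exploit/elevate the bottleneck itself
    print(throughput(stages))    # 7 -> now 'test' is the next constraint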

Application and results

While this simplified introduction to the Five Focussing Steps of TOC highlights how it can be used to maximise outcomes, it is through clear results that TOC has become a mainstay in Skyscanner’s toolbox. Here are some examples of the successes we have had:

  • In our Hotels Back-end Squad we have reduced bug turnaround to < 1 week
  • Improved delivery predictability through guided process changes with zero effort
  • In our translations processes we have reduced turnaround time from upwards of 2 weeks to < 2 days and are now looking at < 24 hours
  • Tripled the velocity in areas of our B2B tribe

Overall, TOC has aided our ability to deliver on continuous improvement and to adapt to the pain we feel when, due to our continued growth, our processes are struggling. Some squads now operate with every other retrospective tightly focussed on working their constraints to maximise throughput, and they’re really seeing the benefits.




I’ve seen the future of mobile

Posted by Balint Orosz


“We’ve just built a website but I haven’t really checked how it performs yet.”

This sentence came from one of the founders of Carousell, a Singaporean start-up I visited, which not too long ago raised $6 million from Sequoia Capital. Carousell is one of the region’s most used shopping applications, and the statement above clearly indicates how the web in most APAC markets is just an ‘afterthought’. Why? Because the primary interface is apps.

Talking to these guys and other start-ups during a trip to Asia proved to be an incredibly valuable experience. While we mostly know what Samsung, LG and other big companies are doing through the large tech portals, we see much less about the new, hyper-growth start-ups that build very effectively on local habits and legal environments, and scale at an exponential pace.

It’s clear to see that these companies expect exponential growth, user loyalty and ‘easy, effective payments’ through mobile; more specifically, apps. While from market research we would still assume mobile is too immature, in many Asian markets that’s no longer true.

(Local) Apps are everywhere

People use apps all the time, everywhere. When I arrived at the airport in Korea, the first thing I noticed was the weird yellow screen on everyone’s phone – the KakaoTalk app. Over 90% of Korea’s population uses KakaoTalk, and from what I saw they use it as their primary means of communication. I didn’t see people using iMessage, or even SMS – it’s all KakaoTalk. Interestingly, when I asked people why they don’t use Facebook Messenger, they responded with a question: “but can Facebook Messenger do group messages?”

One local start-up, founded 18 months ago with a model very similar to Hotels Tonight, has 10 million downloads in Korea. That’s roughly 20% of Korea’s population. Insane.

On billboards you see ads not for products or companies but for apps – for food ordering, yet another Uber clone, whatever it might be. In most cases, those outside the country will likely never have heard of these apps. They’re mainly local apps, developed by local companies, serving local needs in a local fashion. That might mean supporting the most popular local online/mobile payment methods, or having design/UX that appeals to local users. In most cases people can distinguish local from global apps – and often have a preference for the local ones.

That said, many of these companies struggle when going not even global, but merely outside their home country at any level. They build rapidly, with a focus on execution and local needs – in many cases ignoring planning for scale – and only when successful locally do they start to think seriously about expansion. At that point they face several difficulties, from software and architectural problems (technical debt) to cultural differences.

The first real personal ‘Travel Assistant’ will come from APAC

For users in APAC, mobile is what the *personal* computer always wanted to be but never managed. As users can use their mobile devices for nearly everything – from communication to shopping, food ordering, transportation and so on – it becomes an extremely personal item for them. In this region ‘Mobile is Social’ and ‘Social is Mobile’. Considering that ‘Travel is Social’, it’s quite simple to see how ‘Travel is also Mobile’.

Together with the trend towards deeply coupled services – instead of the decoupling of services we see in most Western markets – it’s clear that people expect integrated solutions that solve all of their complex needs, not just simple stand-alone, single-function apps.

They expect to do everything in one app – browse, gain inspiration, book, and be guided through the trip – all in a very smart way.

The future is here

APAC is not a country but a region. While some characteristics are very similar across most of the countries, many others are very different, meaning there is no ‘one solution fits all’ for this wide-ranging and diverse market.

That said, there is plenty of opportunity in APAC. Local product development methods are not more advanced, and neither is the engineering. However, the pace of execution is extremely, admirably fast. The key to their success is a focus on local needs, with features embedded deep in the product accordingly. Our bus-booking feature in India is an example of the way Skyscanner is looking to approach the region: tailored and localised, combined with the strength of our comprehensive global flight coverage.

One of our challenges in APAC is how to modify our technical and product architecture to allow ‘deep localization’ – i.e. altering some features completely for given markets – and, of course, having product/engineering teams focusing exclusively on these markets.

What is really clear is that mobile and apps are far further ahead in maturity in APAC than I had previously imagined. It shows us the future, and in that future mobile gets higher user numbers and higher frequency, and generates more revenue than desktop. This future will arrive in Western markets as well, and anything we do in APAC can only help us better understand and prepare for it in other regions too.





Common Pitfalls in Experimentation

Posted on by Colin McFarland


Through experiments we expose our ideas to empirical evaluation. Naturally, uncertainty follows, but the organisational mind-set one develops around this uncertainty is crucial.

More often than not, the things we believe will improve metrics simply don’t. When exposed to real users in controlled experiments, those no-brainer ideas we assume to be obvious often fail to improve the metrics designed for them. Far from the case studies of agencies selling their services, research from Internet Economy organisations we trust reveals the failure statistics of experimentation to be humbling.

In his book Uncontrolled, Jim Manzi revealed that at Google only about 10% of experiments led to business changes. It’s an arrestingly small number, but it’s a similar story at Microsoft.

In Online Experimentation at Microsoft (PDF), Ron Kohavi confides that of their experiments, only about one third were successful in improving the key metric.

In Do It Wrong Quickly we learn that ‘Netflix considers 90% of what they try to be wrong’. Dan McKinley at Etsy has contributed too:

“It’s been humbling to realize how rare it is for [features] to succeed on the first attempt”. 

From thousands of experiments during my time leading experimentation at Shop Direct, intuition was wrong more often than not. At Skyscanner, early evidence suggests around 80% of experiments fail to improve their predicted metrics.

It’s easy to see how this could be industry-wide yet poorly recognised. If any Internet Economy business is to seriously compete, it needs to begin by abandoning assumptions and moving from a culture of deployment to a culture of learning. As we start out, we believe these features and designs to be valuable – we’re investing time and effort building them – so what’s going wrong? Why do so many fail, and what can we do about it?

Apart from statistics around failure, little has been said about what we can do to tackle it. We can do better. If we understand where experiments go wrong, we can work to improve things. Understanding common pitfalls can help us determine why some experiments failed.

Presented here are common pitfalls in experimentation, loosely in order of impact.


Pitfall 1: Not Sharing or Prioritising Evidence

A common problem I see at scale across businesses is that the learning from experiments isn’t shared widely beyond the team operating the experiment. Clearly, winning experiments should be promoted across the organisation. Since we fail more often than not, it’s easier to see how success rates can improve if we understand winning experiments in one domain so that we can repeat them in another.

When experiments fail it isn’t as straightforward. Hindsight can lead us to realise some experiments were executed poorly. The data from that experiment could have limited or no value to others. Perhaps it’s invalid. Far more important is sharing surprising failures — those ideas that many people would think will be successful. If your users have rejected a feature or idea and it surprised you, it will surprise others too. Sharing these surprising results is a wonderful opportunity.

You can save others repeating the same experiment; they may adapt their method based on your data, or validate your experiment and offer new insights along the way. You’ll evolve your understanding of cause and effect as an organisation this way.

Prioritisation is hard, and sometimes you won’t even spot your misses, because our own ideas often seem a better choice than competing ideas simply because they are our ideas. We shouldn’t be surprised: Behavioural economists have demonstrated the IKEA effect [PDF]; when we construct products ourselves we overvalue them.

You’ll likely have many data sources available to you, and should use them: data is data. Evidence from other experiments, qualitative, and quantitative feedback, should challenge priority at every opportunity. Experiments fail more often than not; taking evidence from others’ winning experiments gives us an opportunity to validate it at a wider scale. Evidence from experiments should challenge our execution of experiment details too.


Pitfall 2: Poor Hypothesis and No Overall Evaluation Criterion 

Hypotheses are not beliefs; they are predictions. Often we deploy an experiment and then try to find the data to prove ourselves right. This is a problem. Look for anything and you’ll find something, but this bias in analysis can lead you to miss a wider detrimental impact and to many false positives.

A good place to start is by defining an Overall Evaluation Criterion (OEC), a small set of metrics agreed across your organisation. Over time the OEC can evolve. When you experiment against an OEC, most feature ideas are now simply hypotheses to improve it and we can move faster by proving ourselves wrong quickly.

Pitfall 3: Poor Understanding of Significance

Significance is widely misunderstood; even a book by the co-founder and CEO of a leading A/B testing platform gets this wrong. ‘Once the test reaches statistical significance, you’ll have your answer’ and ‘When the test has reached a statistically significant conclusion, the tester sends a follow-up with results and the key takeaways’ are incorrect procedures that will lead to far more false positives than expected. The correct procedure is to determine the duration upfront, conduct power calculations to understand the traffic you need, and calculate the significance of your data only once the test has run for the full duration.
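To make that procedure concrete, here is a sketch of an upfront power calculation using Python and statsmodels; the baseline conversion and minimum detectable uplift below are invented numbers:

    # Sketch: fix alpha, power and the minimum effect you care about, then
    # derive the sample size (and hence duration) before the test starts.
    # Baseline and uplift are illustrative assumptions.
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    baseline = 0.10                  # assumed control conversion rate
    mde = 0.01                       # minimum absolute lift worth detecting
    effect = proportion_effectsize(baseline + mde, baseline)

    n_per_arm = NormalIndPower().solve_power(effect_size=effect,
                                             alpha=0.05, power=0.8, ratio=1.0)
    print(int(n_per_arm))            # users needed in each variant; divide by
                                     # daily traffic to fix the duration upfront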

Another misconception I see often is the expectation that significance is somehow neatly wrapped with the experiment to prove it. Significance concerns itself only with the differences in the numbers. We are doing null hypothesis significance testing (NHST) – the null is the inverse of our prediction. With NHST the starting position for most experiments is that nothing interesting is happening, and it offers a form of evidence – expressed as a p-value – that something interesting might be happening. The lower the p-value, the more you can trust the ‘interestingness’ of your data and draw conclusions accordingly.
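And to close the loop on the procedure above, a sketch of computing significance once, after the pre-committed duration, with a two-proportion z-test (the counts are invented):

    # Sketch: one significance calculation at the end of the planned run.
    # Conversion counts and sample sizes are illustrative assumptions.
    from statsmodels.stats.proportion import proportions_ztest

    conversions = [1000, 1075]       # successes in control, variant
    visitors = [10000, 10000]        # sample size in control, variant
    z_stat, p_value = proportions_ztest(conversions, visitors)
    print(p_value)                   # a low p-value is evidence something
                                     # interesting may be happening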

 

Pitfall 4: Running Inconclusive Experiments Longer

By its nature, running experiments causes waste. Many designs or features won’t survive; a lot of your work, no matter how much you want it to, just won’t land with your users. That can be hard to take, but it is part of our process. We are all passionate about our products, but we need to be careful not to fall in love with our own ideas and focus only on validating ourselves.

This can lead to running inconclusive experiments longer in the hope users will get used to things. While it is possible that novelty effects could cause deltas to change significantly, this is rare in our setting. Avoid running your experiment for longer than your power calculation requires simply because you aren’t getting the data you want.

There’s another problem with this approach. Let’s say you’re aiming to measure a 1% uplift and you already have the power to demonstrate it, but your experiment hasn’t; continuing to look for significance only increases your chances of false positives.
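A quick simulation makes the danger visible. The sketch below (illustrative numbers only) runs A/A tests where no real difference exists, and compares checking significance once at a fixed horizon against peeking every day and stopping at the first p &lt; 0.05:

```python
# Simulating how 'peeking' inflates the false positive rate of A/A tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
DAYS, DAILY_N, ALPHA = 28, 500, 0.05

def false_positive(peek):
    # Both variants draw from the same 5% conversion rate: any
    # 'significant' result is a false positive by construction.
    a = rng.binomial(1, 0.05, DAYS * DAILY_N)
    b = rng.binomial(1, 0.05, DAYS * DAILY_N)
    checkpoints = (range(DAILY_N, DAYS * DAILY_N + 1, DAILY_N)
                   if peek else [DAYS * DAILY_N])
    for n in checkpoints:
        _, p = stats.ttest_ind(a[:n], b[:n])
        if p < ALPHA:
            return True
    return False

for peek in (False, True):
    rate = np.mean([false_positive(peek) for _ in range(300)])
    print("peeking" if peek else "fixed horizon", round(rate, 3))
# The fixed-horizon rate stays near the nominal 5%; daily peeking
# inflates it severalfold.
```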

Ryan Singer said it best: “You can’t improve […] when you’re emotionally attached to previous decisions. Improvements come from flexibility and openness.” We’ll modify and repeat to learn further, but we should be deferential enough to change tack if users tell us through their clicks that our ideas are off track.

Pitfall 5: Ineffective Ramp Up Strategy 

In an uncontrolled setting a feature would ship to 100% of users, and it would take days for user feedback or data to show something bad had happened. When you run an experiment you have the ability to detect and abort these bad changes quickly. Failure is inevitable in experiments, but expensive failure is not, if we design effective ramp up procedures.

Typically, the first stage will expose around 5% of users, to minimise the blast radius and the risk of releasing a change. The second stage will expose around 50% of users, to measure the change. A third stage (if your experiment is successful) will ramp the feature to all users (or to 95% if you keep a hold-out group to measure impact over time).

It’s crucial your ramp up strategy doesn’t slow you down unnecessarily. An effective ramp up strategy should have clear goals for each stage, and move forward to the next stage unless results are unsatisfactory. For example, stage one may aim to validate, in less than 24 hours, that there is no significant detrimental impact and that engineering and platform resources are performing reliably.
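One way to keep a ramp up honest is to encode it as data, with explicit traffic levels, goals and time boxes per stage. A hedged sketch, with the structure and values invented for illustration:

```python
# A hypothetical ramp-up plan, mirroring the stages described above.
RAMP_PLAN = [
    {"stage": 1, "traffic": 0.05, "time_box_hours": 24,
     "goal": "no significant detrimental impact; systems healthy"},
    {"stage": 2, "traffic": 0.50, "time_box_days": 14,
     "goal": "measure the change against the OEC at full power"},
    {"stage": 3, "traffic": 1.00,  # or 0.95 with a hold-out group
     "goal": "ship, optionally keeping a hold-out for long-term impact"},
]

def advance(stage_index, results_ok):
    """Move forward unless results are unsatisfactory; abort otherwise."""
    return stage_index + 1 if results_ok else None
```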

 

Pitfall 6: Method That Doesn’t Match the Expected Uplift

On one hand, we can bundle so many changes into a single experiment that their effects cancel each other out; on the other, we can make changes so incremental that we can’t realistically measure them in a meaningful timeframe. Consider this as a hill climbing metaphor. We’re using experiments to climb, but we’re doing it blindfolded. We can’t see upfront whether we’re at the top of a hill (a local maximum) with a much bigger mountain further ahead (the global maximum) that will need a leap to reach, or far from the peak of this hill, where small steps yield big gains.

Experiments help us assess the terrain and design our leaps accordingly. Power calculations are our guide: for example, they might tell us that if we want to learn the impact of our change in 2 weeks with the traffic we have, we can only identify a meaningful lift of 5%. We can therefore determine how we’ll assess the terrain accordingly. Example: we have 2 weeks; will this background change give us the 5% uplift we need? Unlike other parts of the process, experiment power is not an arbitrary decision; this statistical analysis is required to design effective experiments. Dan McKinley created an intuitive online calculator that we have adopted as standard.
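For those who prefer code to calculators, the same calculation can be sketched with statsmodels; the baseline rate and lift below are invented:

```python
# How many visitors per variant are needed to detect a 5% relative lift?
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, lifted = 0.040, 0.042          # a 5% relative uplift
effect = proportion_effectsize(lifted, baseline)  # Cohen's h

n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided")
print("visitors needed per variant: {:.0f}".format(n_per_variant))

# If two weeks of traffic can't supply this many visitors, the experiment
# as designed cannot detect the lift, and the method must change.
```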

 

Pitfall 7: Failing Ideas Not Experiments

You need to make an important distinction: a failed experiment is not a failed idea; it fails only the concept in its current form. Failed experiments are not necessarily dead ends. You can learn a lot about your users’ behaviour even from experiments that didn’t result in a positive change to the metrics. A modification could turn your idea around; small changes matter unless you are at a local maximum. Explore negative and neutral experiment data to help inform further iterations of the concept, or new hypotheses to be trialled. With segmentation there is a risk of false positives, so validation with new experiments is important.

If conditions change, so too could the outcome. In highly experimented products, conditions change often: as you experiment, your product constantly evolves. The implication is that experiments you accept now could change the outcome of ideas you rejected in the past. Similarly, Booking recognise that some ideas may simply be ahead of their time.

What’s most important is you don’t take a winning experiment implementation and make it a sacred cow never to be explored or challenged further.

 

Prove Yourself Wrong

Understanding the common pitfalls in experimentation will help you get closer to determining when your experiments failed in execution or when they failed because your intuition about the concept was wrong. If your experiment failed in execution, you can quickly iterate, taking what you learned the first time to improve your approach to better assess the terrain next time.

It’s important to constantly try to increase your iterative capital by making experiments cheaper. If you determine it’s your intuition that’s wrong, this can be humbling but should be celebrated. Through this “prove yourself wrong” culture, new discoveries can be made and innovation can be accelerated.

Want to read more? Check out ‘Design Like You’re Right, Test Like You’re Wrong’, a direct response to the pitfalls above.

Thanks to the many people across Skyscanner and externally who provided feedback for this article. Among the latter, special appreciation goes to Ron Kohavi (Microsoft), Ya Xu (LinkedIn) and Ben Dressler (Spotify).




Introducing the Skyscanner Tech Engineering Blog

Posted on by Bryan Dove

Welcome to Skyscanner’s engineering blog. Thanks for taking the time to stop by.

It is early days, but you can see from our first few blogs that you should expect a wide range of articles, from technology insights to the theory behind product decisions to thought leadership. We’ve been inspired by the other tech company blogs that we as engineers love – Etsy and Netflix to name just two.

At Skyscanner we put emphasis on sharing ideas, both internally and externally. This blog is an extension of that, and so too are our community support initiatives and Technical Thought Leader Series – more on these below.

But back to our engineering blog. You might be asking – why open up this information, whether it be through world-class speakers or internal Skyscanner insights and learnings? It goes back to Skyscanner’s roots and our culture. We’re a team of entrepreneurs, and we want to share what we’ve learned along the way with the broader community, to support and inspire the next generation of entrepreneurs. We believe this is the way the tech community should work.

Our hope is that you find real insight and inspiration within this blog and within our community programme. We aim to be a unique voice amongst the cacophony of tech companies sharing what they are working on and how they think.

Here’s how:

  • We will be as open as possible, talking about both our successes and our failures. Internally we capture so much learning through our mistakes, and we want to disseminate this more broadly than just our organization.
  • In addition to talking about technology, we will also share our successes and failures when it comes to scaling our company. There is a ton of information about how to scale your technology stack, but there is a dearth of information about how to scale your business. You can’t have one without the other.

We believe these two dimensions deliver a unique perspective.

Why might that interest you? Well, we’re a global travel search company with over 50m MAUs across the world. We’re solving complex problems every single day – the challenges that existed in the sector when we were founded 10 years ago still exist. We’re on a technology journey; for example, we’ve a goal to transition 100% to AWS in under two years. It’s ambitious: Netflix did it in seven. We have our work cut out and we’ll regularly post updates and lessons, such as an upcoming article on how we make trade-offs between building solutions in EC2 vs. using AWS Lambda.

You’ll also see blogs about how we are setting up the dev environment to drive ideal productivity with Docker, ECS, and automated, progressive production deployments. On an organizational level, we’ll share with you how we’re managing the mind-set change where architecture and cost are now directly related. Blogs will be published at least once a week, so I urge you to keep coming back for fresh content and ideas. And of course, we’d love to hear your feedback and questions too.

However, our outreach doesn’t start and stop with this blog. Here are other ways we’ll be sharing insights and knowledge.

 

Open Source Contributions

We’ve recently opened an engineering hub in London, where we’re very excited to expand our community initiatives both locally and across the world. These include our regular Open Source contributions (check us out at Github.com/Skyscanner).

Our philosophy on Open Source is that we want to be open by default, as much as is possible. We will continue to publish on Github, and expect to accelerate our pace of publishing over the coming year. At present, we’d love your comments on Dixie, our Open Source iOS testing framework.

I mentioned our migration to AWS earlier. We expect to encounter gaps where we’ll have to create new technologies, or contribute back to existing ones, to enable Skyscanner, as a globally and technologically diversified company, to successfully migrate 100% of its operations to AWS.

 

Community Meetups and Sponsorship

We’re keen supporters of the local engineering communities surrounding our nine other offices across the world, and we’ll continue this approach as we integrate into the tech community in London. This may take many forms, including sponsoring local meetups with a combination of funding and hosting in our office space. This blog will be the primary place to get visibility of these – we’ll give as much advance warning as possible, as we know you’re a busy lot. With that in mind, we’ll also post recordings of those talks, when appropriate, so a wider audience may access them.

Increasing diversity in technology is critical. We want to do everything we can to increase diversity across the industry. To this end, we’re actively sponsoring programmes focussed on increasing the number of women working in technology, like the brilliant RailsGirls. Our support doesn’t stop there, so if there are other organizations with similar goals looking for partners, please get in contact with us.

 

Technical Thought Leader Series

We’re also launching, initially in London, our Technical Thought Leader Series (or the TTL Series for short). This is a series of talks focused on bringing top tier speakers to share knowledge with our active start-up communities. I hope to meet many of our neighbours and fellow community members at these events.

The start-up scene in Europe has exploded in recent years and London is one of the epicentres of this movement. The TTL Series will bring a unique set of presenters to London to share their content, for free. Topics will be diverse, from growing and scaling a start-up to technology to engineering leadership to business leadership. There’s no focus on any specific format; we’ll move between panel discussions, open Q&A and presentations as appropriate.

We’ll also record and publish the TTL sessions here to ensure that this knowledge is preserved and available to the wider community across the world.

Our first is on November 23rd, where Sir Michael Moritz, chairman of Sequoia Capital and board member at Skyscanner, will speak.

Thank you for taking the time to read this, and I hope that you’re as excited as we are at our forthcoming activities. This is the first of many blogs that will provide insight into what we’re doing and why. The Skyscanner engineering team hope that by providing a unique level of transparency we are adding value to the community and that others can learn from our successes and failures to help accelerate their own journey.

We are always looking for feedback on ways to improve and content to include. Please reach out at codevoyagersblog@skyscanner.com.

 

-Bryan

 




Dixie: turning chaos to your advantage

Posted on by Balint Orosz

What do you do, if:
… your app crashes with unreliable networks?
… you want to make sure your app can withstand edge-cases?

We’ve got a simple formula: create a chaos generator to simulate worst-case scenarios, attempt to break your app and try to cause it to fail. And if you’re successful? Congratulations: modify your code, increase your app’s fault tolerance, and repeat.

We call it Dixie. It’s an open-source project to help developers find an effective answer to stability issues and worst-case scenarios, and we’ve shared it on GitHub.

Interested? Here’s how Dixie came to life.


The Problem

We all know that today’s development teams have to create increasingly complex software in reduced time-frames. It’s no different here at Skyscanner, where we create mobile apps across multiple platforms. We, like many of you, believe that being able to react immediately to constantly changing requirements without sacrificing the perfect user experience is a key element in the product development cycle.

As our mobile app team continues to expand, it’s ever-more difficult for one developer to understand the entire codebase and visualise the impact a new modification could cause. With this comes the risk of unexpected side effects and mysterious crashes. We recognized that what we really needed to do was build apps that can handle any unexpected situation, even those developers don’t tend to expect during the initial creation process.

We came up with a few possible solutions. A high level of unit test coverage (approaching 100%) on the code base wasn’t a bad shout, but the side effect of greatly reduced reaction time meant too much potential damage to the development cycle (plus, 100% code coverage can, in some cases, be almost impossible to achieve). We also considered identifying the most critical parts of the code and testing these incredibly thoroughly, which has the upside of adding huge value to development efforts. However, the final solution came from Peter Adam Wiesner (lead iOS developer at Skyscanner), who was inspired by an article about something called Chaos Monkey, created by Netflix backend developers.

Not familiar with Chaos Monkey?

Chaos Monkey is a service which identifies groups of systems and randomly terminates one of the systems in a group.

Basically, Chaos Monkey allows developers to attack their own system to an extent which helps highlight potential weaknesses. Knowing about, and reacting to, these weaknesses helps increase long-term quality and builds confidence in the stability of the system.

Dixie: the Solution

We thought a tool like this could be just as useful for our existing projects. However, rather than a system of servers, our tool targets code components and modifies their behaviour.

Consider first that an application written in an object-oriented language can be visualized as similar to a network of servers communicating with each other. Just as a server can go down, a component of code can also start behaving incorrectly, which can then affect all the other components which are reliant upon it. This component could have all kinds of responsibilities, such as handling network communication, providing location information, managing user data or loading files from the file system.

A generally acceptable result in this scenario would be for the system to degrade gracefully, and recover from the error while minimizing the amount of harm to user experience. Ideally, the system would not continue to propagate errors, and should certainly not crash completely.

This is where Dixie comes in. Like Chaos Monkey, it can be thought of as a chaos generator, which can cause specified components to function differently and help simulate worst-case scenarios. A developer using this tool can deliberately attempt to break the app and cause it to fail. If they are successful, and the app does not handle the breakage gracefully, then it is a clear sign to the developer that the code requires modifications to increase its fault tolerance.

The idea of changing an object’s behaviour is not new; developers already use mocking libraries in unit tests. These libraries help to gain control over the dependencies of the tested components. Most of these libraries focus on mocking instances, and therefore require the target component to use its dependencies as injected objects (i.e. provide an interface where they can be set, or be ready to be used with IoC libraries). A well-designed architecture supports all of the above, although testing applications of higher complexity can still be a problem: writing higher abstractions of unit tests (integration and system tests) requires more and more work to assemble the correct environment.

Instead, Dixie takes a different approach, by allowing changes to the behaviour of interior components. By applying some chaos to the methods of some objects, the program flow can be changed to present edge cases and distribute them across multiple components, testing their robustness. For a concrete implementation we chose the Objective-C language, where replacing behaviours is easier thanks to its runtime. Instead of using an NSProxy object (which would also require injectable dependencies), we chose to work with the technique of method swizzling.

Method swizzling is based on calling the correct runtime APIs to replace a method’s implementation with a new one. Working with this API requires the developer to be familiar with low-level C methods and object representations, and to provide the correct method environment information. Dixie takes care of all of this and hides the low-level logic, so developers can focus on creating new configurations.

The developer specifies the objects and methods that should be changed, and chooses how they should be changed. This creates the ‘profile’. The profile can then be applied in a single line of code, which causes Dixie to rewire the internal structure of the application. Changes can be reverted at any time, which gives developers control over how and where they choose to apply Dixie.
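Dixie itself does this in Objective-C via the runtime, but the profile idea translates to any dynamic language. Below is a rough Python analogue using monkey-patching; all names are illustrative and none of them are Dixie’s actual API:

```python
# A Python analogue of a Dixie 'profile': replace a method's behaviour
# at runtime, then revert. Illustrative only; Dixie does this with
# Objective-C method swizzling.

class ChaosProfile:
    def __init__(self, cls, method_name, replacement):
        self.cls = cls
        self.method_name = method_name
        self.replacement = replacement
        self._original = None

    def apply(self):
        """Rewire the class so the method runs the chaotic replacement."""
        self._original = getattr(self.cls, self.method_name)
        setattr(self.cls, self.method_name, self.replacement)

    def revert(self):
        """Restore the original behaviour."""
        if self._original is not None:
            setattr(self.cls, self.method_name, self._original)


class LocationService:
    def current_coordinates(self):
        return (55.95, -3.19)  # normal behaviour

# Simulate a broken GPS without touching LocationService's code:
profile = ChaosProfile(LocationService, "current_coordinates",
                       lambda self: (float("nan"), float("nan")))
profile.apply()
print(LocationService().current_coordinates())  # chaos: (nan, nan)
profile.revert()
print(LocationService().current_coordinates())  # back to normal
```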

The possibilities are limitless; Dixie allows you to create your own tools, from the simplest custom profiles to complex patterns of behaviours. We’ve created an example app to demonstrate how easily a developer might implement chaos:
• Altering the GPS coordinates returned by the operating system (Location example app)
• Altering dates or times returned by the operating system (Date example app)
• Changing network response (Network example app)

Or, why not use Dixie to:

• Replace localization strings to test label truncations without polluting production code
• Simulate properties in your data objects
• Change operating system information such as battery level and device hardware details

What now?

The first version of Dixie was implemented back in October 2014, with a second version released this summer by Skyscanner’s Budapest team (Peter Adam Wiesner, Zsolt Varnai, Phillip Wheatley, Tamas Flamich, Zsombor Fuszenecker, Csaba Szabo). What’s different? Well, we’ve cut unstable proof of concept parts from the codebase, in addition to going through every source file to refactor and clean them, making them more usable for the community.

Because we focused on the essentials, Dixie currently only supports replacing methods that take objects as parameters and return either an object or void. In the future, we want to add support for primitive types too. There is plenty to implement, both horizontally (extending the current tool) and vertically (implementing new tools), in the long term.

Here’s what might be next:
• implementing a unit test tool that can fuzz a specific method of a class (or all of them), detecting input assertions and creating unit tests for the failed cases
• undertaking code analysis to find weak spots in methods and automatically suggest behaviour changes
• detecting the application’s dependency graph at runtime and using this information to create more efficient chaos

We hope that Dixie can help in solving complex issues in a much more productive and effective way. You can find Dixie here – let us know what you think in the comments below, and get involved over at GitHub.




The bots are coming: Conversational UI and introducing the Skyscanner Telegram Bot

Posted on by Richard Keen

What’s the theme running across Siri​, Cortana, Alexa/Echo, Slackbot, Native, Operator and Facebook’s M? With varying degrees of success these are all attempts to introduce “conversational user interfaces” as a new and distinct interaction method, primarily but not exclusively on mobile devices.

Some of these services are purely algorithmic whilst others are human powered (at least for now). Most, whether algorithmic or not, attempt to fulfil the role of personal assistant. Conversational interactions are familiar to all users from messaging, and of course in APAC there has been a trend of messaging apps such as WeChat developing into platforms with rich and diverse functionality, such as booking a restaurant table or a cinema ticket.

(image from Dan Grover’s blog)

The esteemed mobile analyst Benedict Evans believes messaging is the new app platform: “[In WeChat, in China] You can send money, order a cab, book a restaurant or track and manage an ecommerce order, all within one social app. So, like the web, you don’t need to install new apps to access these services, but, unlike the web, they can also use push and messaging and social to spread.”

Old: all software expands until it includes messaging.
New: all messaging expands until it includes software.
— Benedict Evans (@BenedictEvans) March 13, 2015

Similarly Nir Eyal recently wrote about his experience with conversational assistants and why ‘Assistant-As-App’ Might Be the Next Big Tech Trend.

Looking back there are of course precursors to these conversational interfaces: IRC bots and the much-maligned Office Clippy of the 90s, SMS text message keyword response services in the early 2000s and a fad for Twitter bots around 2007. So what’s changed since the days of Clippy and texting a code to get Crazy Frog?

I believe a confluence of technological progress and user behaviour changes point toward messaging and conversational UI being the next big wave:

  • Our mobile devices can infer and learn significant amounts of contextual information from platform APIs and sensors (location, activity, calendar, even mood).
  • Voice input has matured and is widely accepted and used.
  • Natural language parsing, particularly term extraction, has made significant leaps.
  • Messaging, both person-to-person and within groups, is the de facto mobile experience.
  • Sharing of rich content and collaboration on tasks (for example researching and booking a holiday together) still has a high degree of friction. Messaging conversations feel like the right place to solve this.
  • Nuggets of content in the form of “cards” are ubiquitous and fit naturally in conversational responses.
  • Users are increasingly adept at tuning their “natural” input when they realise they are “talking to a machine”.

There are however significant and interesting challenges in how one handles user input and presents useful information to users in a conversational form.

Introducing a Telegram Bot for Skyscanner

As a first step to help explore and understand the conversational medium, I have hacked together a Skyscanner “bot” for the Telegram messaging service. The bot currently provides a rudimentary hotel search based on our B2B API.

Telegram has over 60 million monthly active users on their platform and has been growing rapidly in recent months. They recently introduced a bots API enabling developers to create conversational UIs for both one-to-one and group chats. The API has some interesting and innovative capabilities such as enabling custom response keyboards for specific conversation states (for example ‘Yes’ and ‘No’ buttons).
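For a flavour of what a Telegram bot involves, here is a minimal polling loop against Telegram’s public Bot API. The token and the hotel-search stub are placeholders; this is a sketch, not the Skyscanner bot’s actual implementation:

```python
# Minimal Telegram bot: long-poll getUpdates, answer with sendMessage.
import requests

TOKEN = "YOUR_BOT_TOKEN"  # issued by Telegram's @BotFather
API = "https://api.telegram.org/bot" + TOKEN

def search_hotels(query):
    # Placeholder: a real bot would call a hotels API here.
    return "Here are some hotels for: " + query

offset = None
while True:
    updates = requests.get(API + "/getUpdates",
                           params={"offset": offset, "timeout": 30}).json()
    for update in updates.get("result", []):
        offset = update["update_id"] + 1
        message = update.get("message", {})
        text, chat = message.get("text"), message.get("chat")
        if not text or not chat:
            continue
        if text.startswith("/hotels"):
            reply = search_hotels(text[len("/hotels"):].strip())
        else:
            reply = "Try /hotels <location> to search for hotels."
        requests.post(API + "/sendMessage",
                      json={"chat_id": chat["id"], "text": reply,
                            # A custom response keyboard, as the bots
                            # API supports:
                            "reply_markup": {"keyboard": [["Yes", "No"]],
                                             "one_time_keyboard": True}})
```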

Here’s an example conversation with the Skyscanner bot. You can view a slower non-GIF version here.


Whilst the functionality and usefulness of the bot are very limited at this stage, it’s interesting to consider a few advantages messaging provides: extremely low data usage, nothing to install (assuming you use a supported messaging service), excellent handling of poor network connectivity, cross-platform support by default, push notification of updates and group communication.

What can the bot do?
While far from all-encompassing, the bot can:

  • Search for available hotels at a particular location, on a specific day
  • Start a search with /hotels (a slash indicates an action to Telegram bots)
  • Allow you to change and refine your search by providing a new location or date after the initial results are sent
  • Understand some natural language date entry forms in addition to full dates, for example ‘Next weekend’, ‘Next Friday’, ‘September’ or ‘January 1st 2016’ (see the sketch after this list)
  • Accept feedback with /feedback
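Mixed date input like this can be handled by combining a few hand-written rules with a general-purpose parser. A rough sketch, assuming the python-dateutil library (the real bot’s parser may work quite differently):

```python
# Parsing dates like 'next friday' or 'January 1st 2016' into real dates.
from datetime import date, datetime, timedelta
from dateutil import parser  # pip install python-dateutil

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def parse_checkin(text, today=None):
    today = today or date.today()
    lowered = text.strip().lower()
    if lowered == "next weekend":
        # Interpret as the coming Saturday.
        return today + timedelta(days=(5 - today.weekday()) % 7 or 7)
    if lowered.startswith("next ") and lowered[5:] in WEEKDAYS:
        target = WEEKDAYS.index(lowered[5:])
        return today + timedelta(days=(target - today.weekday()) % 7 or 7)
    # dateutil copes with explicit forms ('January 1st 2016') and bare
    # month names ('September'), filling gaps from the default date.
    default = datetime(today.year, today.month, today.day)
    return parser.parse(text, default=default).date()

print(parse_checkin("next friday", today=date(2015, 11, 2)))  # 2015-11-06
```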

How to try the bot and give feedback

You can give the bot a try by following this link or starting a Telegram conversation with the Skyscanner_Bot user (the one with a Skyscanner icon).

Telegram is available for iOS, Android, Windows Phone, Mac, Windows, Linux and Web at: https://telegram.org.

Please note, all messages you send to the bot will be logged and stored so that we can learn from people’s usage.

Feedback is very much welcomed and appreciated: the bot accepts feedback with the /feedback command.

