Implementing an autosuggest service

Posted by Joaquin Perez

We recently hit a milestone: serving all of the desktop traffic for flights autosuggest. This new service joins the autosuggest services already running for Hotels and Car Hire.

Before we created this service, we knew that one of our main objectives as a squad would be to establish a single model and architecture for autosuggest that could be applied to all of the Skyscanner verticals. Taking ownership of autosuggest for flights brought many challenges and a lot of new data. It also required growing our infrastructure to support 15 times the previous number of requests per second.

Here’s how we did it.

Autosuggest: reliability, stability, predictability

Autosuggest is in many cases the first entry point to our services. Normally the first thing a user does is search for a place to fly, stay or maybe rent a car. A service like this must offer at least three properties: reliability, stability, and predictability. These should help the user find what s/he is looking for, or at least suggest the most likely results based on their input.

Building our autosuggest service involves at least three main stages. Firstly, we need to generate the entities (cities, airports, hotels, streets…) and the links between them. Secondly, we have to store this data in a way that allows us to search and retrieve results using partial, prefixed, or fuzzy matches. Finally, we create an endpoint to query this data. We assign a relevance value to all of our entities based on the user query and add some other magic to supply the user with the best possible subset of results for a specific query.
How we obtain or generate this data is out of the scope of this post, but we are currently looking to index all the world’s cities, airports and streets, including synonyms for some entities and translations for the different supported locales.

Next, I’ll explain the rest of the process based on the currently deployed architecture.

Architecture

The autosuggest stack is based mainly on Jetty, Solr and a lot of Java code; alongside these we use an in-house key-value store called Kraken, which is written in C++. The first entry point to our service is a Jetty-based handler that we call the ‘distributor’. This endpoint currently supports more than 200 requests per second and is able to scale horizontally. It is responsible for all the logic that happens from the moment a user submits a query until a JSON, CSV or XML response is returned, containing the ranking of entities best suited to that specific query.


This logic includes submitting the request to our repositories of entities (the Solr indexes), combining the obtained results, filtering and reordering them, prettifying the results, and finally returning them in the requested output format.

All the entities are indexed using Solr (http://lucene.apache.org/solr/), a framework designed specifically for text retrieval. In order to give our users the most accurate suggestions for their queries, we index our data in a way that allows us to find partial matches. In addition, we support different names for the same unique entity, as happens for example with New York and NYC. Supporting different literals for the same entity allows us to increase our coverage, which from the user’s point of view means smarter suggestions for a wider range of inputs.
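
As an illustration, the retrieval step boils down to a prefix (or fuzzy) query against a per-locale index over Solr’s HTTP API. A minimal sketch – the host, core naming scheme and field names here are assumptions:

    // Sketch only: host, core names and fields are illustrative.
    const SOLR = 'http://solr.internal:8983';

    async function suggest(locale, entityType, userInput) {
      // One index per language/entity_type pair, e.g. 'en-GB_cities'.
      const core = `${locale}_${entityType}`;
      const params = new URLSearchParams({
        q: `name:${userInput}*`, // prefix match on the indexed name field
        wt: 'json',              // ask Solr for a JSON response
        rows: '10',
      });
      const res = await fetch(`${SOLR}/solr/${core}/select?${params}`);
      return (await res.json()).response.docs; // entities, ranked downstream
    }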

We create a different index for each language/entity_type pair, which means we have separate indexes for cities, hotels, streets, regions, and points of interest (airports, monuments, etc.) for each of the supported locales. As we currently support 31 locales with five types of entities, we’ve built 31 × 5 = 155 indexes – which, as you have probably realised already, means a lot of dedicated infrastructure for this service.

Obviously, performance is a must in an autosuggest service; from a user-experience perspective there is nothing worse than a laggy service of this type. For instance, our requirement for the flights vertical is to reply in less than 20ms (on average we currently do it in less than 10ms). The first consequence is that everything must live in memory: we don’t want to touch any secondary storage, so we must keep the indexes as small as possible to be sure they fit completely in RAM. To achieve this we use an external key-value store. This alternative storage contains all the data needed to build a response understandable by the user, but which does not need to be indexed.
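
To illustrate the split, here is a sketch of the two-step retrieval; the Kraken client shown is hypothetical:

    // The index holds only searchable fields plus an entity id, so it fits
    // in RAM; everything needed to render a response is fetched from the
    // key-value store. `krakenClient` is a hypothetical in-house client.
    async function hydrate(entityIds, krakenClient) {
      return Promise.all(entityIds.map(async (id) => {
        const displayData = await krakenClient.get(id); // <4ms in-datacentre
        return Object.assign({ id }, displayData);
      }));
    }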

The Kraken service, as mentioned before, is our in-house key-value store, developed and maintained by us. It guarantees consistent performance: on average, Kraken returns the values for a key in under 4ms when requested from the same datacentre.

In order to increase the availability and fault-tolerance of our services, this whole architecture is deployed in each datacentre and the data is replicated across all datacentres.

Learning to Fly

Moving our Flights autosuggest from the old to the new version implied a significant number of changes on our side, which I will try to explain below.

The first step was to import all the necessary literals from the flights database; this data included the airport names, IATA codes and the cities where these airports are located. These literals are available in all the locales supported by Skyscanner. Once we had acquired them, we indexed them, allowing the same kind of search as in the Hotels and Car Hire services. In addition to importing the literals, we included the ranking of each city and airport in a country, when available. This ranking orders some of the entities for each market by popularity, which means that, for instance, in the UK market the most popular search term is ‘United Kingdom’, followed by the city of London.

The second step was to mimic the old autosuggest endpoint API. This allowed us to send the same request to both versions simply by changing a version parameter, and therefore to support exactly the same syntax and semantics as the former autosuggest service.

At this point we were able to deploy an alpha version, and with it we were in a position to measure the quality of the system. To gauge how good the responses of the new version were, we launched 100 queries for each locale and compared the results returned by both versions. Finally, after some iterations improving the quality of the data and adjusting the scoring system, we achieved a system which satisfied the established criteria.

Conclusion

After a lot of work we are happy to say that the objective was achieved: our autosuggest is in place for the three main Skyscanner verticals, without affecting conversion rate measures for flights – which was a hard requirement from the start.

Now we are in a position to keep raising the quality of this service, aiming to give the best possible suggestions for any user input. It’s worth emphasising the magnitude of the challenge we tackled, and it is testament to these improvements that we’ve since seen a real increase in both the number and the speed of autosuggest requests.




Transitioning From Objective C to Swift in 4 Steps – Without Rewriting The Existing Code

Posted by Gergely Orosz

[Note: as a follow-up post, for more details on how we made this transition, please see: How We Migrated Our Objective C Projects to Swift – Step By Step]

We started developing Skyscanner TravelPro in Objective C in March 2015. A couple of months later, when Swift 2.0 was released, we started to slowly introduce Swift. Fast forward 8 months, and 100% of the new code we write is in Swift – all without having rewritten any of our existing, working, robust and tested Objective C code. There would have been little point in doing so.

There are many resources on how to decide whether or not to use Swift for a new project, and on best practices for writing Swift. However, if you’re working on a fairly large Objective C codebase, you will probably find this article useful – and if not, one day you might bump into a codebase where you want to start using Swift. This article presents some advice on how to get started.

Here’s a visual representation of how our codebase has changed in 10 months. Since November 2015 all new code is written in Swift, which now makes up about 10% of our 65,000 line codebase – and growing.

Skyscanner TravelPro Objective C vs Swift code lines of code over 10 months

So what was our approach when going from Objective C to Swift?




A Journey in React with the Car Hire Frontend Squad

Posted by Graham Martin

The airport transfers product delivered and pushed live in the UK market by Skyscanner’s car hire tribe in mid-2015 was very much built as an MVP. The frontend architecture was ‘borrowed’ heavily from Skyscanner’s car hire product, with the result that we had large chunks of unused, untested code. When the MVP showed promise, and the car hire frontend squad were given the green light to re-architect the client application to make it more stable and maintainable, we began to think about how we’d put it together.

Our main aims for the rewrite were to:

  • Keep the view/presentation layer simple to make for easier unit testing. As part of this, look at creating reusable presentational components that could be shared with car hire.
  • Improve the rendering performance: the HTML for the results list and each deal (implemented as Backbone views) was scrapped and regenerated on receiving new quotes from the API, or on navigating to another page. This made for a very jumpy user experience.
  • Keep in mind that the architecture and patterns used would likely be applied to the more complex car hire product in future.

We took some time to look at our requirements and see whether any frameworks would help with their delivery. While the scope of our work was a reasonable size, we weren’t rewriting the whole application. Instead we were focusing on the view layer (the V of MVC). Our logic and state were simple and there was no complex client-side navigation required. Angular, Ember and Backbone all seemed too heavyweight and opinionated for our needs. React, however, seemed to fit pretty well as the following snippets from the React homepage demonstrate:

Lots of people use React as the V in MVC. Since React makes no assumptions about the rest of your technology stack, it’s easy to try it out on a small feature in an existing project.

React implements one-way reactive data flow which reduces boilerplate and is easier to reason about than traditional data binding.

React’s primary attraction was its rendering performance. When a state change occurs, React re-renders the entire component subtree (more on component nesting later). The natural initial reaction is that this would degrade performance. But in re-rendering, React constructs a new virtual DOM – an abstract, non-browser-specific version of the DOM – and uses some clever diffing techniques to calculate whether an actual DOM update is necessary. The DOM is then modified with the minimum number of changes possible. This efficient rendering was extremely appealing for our results page, where only selected areas of the page content are updated when processing pricing service poll results.

So, we made the decision to give it a try.

Experimenting with React

The first thing we had to get our heads around was the JSX syntax. While it is not mandatory for writing React components (plain JavaScript can be used), JSX is recommended because its concise XML-like structure is more readable when defining large component trees. React components and standard HTML tags can be mixed in JSX. For example, the following snippet creates a resultsContainer fragment from a mix of lower-case HTML tags and React components (ResultCount and Pagination) that begin with an upper-case letter.
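
A sketch of what such a fragment might look like (quotes, page and goToPage here are illustrative):

    // Lower-case tags (div, ol, li) render as plain HTML; upper-case tags
    // (ResultCount, Pagination) are React components receiving props.
    var resultsContainer = (
      <div className="results">
        <ResultCount count={quotes.length} />
        <ol className="results-list">
          {quotes.map(function(quote) {
            return <li key={quote.id}>{quote.title}</li>;
          })}
        </ol>
        <Pagination currentPage={page} onPageChange={goToPage} />
      </div>
    );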

The attributes that are set on React components are known as props; they pass data into the component for use in its rendering logic. A fundamental rule in defining React components is that a component cannot mutate its props. This is to guarantee consistency in UIs. A component can define a contract on its props by defining PropTypes. These validate the props that are passed on creation of a component instance and warn on any validation failures. There are lots of out-of-the-box PropTypes – including string, bool, func, object and array – and all of these can be marked as required. When your validation requirements are not covered by the core types, custom PropTypes can be wired up simply by writing a function.
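
For example, a sketch using the React.PropTypes API that was current at the time (the custom validator is illustrative):

    ResultCount.propTypes = {
      // Built-in validator: `count` must be a number and must be supplied.
      count: React.PropTypes.number.isRequired,
      // Custom validator: a plain function returning an Error on failure.
      label: function(props, propName, componentName) {
        if (props[propName] && props[propName].length > 40) {
          return new Error(componentName + ': `label` is too long to render');
        }
      }
    };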

The mixing of HTML and javascript in JSX takes a bit of getting used to – after all we’ve been taught to separate our presentation from our functionality for years – but the packaging of the logic along with the markup that it affects in a self-contained component actually makes a lot of sense. The components are easily understandable, the responsibilities are clear and the tracing of complex flows from javascript -> HTML and back is simple. We found that once we overcame the initial culture shock, working with JSX was very satisfying.

React components can be nested in a tree structure; using this structure we can compose complex UIs from multiple small, well-defined, reusable components. When defining a hierarchy of components we end up with components that own other components – that is, components that create instances of other components in their render() method – and that are responsible for setting the props of the components they own. This one-way data flow down the component hierarchy is what triggers the re-rendering of the component tree, and it is key to React’s simplicity. We found that we iterated on component design, refactoring components by moving certain presentation and logic into new, small components.

State Management

State and props are closely related but are distinct concepts. Most components will be stateless and will simply render data received through props. Others will respond to user input or handle server responses – these require state. Identifying where the state should be manipulated was challenging.  We found that we were dealing with state in too many places and some simple components contained overly complex logic. We needed a better way of isolating our global state and supporting the one-way data flow. What we were missing from the jigsaw was the Flux architecture.

The Flux architecture isn’t a 1960’s spy film starring Michael Caine, it’s a pattern used by React’s creator Facebook to manage data flow in its web applications. The data flow looks like this (taken from the Flux architecture docs).

[diagram from the Flux docs: Action → Dispatcher → Store → View]

User interaction with a view causes an action to be propagated to the data store(s) by a central dispatcher component and the resulting state change(s) are reflected in any affected views. This is similar to MVC but the data flow is unidirectional.

Redux and React-redux are two complementary libraries that provide the most popular concrete implementation of this pattern in the React space. Redux implements the concept of a single store containing the whole state of your application, along with reducer functions, which take the current state and an action and return the resultant new state. The immutability of the state is key to the predictability of the application – a reducer is a pure function that must return a new state object rather than mutating the current state. React-redux provides the bindings and plumbing code required to expose the store to all components in the hierarchy, and encourages you to think of components as either container or presentational components. Only container components interact with the store – invoking actions and receiving new state – and they pass the state down the hierarchy to presentational components via props. Testing of presentational components is simple, as we are only concerned with the output of the render() method, while state changes are easily testable because they are decoupled from React.
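
A minimal sketch of a reducer (the action types and state shape are illustrative):

    // A reducer: a pure function from (state, action) to a new state object.
    function quotes(state, action) {
      state = state || { deals: [], polling: true };
      switch (action.type) {
        case 'QUOTES_RECEIVED':
          // Never mutate: build a fresh state object with the new deals.
          return Object.assign({}, state, { deals: action.deals });
        case 'POLL_COMPLETE':
          return Object.assign({}, state, { polling: false });
        default:
          return state;
      }
    }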

Unit Testing React

The common approach to unit testing React components was, until quite recently, to render them into a DOM (using something like jsdom) and assert against them using React’s TestUtils. However, in React v0.13, shallow rendering was introduced. This feature does not require a DOM, and it effectively isolates rendering to the component under test, as the component is rendered only one level deep. Child components are not instantiated, so test setup is simplified. The output of shallow rendering is an object that is unfortunately quite difficult to traverse. To simplify this, we used the excellent skin-deep library, which provides lots of useful methods for digging through the properties. You can see from the following example how clean the tests are:
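
Here is a sketch of such a test (the component under test, its props and the assertion style are illustrative):

    var sd = require('skin-deep');
    var React = require('react');
    var ResultsContainer = require('./results-container'); // illustrative

    it('renders the number of results it is given', function() {
      var tree = sd.shallowRender(
        React.createElement(ResultsContainer, { quotes: [{}, {}, {}] })
      );
      // Dig into the shallow output; child components are never instantiated.
      var resultCount = tree.subTree('ResultCount');
      expect(resultCount.props.count).toBe(3);
    });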

What have we learned?

So, we’ve learned an awful lot about React in the time we’ve been iterating on our implementation. What we’ve ended up with in our airport transfers client application is a nicely architected solution with small, composable components that have a clear responsibility. The data flow in the application is simple and state transformations are predictable and easy to test. Of course, you can achieve these results without React or Redux but these libraries force you into thinking about your application in a way that encourages this, as well as handling the plumbing and setup for you.

There is an overhead to using React. As well as the learning curve (which hopefully will be less steep for you after you’ve read this :-)), the package(s) add around 40KB to your gzipped javascript bundle size. You may have noticed that I’ve not mentioned rendering performance improvements here, and that is because our results were inconclusive. We are still experimenting with the best way to measure this accurately but rudimentary timings from switching pages have yielded insignificant differences.

So, where do we go next? Well, improving the profiling of our application is our first goal so we can accurately measure our React timings. As I mentioned at the start of this article, we are looking to redesign our car hire results page in the near future and this will effectively require a rewrite of the client application. Our test coverage is low in some key areas so we’ll compose the page of small well-defined React components that are easily testable. We’ll move all of our state to a global store and describe state transformation in actions and implement them in a reducer function, and increase our test coverage there too. We’ll A/B test the new implementation (which we didn’t do in airport transfers due to the small level of traffic the product initially received) to ensure we’ve not degraded the user experience, and we’ll iterate. Look out for a future blog post to see how we got on.




A polyglot architecture – Skyscanner’s frontend under the hood

Posted by Alex Bardas

A few months ago I had the pleasure – if I am to ignore the random weather – of travelling to London to attend a recruitment event called Silicon Milkroundabout with some of my colleagues from Skyscanner. The most frequent questions I was asked were “What programming language do you use?” and “What’s Skyscanner written in?”

It’s an interesting question, because it’s hard to give a straight answer to it. Not because of some weird non-disclosure agreement (we’re pretty open about the technologies we use and we open source a lot of our projects – check out our github), but because we use so many of them. Here’s a look at how and why.

Tribes, squads and different programming languages

Internally, Skyscanner is ‘squadified’ – an organizational model described here – which actually means we’re organized in tribes. Yes, we call them tribes, and each has a tribe leader, although I’m not sure that’s actually written on their business cards. Each tribe is divided into squads.

Squads are multidisciplinary teams that own a service and have a very high degree of autonomy, the main – and sometimes only – constraint being that they must respect contracts with other squads. Think of each squad as a mini-startup. Under this paradigm, squad members can pick their release cycle, their agile methodology – scrum, Kanban – and of course, their technology stack. Which makes things really interesting and very diverse.

At Skyscanner we have Java, Python, .NET, PHP, Ruby and nodeJS squads, and the list goes on. We believe that the problem chooses the technology and not the other way around, so I can say we’re actually polyglots when it comes to programming languages. Which is really cool, but also really hard to explain at a recruitment event.

The Hotels Vertical

I work in the Hotels tribe, in the frontend squad, but other verticals share a similar architectural approach. Compiling a list of hotels and displaying it to the end user – with up-to-date prices from multiple global partners, without duplicates and with relevant images – goes well beyond issuing a “SELECT * FROM hotels WHERE price BETWEEN …” DB query. It’s a fine-tuned process that involves multiple teams, including:

• Partner engineering squad – they liaise with our partners and are responsible for retrieving information from them, such as hotel prices. Stack: mostly Python.

• Data squad – this squad creates the so-called “hotel data packages” which are used to display information to the users, making sure the information is consistent and free of duplicates. Jacek’s post ‘One picture is worth a thousand words. So, how does it scale to a million pictures?’ provides a more in-depth view of what they do. Stack: Python.

• Geo squad – this squad maintains the Travel Knowledge Graph, a database system that represents the world as an ontology. This database can be queried directly using a language called DQL (Distributed Query Language). Stack: Python and a modified version of postgreSQL.

• Search services squad – tasked with providing the best autocomplete results to the user, their algorithms try to guess the user’s intended destination even when there are typos involved. More on the subject in Ben’s Measuring Autosuggest Quality post. Stack: Java, Solr and Lucene.

• Backend squad – the backend communicates with all the squads described above and compiles a list of results that is made available to the frontend via a RESTful API. Stack: Python.

• Frontend squad – my squad. We’re tasked with creating and maintaining the ESI components used to display the hotels pages. Stack:  PHP, CSS & various Javascript libraries.

• Web Applications squad – they own the Scaffolding component described in detail below and some elements that are common across all Skyscanner’s pages, such as headers & footers. Stack: .NET / C#.

• INTLOC squad – our Internationalization & Localization squad. Their service allows all other squads to deliver a localized native experience to our global pool of users. Stack: .NET / C#

 

Edge Side Includes in the frontend

The Skyscanner frontend is not a ‘site’ in the traditional sense of the word, with HTML generated by a single server-side technology and served to the user by Apache or nginx, but rather a collection of various ESI components developed and managed by different squads. The end result is that, depending on where one might click on the page, the underlying HTML was generated by a different server-side technology, owned by a different squad, located in a different part of the world.

 

So what exactly is an ESI component in our case? Well, it’s a self-contained entity that renders, styles and provides JS interaction to a piece of HTML. Each component has a unique URL – e.g. /hotels/search-box – and several endpoints, each with its own responsibility, as shown in the image below.

[diagram: an ESI component and its endpoints]

The endpoint’s name is appended to the component’s URL to create the endpoint’s URL. So for example, if I want to render the script tag for the hotels search box somewhere in the footer, I would issue a request to /hotels/search-box/script.

In the current architecture, each public URL has a template mapped to it, with placeholders for ESI components. This template is pre-processed by a component called Scaffolding and sent to Varnish, which in turn requests all the ESI endpoints, applies caching rules and sends the end result back to the user. Given that ESI URLs are internal and dispatched by Varnish, components cannot directly access information coming from the client, such as query string parameters or cookies; this information is requested via a special endpoint called requirements and injected by Scaffolding during pre-processing.
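
As an illustration, the template for a public URL might look something like this before pre-processing (the paths and structure are hypothetical):

    <!-- Hypothetical page template, pre-processed by Scaffolding.
         Varnish resolves each esi:include by requesting the endpoint URL. -->
    <html>
      <body>
        <esi:include src="/shared/header/content" />
        <esi:include src="/hotels/search-box/content" />
        <esi:include src="/hotels/day-view/content" />
        <esi:include src="/shared/footer/content" />
        <esi:include src="/hotels/search-box/script" />
      </body>
    </html>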

Here’s a simplified diagram that shows how an HTTP request is handled.
[diagram: an HTTP request flowing through Scaffolding and Varnish to the ESI endpoints]

In a nutshell…

Hundreds of engineers, working with different technologies in different geographical areas, are independently releasing components at different times, and those components are assembled on the fly to render the site. Seems like a giant puzzle. And every single time I describe our architecture and way of working, I get the same question: “Does it actually work?” Yes! Amazingly well, and for 50 million users every month.

How we got here

Things were not always like this for the Hotels Frontend Squad and if you want to learn about our journey, have a look at my presentation ‘Skyscanner Journey – From code jungle to state of the art’ given at the PHP Barcelona Conference in 2015.

For more on being a polyglot, see Richard Lennox’s post on being a ‘Polyglot Technologist’ here.




Buckets of Onboarding: saving effort and money with AWS S3

Posted by Pim Van Oerle

Traditionally we engineers tend to think in servers – need to serve up some new web content? Spin up a bunch of webservers to serve that content with a nice load balancer over them, set up a deployment pipeline to get the software out and you’re flying.

Or are you?

When we first started our experiments with Onboarding Pop-ups, newsletter signup boxes and other awesome onboarding things, we designed the servers in the normal fashion. Some Linux instances in Amazon EC2, a very simple Nginx/Flask/Python server and a lot of static files.

Quite soon we found that we did not really use the service part at all – we were just serving static files. That left us with what was basically a set of nice, load-balanced, redundantly running servers serving up static content. Which – on reflection – wasn’t the brightest way of doing things. Here we were, using complex servers running on a platform that was built around cheap, scalable and stable static file serving – only to re-invent that file serving in a clunkier, less scalable and more expensive way.

We decided to take action and completely kill our nice new shiny servers – instead deciding to simply deploy everything we do that is not explicitly a Service to a bucket in S3.

So here we are now – serving all of our Onboarding Code from a bunch of buckets in the cloud!


So what are the benefits?

There are two big benefits to doing this.

1. Cost

From a service cost perspective, using S3 is much, much cheaper – it is built explicitly to store and serve files that don’t change much, and this is exactly what we are using it for.

Having a set of EC2 instances sitting there doing frankly not very much, on the other hand, is quite a bit more expensive. Everything is still pretty fresh, so we can only estimate the cost using calculators, but from a quick bash at the AWS cost calculator we are cutting costs by at least a factor of 10.

2. Ease of Scaling, Maintenance and Deployment

From the squad’s point of view, we have simplified our DevOps load quite a bit by doing this. S3 takes care of scaling for us; we just upload files. Deployment is much simpler as we have eliminated any actual servers, server configuration and all the wiggly bits that tend to go with that.

How does it work?

The system is pretty simple – we have a few elements:

• One web-enabled, versioning-enabled Bucket that the various Onboarding Pipelines deploy their code into.

• A Route53 DNS entry over that to ensure a fixed, simple address in our skyscnr.com AWS domain.

• And finally an Akamai route to the whole thing so that we can serve our files from www.skyscanner.net/svcs/onboarding/* – looking much better and also minimizing cross-site issues.

That’s it. To deploy, we prepare all our static files in Teamcity with some clever Grunt steps, generating a static file per locale for localisation, minifying where needed and bundling up the various files for easy deployment. Once that is done we simply copy to the bucket and we’re done.
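
The copy itself is the simplest part; a sketch with the AWS SDK for JavaScript (the bucket name and paths are illustrative):

    // Sketch: copy the built static files into the web-enabled bucket.
    const AWS = require('aws-sdk');
    const fs = require('fs');
    const path = require('path');

    const s3 = new AWS.S3();

    async function deploy(distDir, bucket) {
      for (const file of fs.readdirSync(distDir)) {
        await s3.upload({
          Bucket: bucket,               // e.g. 'onboarding-static' (assumed)
          Key: `onboarding/${file}`,
          Body: fs.createReadStream(path.join(distDir, file)),
        }).promise();
      }
    }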

Next Challenges

Of course we’re not there yet – we have only taken the first steps towards a proper bucket-based deployment system. Below are some of the next challenges that we’re working on.

S3 and Edge-side Includes

We use ESI to assemble most of our website in a smart, easy and cachable way. To be able to fit into that system we will have to find a way to conform to some of the expectations that our ESI system has – and that do not seem to quite fit with a system that can only serve static files. Do we build a very simple Lambda service to deal with this? Do we investigate what Akamai (inventors of ESI, after all) can do for us there? There are plenty of avenues to try out, and it’ll be a really interesting question to figure out.

Blue-Green Deployment

Blue-Green Deployment is awesome! There are a few really cool systems currently in development around the business to do this with services and Elastic Load Balancers in AWS – but how do we do this with just a bunch of files in S3?

Again, we’ll use the functionality AWS provides. The buckets can version their own contents, and provide easy rollback through the API. We can use that to give us a binary form of Blue-Green deployment – roll out the new version and monitor performance of our key metrics (in Mixpanel or via our own internal logging system, Grappler). If the performance of the new version falls outside set bounds, roll-back to the previous version can happen automatically.
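
A sketch of what an automatic rollback could look like against the S3 API (not our production tooling):

    // In a versioning-enabled bucket, copying an object's previous version
    // on top of the live key makes the old content current again.
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    async function rollback(bucket, key) {
      const { Versions } = await s3
        .listObjectVersions({ Bucket: bucket, Prefix: key })
        .promise();
      // Versions come back newest first: [0] is live, [1] is the previous one.
      const previous = Versions.filter(v => v.Key === key)[1];
      if (!previous) throw new Error('No previous version of ' + key);
      await s3.copyObject({
        Bucket: bucket,
        Key: key,
        CopySource: `${bucket}/${key}?versionId=${previous.VersionId}`,
      }).promise();
    }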

Marrying S3 Buckets with our CD Environments

Most of Skyscanner runs with four different environments for continuous deployment – Int-CI, Int-Review, Pre-Production and finally Production. That’s great for continuous deployment and for guaranteeing thorough quality assurance, while keeping environments available for everyone to test.

But how does that work when serving code from a bucket in AWS? If we crack Blue-Green deploy we can deploy in two steps instead – running all our tests in a simple test environment and then just deploying to Production, rolling back to the Blue line if any of the service or business metrics show issues.

That quickly leads to this question – if we can do that, how do we marry that to the four different environments Skyscanner has in a smart way? We could just have four copies of the same file, but that feels like waste. There must be a better way that’ll allow both use cases – another thing to figure out over the next weeks.

 




Measuring Autosuggest Quality

Posted by Ben Torfs

Greetings Code Voyagers, from the Free Text Search squad.

We power most of the Skyscanner auto-suggest search boxes, such as the ones where you select an airport to fly to, or a city in which you need to find a hotel. More generally though, you could say that our mission is to map user input to the user’s intention, using as few keystrokes as possible.


Autosuggest: speed, relevancy and the ‘zero-result rate’

These search results need to appear very fast (in less than 200ms, preferably), but above all, they need to be relevant. This is especially true in the mobile market, where typing characters can be a bit of a hassle and screen real estate is too scarce to display long lists of results.

Our current service is working well, and we are proud of the speed and accuracy of our results (even when the user includes some challenging typos). As always though, there is room for improvement, particularly in markets using non-Latin scripts. Measuring the quality of our service is tremendously important in identifying areas of improvement as well as enabling better A/B testing in the future.

Today, the most important metric we use is the rate of queries returning no results at all (the ‘zero-result rate’). At first it seems like an overly simplistic metric, but it is actually quite useful for comparing performance between different locales, and for seeing how they evolve over time.

For instance, let’s take a look at this measure for the past six months in the UK, our longest-supported market, where we’ve spent a lot of time optimizing the site. Our results are very strong, yet there is still a very small number of queries that we cannot recover from – for example, when a user searches for a location that doesn’t have an airport, or for a flight to ‘Frankfart’ rather than ‘Frankfurt’ (always amusing).

Auto-suggest and non-Latin scripts

It’s not quite so easy when optimizing for newer Skyscanner markets, where non-Latin scripts are used. There are some great tools out there that have really helped us make fantastic improvements; in Japan, we’ve used the wonderful Kuromoji library to convert queries between the various Japanese character types. We’ve made similar enhancements for other languages such as Korean, which again has resulted in real progress.

Alternative auto-suggest KPIs

The zero-result rate gives us a good idea of where to steer our efforts, but it is pretty coarse, and we are looking for new and better KPIs. Here are some of the ideas we came up with (a sketch of how they might be computed follows the list):

• How many characters did the user have to type before s/he was able to click on the result s/he was looking for? This metric has a direct relationship to the usability of the site. We could also count every backspace character, since those give an indication that we are not sufficiently resilient to typing errors.

• Whenever a result is selected, what was its position in the suggestion list? We should aim for the clicked result to always be the first one. Today, the search ranking already depends on the selected market. For instance, a user who searches for ‘san’ in the USA will be shown results such as San Francisco and San Diego first; the same query typed in Spain, however, will rank Santander and San Sebastián higher. Other improvements might include storing an individual’s search history and providing easier access to the queries that a user types most often.

• How many users started typing a query, but never actually selected a result (the ‘abandonment rate’)? In this case it is not only important to know how often that happens, but also why it happened. It might indicate that a street name was changed somewhere, and needs to be updated in our database.
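
A sketch of how these KPIs might be computed, assuming a hypothetical per-session event record logged by the search box:

    // Each session: { typed: 'frankf', backspaces: 2, selectedIndex: 0|null }
    function autosuggestKpis(sessions) {
      const selected = sessions.filter(s => s.selectedIndex !== null);
      const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
      return {
        // Keystrokes (including backspaces) before a result was clicked.
        avgKeystrokes: avg(selected.map(s => s.typed.length + s.backspaces)),
        // Position of the clicked result (0 would be the ideal, top slot).
        avgSelectedPosition: avg(selected.map(s => s.selectedIndex)),
        // Share of sessions where the user typed but never selected.
        abandonmentRate: 1 - selected.length / sessions.length,
      };
    }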

Surely this list is not complete. Do you have thoughts on this, or other ideas on how to measure and improve our auto-suggest results? Please let us know in the comments, because we would love to hear them.

 




One picture is worth a thousand words. So, how does it scale to a million pictures?

Posted by Jacek Wojdel

Well…it probably depends on whether they are all the same or not.

We always knew we wanted our hotel product to be very visual. Booking a hotel isn’t the same as booking a flight; photography really helps bring the hotel experience to life, which is why, on average, when a traveller looks at a hotel on the Skyscanner site, they’ll see around a dozen photos to help them make a decision on where to stay.

However, collecting these images is another matter. Every time we present a piece of information on our webpage, it is in fact a consolidated view derived from tens of different sources. We partner with over a hundred providers, and each of them, for each hotel, will give us the hotel’s details (name, street address, type of accommodation, rating etc). It’s then the Hotels Data team’s job to decide which data to use so as to present it in the best way to our users. The automated process of doing this is what we call ‘Data Release’, so in essence:

[diagram: the Data Release process – many provider records in, one consolidated view out]

If you just thought ‘deduplication’ or ‘entity resolution’, you’re on the right track. An integral part of the data provided to us is the images of the hotels. Our team is tasked with downloading all of them (literally millions) from our partners and figuring out which ones to present on our webpage. Again, this all happens automatically, in the ‘Image Release’ process.

[diagram: the Image Release process – provider images in, a deduplicated selection out]

About a year ago, this process ran in one of our datacentres, took about three days, and could be initiated roughly once a month. Since then we have moved to the cloud, it has become a continuously running process, and it is synchronised weekly with the rest of Data Release. As part of this move, we had to figure out how to de-duplicate images in a way that is fast and suitable for our needs. Here’s how we did it.

Image deduplication
You might wonder what the deal is here. Couldn’t we just take all the pictures from the providers and display them on our website? Well… the result would be page after page of near-identical pictures of the same hotel.


Not exactly helpful, and certainly not the kind of experience we want travellers to have on our site. Most of the images from different providers are in fact all the same. Just to make things a bit more complicated, they might also be resized, recolored, trimmed, watermarked etc. Effectively, we had to create a system that would automatically tell that:

• two images are the same, and we should use the bigger one;
• two superficially similar images are in fact not the same;
• two images that differ only in minor processing are the same for our purposes;
• when one image is a cropped version of another, the uncropped one is better.

The process of finding these near-duplicate images is best done by calculating a so-called image hash, and comparing the hashes of all of the images we have downloaded. There is a multitude of possible hashes – aHash, dHash, pHash (perceptual hash)… – and each comparison can be done at a varying level of accuracy, so how do we know which one to choose?
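
To make the idea concrete, here is a sketch of the simplest variant, the average hash, assuming the image has already been decoded and resized to 8×8 grayscale pixels:

    // aHash: one bit per pixel – is it brighter than the image's mean?
    function aHash(pixels /* number[64], grayscale values 0-255 */) {
      const mean = pixels.reduce((a, b) => a + b, 0) / pixels.length;
      return pixels.map(p => (p > mean ? 1 : 0));
    }

    // Near-duplicates have a small Hamming distance between their hashes;
    // the distance threshold sets how strict or lenient the comparison is.
    function hammingDistance(h1, h2) {
      return h1.reduce((d, bit, i) => d + (bit ^ h2[i]), 0);
    }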


Of course, we need to measure. Which brings us finally to the Image Release Corpus.

Image Release Corpus
A corpus is a set of data with accompanying manual labels. In our case, the corpus consists of about 1,200 images grouped into 500 groups, each containing visually identical content. These were grouped manually in a tedious process involving an HDTV and a small custom script for quick pre-grouping, browsing and labelling of images. Let me tell you: I do not ever want to see a hotel in Dubai again.

Once this work was done, we could run any image deduplication algorithm on all of the images and measure its performance against the human decision. There are several measures that can be used to evaluate performance (a sketch of how the first two might be computed follows below):

• Purity – how many generated groups contain only a single manual label
• Completeness – how many generated groups contain all images of the same manual label
• Duplicates – how many of the same images are we likely to show to the end user

In all of the possible approaches, one always has to balance between being too strict about image comparison (which leads to a higher number of duplicates shown to the end user) and too lenient (which leads to grouping different images together, and an effective loss of images).
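
A sketch of the first two measures, given the generated groups (arrays of image ids) and a labelOf lookup built from the corpus (the exact definitions here are one possible reading):

    function evaluate(groups, labelOf) {
      // Purity: fraction of generated groups containing a single label.
      const purity =
        groups.filter(g => new Set(g.map(labelOf)).size === 1).length /
        groups.length;

      // Completeness, read per label: fraction of manual labels whose
      // images all landed in the same generated group.
      const groupOf = new Map();
      groups.forEach((g, i) => g.forEach(id => groupOf.set(id, i)));
      const byLabel = new Map();
      for (const id of groupOf.keys()) {
        const label = labelOf(id);
        if (!byLabel.has(label)) byLabel.set(label, []);
        byLabel.get(label).push(id);
      }
      let complete = 0;
      for (const ids of byLabel.values()) {
        if (new Set(ids.map(id => groupOf.get(id))).size === 1) complete++;
      }
      return { purity, completeness: complete / byLabel.size };
    }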

Of course, one of the cool things about being a developer is that you can write tools that will help you write tools for the task at hand. With the tools of your choice.


So, after a bit of fiddling with Jenkins, Django and AngularJS, we came up with a small dashboard that is updated on every push to our code repository and evaluates all of the measures for the current Image Release deduplication process.


In this way, we could quickly evaluate all the available image hashing methods and play with different accuracies for comparison. Additionally, for debugging purposes, we can dig further to actually see what kind of mistakes the algorithm made on each group of images.


And we can even look into the specifics of image to image comparison.


Doing so allowed us to quickly evaluate our approach and choose one that not only worked faster and more reliably than what we started with, but also allowed us to bring back more than 20% of the images that were previously discarded due to incorrect deduplication – while keeping the probability of showing a duplicate image to the end user within the same limit.

Simple image deduplication is just the beginning. The potential for image analysis is certainly there, and we already have quite a lot of data to work with. We might, one day, revisit it.




Gareth’s Start-up ‘Laws’

Posted by Gareth Williams

As one of Skyscanner’s co-founders, I’m often asked for my thoughts on entrepreneurship. Here are my three self-styled ‘Start-up Laws’; a collection of things I’ve learned along the way and my own personal beliefs in terms of how a fast-growth business should operate.

Gareth’s Law 1 : Advertising is the way to solve the problem of revenue outstripping costs.

I say the above with tongue firmly in cheek. All companies seek a way to increase awareness of their service – that’s natural. But the easiest path – spending money to acquire new users up to the point of marginal profitability (and beyond) – is over-dependence on advertising. Especially for replacement visitors.

As a start-up, your resources may be better spent making your product 10x better. Focus mainly on product improvements, retention and virality.  Yes, they require greater skill, but surely a better product is worth more to your users than a churning user base?

At Skyscanner we started by sharing two salaries between three co-founders, and our first external funding came fully six years after the first prototype. We only had the resources to build product. Our first marketing was PR (see Paul Graham) and our second was SEO. The interesting thing in retrospect is that both are fixed costs (their cost in time and money was not a per-user cost like an ad).

Nowadays, as a more mature company, advertising brings us great value as part of our acquisition/activation/retention pipeline – but as start-up, I’d recommend making your product as good as it can be first.

 

Gareth’s Law 2: The size of an email footer is inversely proportional to the growth prospects of that company.

I once heard a website homepage described as something that represents the scars and battles of the departments in a company. I think the same can be said about the size of your startup’s email footer.

With an email footer you might see a logo, a fax number (still), links to company apps, a legal disclaimer, an event or new product plug and so much more. Of course there is the counter-productive and passive aggressive ‘think twice before printing this email’. These can be symptomatic of box ticking and an aspiration to come across as ‘professional’.  The vast majority of start-ups need to change and adapt quickly and they require great flexibility. They also, by and large, require a singular vision and interest. A long email footer, in contrast, suggests conflicting interests and bloat rather than a simple, widely-embraced aim.  Sadly for start-ups, the inverse of this ‘law’ is not guaranteed.

Gareth’s Law 3: Internet Economy success at scale is converging in all sectors on being an AI / machine-learning problem.

As so many aspects of internet economy success become shared knowledge sitting atop open source software, the ‘last’ race for online services is trending towards solving complex data and personalisation problems. Doing so will delight users – and will become a prerequisite to delighting them – across every sector.

Take Facebook’s news feed which, very far from being a manual curation problem, is a machine learning or AI one. Mass personalisation appears to be an oxymoron.  But at the very least this requires ever-more complex heuristics.  Increasingly, the way to win as an Internet Economy start-up with traction is to look to AI/machine learning to achieve that magical experience for the user. Think of Google Now cards, Netflix channel curation or in online travel solving the ultimate challenge – “Where should I go on holiday?”




Configuration as a Service

Posted by Raymond Davies

Configuration as a Service: Moving Quickly


As a web scale tech organisation, it’s important that we can move quickly at scale within Skyscanner.

While we’re always tackling this – for example, by moving ever closer to continuous deployment, or by removing barriers which can slow teams down, like infrastructure provisioning (by moving towards AWS) – sometimes it would be nice to make changes without deploying any code at all.

This is where Configuration as a service comes in – the ability to change the behavior of our software systems on the fly without the need to make code changes. Recently the squad I work in released Skyscanner’s first iteration of Configuration as a Service.

Our main motivation behind the system was to enable anyone in the business to safely make changes to our production systems, with the changes backed by A/B tests and associated metrics & reporting. Another motivation was that the system allows us to gracefully bypass a service which is experiencing an unexpected problem. Having this flexibility means we can continue to deliver the core experience which people come to Skyscanner for, even if something goes wrong behind the scenes.
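
As an illustration, the bypass case might look like this from a consuming service’s point of view (the configuration client and flag names here are hypothetical, not our actual internal API):

    const config = require('config-service-client'); // hypothetical client

    async function getReviews(hotelId) {
      // Flipping this flag in the Configuration service takes effect
      // immediately, with no code change or deployment.
      const enabled = await config.get('hotels.reviews.enabled', true);
      if (!enabled) {
        return []; // gracefully degrade: render the page without reviews
      }
      return reviewsService.fetch(hotelId); // assumed existing wrapper
    }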

 

Letting anyone in the business make a production change… isn’t that dangerous?


I hear you and I guess the answer is, potentially yes. You aren’t the first to ask. This type of question was brought up more than once when we were originally pitching the idea to other squads.

To mitigate this issue, squads have total control over which aspects of their systems they expose via Configuration.  They can also require changes to go through an approval process (with a comprehensive audit trail) which gives them an opportunity to preview changes before they go live. Finally, and as yet not implemented, we’ll be adding a progression for changes from our pre-production environments through to production, which gives an additional gateway for sanity checking.

It’s also worth bearing in mind that these changes are initially launched to the public behind an A/B test which has metrics and monitoring attached to it, so it’s actually quite difficult to really mess things up (at least without us noticing very quickly!)

 

Metrics & Monitoring


How do you know you are making changes which positively impact your users’ experience of your product? Or, perhaps even more crucially, how do you know when something is broken?

These questions are being asked more and more of development teams as the ‘devops’ culture becomes increasingly popular.  They’re important questions at any scale, but they become incredibly important when, like Skyscanner, you’re dealing with hundreds of engineers, in teams all over the world, releasing code at their own heartbeat. When you extend this ability to anyone within the organisation and remove the barrier of shipping a release, it becomes an absolute necessity.

Our Configuration service leverages Skyscanner’s internal experimentation platform ‘Dr Jekyll’, which provides us with an A/B testing framework and the automatic ramp up of successful changes.

We can then track how users in the A and B buckets behave using tools, such as Mixpanel. This is one of the places we monitor how people move through our product funnels and whether they exit to our partners or if they bounce at a certain point in the process.  If you’ve just made a Configuration change to how our search controls work for example, and we see that users with your change seem to be having trouble finding the flight they’re looking for, we’ll probably review the change and assess the impact it’s having on users.

Similarly, it’s a requirement within Skyscanner that systems have monitoring and alerts configured against agreed KPIs and machine-level metrics. We use systems like Seyren, Graphite and VictorOps for monitoring and alerts. This means that we can quickly identify abnormal behavior and, ideally, pull the system using Configuration while the problem is rectified.

 

Exciting Times


Squads are already coming forward with some really interesting use cases for the system, including some we’d not even thought of when dreaming it up, which is awesome. One such idea would see us able to serve part of our flights funnel while removing all of the servers which currently host that part of the system. Absolutely not the use case we’d imagined for Configuration as a service, but really interesting nonetheless.

While it’s early days for Configuration as a service at Skyscanner, I’m really excited about where it’ll take us and the interesting ideas that people, who might not otherwise have had the ability to make changes, will bring to the table.

 

 




Hardcore! From Seed To Apple Watch App In Five Weeks

Posted on by Balint Orosz


Back in May, we launched an Apple Watch app, which we had created in just five weeks. As any engineer will understand, those five weeks were somewhat stressful. A little caffeine-fuelled. Against-the-clock speed was needed, while at the same time we knew we couldn’t compromise on quality. No biggie, right?

Some of you might have unwrapped an Apple Watch as a gift over the festive season, so we figured this might be an ideal time to revisit quite how we went from a seed of an idea to a fully functioning app in five weeks.

A utility concept with no back-up data
Back then, the Apple Watch was a completely new product for iOS users. Therefore the first challenge we encountered was in shaping the utility concept with no back-up data to support it. Unable to refer to industry data on smart-watches, we instead scoped out an initial idea without any technical limitations, which actually turned out to be a refreshing way of working.

First we asked ourselves: what might a traveller need that can be provided by an ‘on the go’ technology that’s also in line with our existing Hotels app? A clear concept came to the fore: a ‘find your way back’ style app.

Say you go out for dinner, or simply for a walk in a new city. Often, you can find yourself a little lost while trying to figure out how to get back to your hotel. We’ve all been there, and it’s all the easier to become confused and disorientated when street signs are in a foreign language or a completely different script. Even if you’ve got access to online maps, map search still has limited functionality for finding hotels, especially if you don’t know the exact name or address.

Therefore our idea was to create a simple, easy-to-use app for the new Apple Watch that helps travellers get back to their accommodation. We named it ‘Find Your Way’.

Five days of intensive research and design
The rapid pace required for the research and design in a short time-frame presented a fun (if exhausting) challenge, since the end result had to be a working product, not just a prototype. As such, continuous feasibility checks with our software developer teammates were vital, especially as the Watch was a completely new tool for them too.

Two things that shaped our journey:
• To sync the app with the Watch, we needed to build on its current capabilities, so we decided to go with the existing ‘Favourite’ feature
• Since we didn’t have an Apple Watch to hand, we relied on Apple’s well-defined, standard guidelines for the UI

In the first two days of the research and design sprint we explored the flow of the app on sketches and drawings, combined with on-going discussions with developers on the feasibility of our proposed features. Working so closely with our developers was one of the biggest learning outcomes of the whole process: we learned how to think with a developers’ perspective.

Day three: testing
Day three, and we had an initial design and even a tester. Of course, that also threw up a pretty basic but crucial conundrum: how on earth were we going to test the app without an actual Apple Watch device? So we went old-school. We printed the watch on paper, got crafty with the scissors and put it on the wrists of our volunteer testers.


As the Watch app is about getting back to your hotel, we stepped away from the usual user testing methods and went out to the streets, walking around and talking with our testers: how did they feel about the concept, what did they think about this particular aspect? Another added challenge – like us, our testers had never used the Apple Watch either, and most hadn’t even set eyes on one. Usually, even if an app is new to someone, testers know how to start to explore it because they’ve handled the device (say, an iPhone) before – but of course, this wasn’t the case with the hotly-anticipated and closely guarded Apple Watch.

Our solution was to take testers through the two paper prototypes (which represented the two key screens of the app), and, given the constraints above, we also talked them through key features, rather than explaining what they could do on the screen or how they could control it.

Such limitations did make it tricky to agree on final learnings and takeaways. However, plenty of UI variations later, our technical requirements were taking shape. The biggest area of debate surrounded readability, which is crucial on a device like the Apple Watch. Without the real thing to hand, we had to make do with mobile handsets, experimenting with how the designed content might look on a small screen.


Finally: the real thing!
Drum-roll please: we were like kids in a candy store come the Apple Watch Lab event, where we could finally test the app for the first time, on a real, live device. We discarded our paper cut outs, consumed a frankly unhealthy amount of caffeine and energy drinks, and really got down to the nitty-gritty, rapidly making changes and amends until we had final approval.

From seed to Apple app in five weeks: it was a whirlwind, but we’re delighted with the results (even if we do say so ourselves!). Our first users were from the US, UK, Germany, Australia and Japan. You can see the outcome for yourself: we’ve created a guide to using the ‘Find Your Way’ feature here.

