Transitioning From Objective C to Swift in 4 Steps – Without Rewriting The Existing Code

Posted by Gergely Orosz

[Note: as a follow-up post, for more details on how we made this transition, please see: How We Migrated Our Objective C Projects to Swift – Step By Step]

We started developing Skyscanner TravelPro in Objective C in March 2015. A couple of months later, when Swift 2.0 was released, we started to slowly introduce Swift. Fast forward 8 months, and 100% of the new code we write is in Swift. All without having rewritten any of our existing, working, robust and tested Objective C code – there would have been little point in doing so.

There are many resources on deciding whether or not to use Swift for a new project, and on best practices for writing Swift. However, if you’re working on a fairly large Objective C codebase, you will probably find this article useful. If not, one day you might bump into a codebase where you want to start using Swift: this article presents some advice on how to get started.

Here’s a visual representation of how our codebase has changed in 10 months. Since November 2015 all new code is written in Swift, which now makes up about 10% of our 65,000 line codebase – and growing.

Skyscanner TravelPro: Objective C vs Swift lines of code over 10 months

So what was our approach when going from Objective C to Swift?


A Journey in React with the Car Hire Frontend Squad

Posted by Graham Martin

The airport transfers product delivered and pushed live in the UK market by Skyscanner’s car hire tribe in mid-2015 was very much built as an MVP. The frontend architecture was ‘borrowed’ heavily from Skyscanner’s car hire product, with the result that we had large chunks of unused, untested code. When the MVP showed promise, and the car hire frontend squad were given the green light to re-architect the client application to make it more stable and maintainable, we began to think about how we’d put it together.

Our main aims for the rewrite were to:

  • Keep the view/presentation layer simple to make for easier unit testing. As part of this, look at creating reusable presentational components that could be shared with car hire.
  • Improve the rendering performance of the results list. The HTML for each deal (implemented as a Backbone view) was scrapped and regenerated on receiving new quotes from the API, or on navigating to another page. This made for a very jumpy user experience.
  • Keep in mind that the architecture and patterns used would likely be applied to the more complex car hire product in future.

We took some time to look at our requirements and see whether any frameworks would help with their delivery. While the scope of our work was a reasonable size, we weren’t rewriting the whole application. Instead we were focusing on the view layer (the V of MVC). Our logic and state were simple and there was no complex client-side navigation required. Angular, Ember and Backbone all seemed too heavyweight and opinionated for our needs. React, however, seemed to fit pretty well as the following snippets from the React homepage demonstrate:

Lots of people use React as the V in MVC. Since React makes no assumptions about the rest of your technology stack, it’s easy to try it out on a small feature in an existing project.

React implements one-way reactive data flow which reduces boilerplate and is easier to reason about than traditional data binding.

React’s primary attraction was its rendering performance. When a state change occurs, React will try to re-render the entire component subtree (more on component nesting later). The natural initial reaction to this is that it would degrade performance. But in re-rendering, React constructs a new virtual DOM – an abstract, non-browser-specific version of the DOM – and uses some clever diffing techniques to calculate whether an actual DOM update is necessary. The DOM is then modified with the minimum number of changes possible. This efficient rendering was extremely appealing for our results page, where only selected areas of the page content are updated on processing pricing service poll results.

So, we made the decision to give it a try.

Experimenting with React

The first thing we had to get our heads around was the JSX syntax. While it is not mandatory for writing React components (plain JavaScript can be used), JSX is recommended because its concise, XML-like structure is more readable when defining large component trees. React components and standard HTML tags can be mixed in JSX. For example, the following snippet creates a resultsContainer fragment from a mix of lower-case HTML tags and React components (ResultCount and Pagination) that begin with an upper-case letter.
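A minimal sketch of what that fragment might look like (the deals and page variables, and their shapes, are illustrative):

```jsx
// Lower-case tags (div, ul, li) are plain HTML; ResultCount and Pagination
// are React components. deals and page are assumed to be in scope.
var resultsContainer = (
  <div className="results-container">
    <ResultCount count={deals.length} />
    <ul>
      {deals.map(function (deal) {
        return <li key={deal.id}>{deal.supplier}</li>;
      })}
    </ul>
    <Pagination currentPage={page} />
  </div>
);
```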

The attributes that are set on the React components are known as props and pass data to the component to be used in the rendering logic. A fundamental rule in defining React components is that a component cannot mutate its props. This is to guarantee consistency in UIs. A component can define a contract on its props by defining PropTypes. These validate the props that are passed on creation of the component instance and warn on any validation failures. There are lots of out-of-the-box PropTypes – including string, bool, func, object and array – and all of these can be marked as required. When your validation requirements are not covered by the core types, custom PropTypes can be wired up simply by writing a function.
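For example, a sketch of a component declaring its contract (the component and prop names are made up for illustration):

```jsx
var React = require('react');

var ResultCount = React.createClass({
  propTypes: {
    // Built-in validators, marked as required where appropriate.
    count: React.PropTypes.number.isRequired,
    label: React.PropTypes.string,
    // A custom PropType is just a function returning an Error on failure.
    maxResults: function (props, propName, componentName) {
      if (props[propName] != null && props[propName] < 0) {
        return new Error(propName + ' in ' + componentName + ' must be >= 0');
      }
    }
  },
  render: function () {
    return <span>{this.props.count} {this.props.label}</span>;
  }
});
```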

The mixing of HTML and javascript in JSX takes a bit of getting used to – after all we’ve been taught to separate our presentation from our functionality for years – but the packaging of the logic along with the markup that it affects in a self-contained component actually makes a lot of sense. The components are easily understandable, the responsibilities are clear and the tracing of complex flows from javascript -> HTML and back is simple. We found that once we overcame the initial culture shock, working with JSX was very satisfying.

React components can be nested in a tree structure. We can compose complex UIs from multiple small, well-defined, reusable components using this structure. When defining a hierarchy of components we end up with components that own other components – that is, components that create instances of other components in their render() method – and that are responsible for setting the props of their owned components. This one-way data flow down the component hierarchy is what triggers the re-rendering of the component tree and is key to React’s simplicity. We found that we iterated on component design, refactoring components by moving certain presentation and logic into new, small components.

State Management

State and props are closely related but are distinct concepts. Most components will be stateless and will simply render data received through props. Others will respond to user input or handle server responses – these require state. Identifying where the state should be manipulated was challenging. We found that we were dealing with state in too many places and some simple components contained overly complex logic. We needed a better way of isolating our global state and supporting the one-way data flow. The missing piece of the jigsaw was the Flux architecture.
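To illustrate the distinction, a minimal sketch (names are illustrative): the component below owns a piece of state and passes it down to a stateless child as props.

```jsx
var React = require('react');

var SearchBox = React.createClass({
  // State lives here, in one place; children receive it via props.
  getInitialState: function () {
    return { query: '' };
  },
  handleChange: function (event) {
    this.setState({ query: event.target.value });
  },
  render: function () {
    return (
      <div>
        <input value={this.state.query} onChange={this.handleChange} />
        {/* Stateless child: simply renders whatever query it is given. */}
        <Suggestions query={this.state.query} />
      </div>
    );
  }
});
```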

The Flux architecture isn’t a 1960s spy film starring Michael Caine; it’s a pattern used by React’s creator Facebook to manage data flow in its web applications. The data flow looks like this (taken from the Flux architecture docs).

flux-simple-f8-diagram-1300w

User interaction with a view causes an action to be propagated to the data store(s) by a central dispatcher component and the resulting state change(s) are reflected in any affected views. This is similar to MVC but the data flow is unidirectional.

Redux and React-redux are two complementary libraries that provide the most popular concrete implementation of this pattern in the React space. Redux implements the concept of a single store which contains the whole state of your application, as well as the dispatcher functions (called reducers) which return the resultant new state. The immutability of the state is key to the predictability of the application – the reducer is a pure function that must return a new state object rather than mutating the current state. React-redux provides the bindings and plumbing code required to expose the store to all components in the hierarchy and encourages you to think of components as either container or presentational components. Only container components interact with the store – invoking actions and receiving new state – and then pass the state down the hierarchy to presentational components via props. Testing of presentational components is simple, as we are only concerned with the output of the render() method, while state changes are easily testable as they are decoupled from React.
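A minimal sketch of the action/reducer half of this (the action type and state shape are illustrative):

```javascript
var redux = require('redux');

// Action creator: a plain object describing what happened.
function quotesReceived(quotes) {
  return { type: 'QUOTES_RECEIVED', quotes: quotes };
}

// Reducer: a pure function that returns a *new* state object
// rather than mutating the current one.
function quotesReducer(state, action) {
  state = state || { quotes: [] };
  switch (action.type) {
    case 'QUOTES_RECEIVED':
      return Object.assign({}, state, { quotes: action.quotes });
    default:
      return state;
  }
}

var store = redux.createStore(quotesReducer);
store.dispatch(quotesReceived([{ id: 1, price: 42 }]));
console.log(store.getState().quotes.length); // 1
```

React-redux’s connect() then maps this state to a container component’s props, from where it flows down to the presentational components.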

Unit Testing React

The common approach to unit testing React components was, until quite recently, to render the components into a DOM (using something like jsdom) and assert against them using React’s TestUtils. However, in React v0.13, shallow rendering was introduced. This feature does not require a DOM and effectively isolates the rendering to the component under test, as it renders the component only one level deep. Child components are not instantiated, so test setup is simplified. The output of shallow rendering is an object that is unfortunately quite difficult to traverse. To simplify this, we used the excellent skin-deep library, which provides you with lots of useful methods for digging through the properties. You can see from the following example how clean the tests are:
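A sketch of what such a test can look like (assuming a Jasmine-style runner and the illustrative ResultCount component from earlier):

```jsx
var sd = require('skin-deep');
var React = require('react');

describe('ResultCount', function () {
  it('renders the number of deals', function () {
    // Shallow render: one level deep, no DOM required, children not instantiated.
    var tree = sd.shallowRender(<ResultCount count={3} label="deals" />);

    // skin-deep helpers make digging through the render output easy.
    expect(tree.text()).toContain('3 deals');
  });
});
```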

What have we learned?

So, we’ve learned an awful lot about React in the time we’ve been iterating on our implementation. What we’ve ended up with in our airport transfers client application is a nicely architected solution with small, composable components that have a clear responsibility. The data flow in the application is simple and state transformations are predictable and easy to test. Of course, you can achieve these results without React or Redux but these libraries force you into thinking about your application in a way that encourages this, as well as handling the plumbing and setup for you.

There is an overhead to using React. As well as the learning curve (which hopefully will be less steep for you after you’ve read this :-)), the package(s) add around 40KB to your gzipped javascript bundle size. You may have noticed that I’ve not mentioned rendering performance improvements here, and that is because our results were inconclusive. We are still experimenting with the best way to measure this accurately but rudimentary timings from switching pages have yielded insignificant differences.

So, where do we go next? Well, improving the profiling of our application is our first goal so we can accurately measure our React timings. As I mentioned at the start of this article, we are looking to redesign our car hire results page in the near future and this will effectively require a rewrite of the client application. Our test coverage is low in some key areas so we’ll compose the page of small well-defined React components that are easily testable. We’ll move all of our state to a global store and describe state transformation in actions and implement them in a reducer function, and increase our test coverage there too. We’ll A/B test the new implementation (which we didn’t do in airport transfers due to the small level of traffic the product initially received) to ensure we’ve not degraded the user experience, and we’ll iterate. Look out for a future blog post to see how we got on.




A polyglot architecture – Skyscanner’s frontend under the hood

Posted by Alex Bardas

A few months ago I had the pleasure – if I am to ignore the random weather – to travel to London and attend a recruitment event called Silicon Milkroundabout with some of my colleagues from Skyscanner. The most frequent questions I was asked were “What programming language do you use?” and “What’s Skyscanner written in?”

It’s an interesting question, because it’s hard to give a straight answer to it. Not because of some weird non-disclosure agreement (we’re pretty open about the technologies we use and we open source a lot of our projects – check out our github), but because we use so many of them. Here’s a look at how and why.

Tribes, squads and different programming languages

Internally, Skyscanner is ‘squadified’, an organizational model described here, which actually means we’re organized in tribes – yes, we call them tribes, and each has a tribe leader, although I’m not sure that’s actually written on their business cards. Each tribe is divided into squads.

Squads are multidisciplinary teams that own a service and have a very high degree of autonomy, the main – and sometimes only – constraint being that they must respect contracts with other squads. Think of each squad as a mini-startup. Under this paradigm, squad members can pick their release cycle, their agile methodology – Scrum, Kanban – and of course, their technology stack. Which makes things really interesting and very diverse.

At Skyscanner we have Java, Python, .NET, PHP, Ruby and Node.js squads, and the list goes on. We believe that the problem chooses the technology and not the other way around, so I can say we’re actually polyglots when it comes to programming languages. Which is really cool, but also really hard to explain at a recruitment event.

The Hotels Vertical

I work in the Hotels tribe, in the frontend squad, but other verticals share a similar architectural approach. Compiling a list of hotels and displaying it to the end user – with up-to-date prices from multiple global partners, without duplicates and with relevant images – goes well beyond issuing a “SELECT * FROM hotels WHERE price BETWEEN …” DB query. It’s a fine-tuned process involving multiple teams:

• Partner engineering squad – they liaise with our partners and are responsible for retrieving information from them, such as hotel prices. Stack: mostly Python.

• Data squad – this squad creates the so-called “hotel data packages” which are used to display consistent, duplicate-free hotel information to our users. Jacek’s post ‘One picture is worth a thousand words. So, how does it scale to a million pictures?’ provides a more in-depth view of what they do. Stack: Python.

• Geo squad – this squad maintains the Travel Knowledge Graph, a database system that represents the world as an ontology. This database can be queried directly using a language called DQL (Distributed Query Language). Stack: Python and a modified version of PostgreSQL.

• Search services squad – tasked with providing the best autocomplete results to the user, their algorithms try to guess the user’s intended destination even when there are typos involved. More on the subject in Ben’s Measuring Autosuggest Quality post. Stack: Java, Solr and Lucene.

• Backend squad – the backend communicates with all the squads described above and compiles a list of results that is made available to the frontend via a RESTful API. Stack: Python.

• Frontend squad – my squad. We’re tasked with creating and maintaining the ESI components used to display the hotels pages. Stack: PHP, CSS & various JavaScript libraries.

• Web Applications squad – they own the Scaffolding component described in detail below and some elements that are common across all Skyscanner’s pages, such as headers & footers. Stack: .NET / C#.

• INTLOC squad – our Internationalization & Localization squad. Their service allows all other squads to deliver a localized native experience to our global pool of users. Stack: .NET / C#

 

Edge Side Includes in the frontend

The Skyscanner frontend is not a ‘site’ in the traditional sense of the word, with HTML code being generated by a single server-side technology and served to the user by Apache or nginx, but rather a collection of various ESI components developed and managed by different squads. The end result is that depending on where one clicks on the page, the underlying HTML is generated by a different server-side technology, owned by a different squad, located in a different part of the world.

edgeincludes

 

So what exactly is an ESI component in our case? Well, it’s a self-contained entity that renders, styles and provides JS interaction to a piece of HTML. Each component has a unique URL – e.g. /hotels/search-box – and several endpoints, each with its own responsibility, as shown in the image below.

tudor1

The endpoint’s name is appended to the component’s URL to create the endpoint’s URL. So for example, if I want to render the script tag for the hotels search box somewhere in the footer, I would issue a request to /hotels/search-box/script.
In the current architecture, each public URL has a template mapped to it, with placeholders for ESI components. This template is pre-processed by a component called Scaffolding and sent to Varnish, which in turn requests all the ESI endpoints, applies caching rules and sends the end result back to the user. Given that ESI URLs are internal and dispatched by Varnish, components cannot directly access information coming from the client, such as query string parameters or cookies; this information is requested via a special endpoint called requirements and injected by Scaffolding during pre-processing.
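To illustrate the idea, a simplified sketch of what such a template could look like (the markup and URLs are illustrative, not our actual template):

```html
<!-- Pre-processed by Scaffolding, then Varnish resolves each <esi:include>
     by requesting the corresponding component endpoint. -->
<html>
  <body>
    <esi:include src="/hotels/search-box/body" />
    <esi:include src="/hotels/results-list/body" />
    <footer>
      <esi:include src="/hotels/search-box/script" />
    </footer>
  </body>
</html>
```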
Here’s a simplified diagram that shows how an HTTP request is handled.
tudor2

In a nutshell…

Hundreds of engineers working with different technologies, in different geographical areas, independently releasing at different times components that are assembled on the fly to render the site. Seems like a giant puzzle. And every single time I describe our architecture and way of working, I get the following question: “Does it actually work?” Yes! Amazingly well, and for 50 million users every month.

How we got here

Things were not always like this for the Hotels Frontend Squad and if you want to learn about our journey, have a look at my presentation ‘Skyscanner Journey – From code jungle to state of the art’ given at the PHP Barcelona Conference in 2015.

For more on being a polyglot, see Richard Lennox’s post on being a ‘Polyglot Technologist’ here.




Buckets of Onboarding: saving effort and money with AWS S3

Posted by Pim Van Oerle

Traditionally we engineers tend to think in servers – need to serve up some new web content? Spin up a bunch of webservers to serve that content with a nice load balancer over them, set up a deployment pipeline to get the software out and you’re flying.

Or are you?

When we first started our experiments with Onboarding Pop-ups, newsletter signup boxes and other awesome onboarding things, we designed the servers in the normal fashion. Some Linux instances in Amazon EC2, a very simple Nginx/Flask/Python server and a lot of static files.

Quite soon we found that we did not really use the service part at all, but were really just serving static files. That left us with what was basically a set of nice, load-balanced and redundantly running servers serving up static content. Which – on reflection – wasn’t the brightest way of doing things. Here we were, using complex servers running on a system that was built around having cheap, scalable and stable static file serving – only to re-invent that file-serving in a more clunky, less scalable and more expensive way.

We decided to take action and completely kill our nice new shiny servers – instead deciding to simply deploy everything we do that is not explicitly a Service to a bucket in S3.

So here we are now – serving all of our Onboarding Code from a bunch of buckets in the cloud!

bucket

So what are the benefits?

There are two big benefits to doing this.

1. Cost

From a service cost perspective, using S3 is much, much cheaper – it is built explicitly to store and serve files that don’t change much, and this is exactly what we are using it for.

Having a set of EC2 instances sitting there doing frankly not very much, on the other hand, is quite a bit more expensive. Everything is still pretty fresh, so we can only guess the cost based on calculators, but from a quick bash at the AWS cost calculator we are cutting cost by at least a factor of 10.

2. Ease of Scaling, Maintenance and Deployment

From the point of view of the squad, we have simplified our DevOps load quite a bit by doing this. S3 takes care of scaling for us; we just upload files. Deployment is much simpler, as we have eliminated any actual servers, server configuration and all the wiggly bits that tend to go with that.

How does it work?

The system is pretty simple – we have a few elements:

• One web-enabled, versioning-enabled Bucket that the various Onboarding Pipelines deploy their code into.

• A Route53 DNS entry over that to ensure a fixed, simple address in our skyscnr.com AWS domain.

• And finally an Akamai route to the whole thing so that we can serve our files from www.skyscanner.net/svcs/onboarding/* – looking much better and also minimizing cross-site issues.

That’s it. To deploy, we prepare all our static files in TeamCity with some clever Grunt steps, generating a static file per locale for localisation, minifying where needed and bundling up the various files for easy deployment. Once that is done we simply copy to the bucket and we’re done.
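That final copy can be as simple as a small script against the S3 API. A sketch using the JavaScript aws-sdk (bucket name, region and paths are made up):

```javascript
var AWS = require('aws-sdk');
var fs = require('fs');
var path = require('path');

var s3 = new AWS.S3({ region: 'eu-west-1' });
var bucket = 'onboarding-static-example'; // hypothetical bucket name

// Upload every file in the build output directory. With versioning enabled
// on the bucket, each upload simply becomes the newest version of its key.
fs.readdirSync('dist').forEach(function (file) {
  s3.putObject({
    Bucket: bucket,
    Key: 'onboarding/' + file,
    Body: fs.readFileSync(path.join('dist', file))
  }, function (err) {
    if (err) { throw err; }
    console.log('Deployed ' + file);
  });
});
```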

Next Challenges

Of course we’re not there yet – we have only taken the first steps towards a proper bucket-based deployment system. Below are some of the next challenges that we’re working on.

S3 and Edge-side Includes

We use ESI to assemble most of our website in a smart, easy and cacheable way. To be able to fit into that system we will have to find a way to conform to some of the expectations that our ESI system has – expectations that do not quite fit a system that can only serve static files. Do we build a very simple Lambda service to deal with this? Do we investigate what Akamai (inventors of ESI, after all) can do for us there? There are plenty of avenues to try out, and it’ll be a really interesting question to figure out.

Blue-Green Deployment

Blue-Green Deployment is awesome! There are a few really cool systems currently in development around the business to do this with services and Elastic Load Balancers in AWS – but how do we do this with just a bunch of files in S3?

Again, we’ll use the functionality AWS provides. The buckets can version their own contents, and provide easy rollback through the API. We can use that to give us a binary form of Blue-Green deployment – roll out the new version and monitor performance of our key metrics (in Mixpanel or via our own internal logging system, Grappler). If the performance of the new version falls outside set bounds, roll-back to the previous version can happen automatically.
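A sketch of what that API-driven rollback could look like with the JavaScript aws-sdk (bucket and key are illustrative): list the versions of an object and copy the previous one back over the current one.

```javascript
var AWS = require('aws-sdk');

var s3 = new AWS.S3({ region: 'eu-west-1' });
var bucket = 'onboarding-static-example'; // hypothetical
var key = 'onboarding/app.js';

// Find the previous version and restore it; the copy itself becomes the
// newest version, so a rollback is just as easy to roll back again.
s3.listObjectVersions({ Bucket: bucket, Prefix: key }, function (err, data) {
  if (err) { throw err; }
  var previous = data.Versions.filter(function (v) {
    return v.Key === key && !v.IsLatest;
  })[0];
  if (!previous) { return console.log('Nothing to roll back to'); }
  s3.copyObject({
    Bucket: bucket,
    Key: key,
    CopySource: bucket + '/' + key + '?versionId=' + previous.VersionId
  }, function (err) {
    if (err) { throw err; }
    console.log('Rolled back ' + key + ' to version ' + previous.VersionId);
  });
});
```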

Marrying S3 Buckets with our CD Environments

Most of Skyscanner runs with four different environments for continuous deployment – Int-CI, Int-Review, Pre-Production and finally Production. That’s great for continuous deployment and for guaranteeing thorough quality assurance while keeping the environments available for everyone to test.

But how does that work when serving code from a bucket in AWS? If we crack Blue-Green deploy we can deploy in two steps instead – running all our tests in a simple test environment and then just deploying to Production, rolling back to the Blue line if any of the service or business metrics show issues.

That quickly leads to this question – if we can do that, how do we marry that to the four different environments Skyscanner has in a smart way? We could just have four copies of the same file, but that feels like waste. There must be a better way that’ll allow both use cases – another thing to figure out over the next weeks.

 




Measuring Autosuggest Quality

Posted by Ben Torfs

Greetings Code Voyagers, from the Free Text Search squad.

We power most of the Skyscanner auto-suggest search boxes, such as the ones where you select an airport to fly to, or a city in which you need to find a hotel. More generally though, you could say that our mission is to map user input to user intention, using as few keystrokes as possible.

autosuggest

Autosuggest: speed, relevancy and the ‘zero-result rate’

These search results need to appear very fast (in less than 200ms, preferably), but above all, they need to be relevant. This is especially true in the mobile market, where character typing can be a bit of a hassle and screen real estate is too scarce to display long lists of results.

Our current service is working well, and we are proud of the speed and accuracy of our results (even when the user includes some challenging typos). As always though, there is room for improvement, particularly in markets using non-Latin scripts. Measuring the quality of our service is tremendously important in identifying areas of improvement as well as enabling better A/B testing in the future.

Today, the most important metric we use is the rate of queries returning no results at all (the ‘zero-result rate’). At first it seems like an overly simplistic metric, but it is actually quite useful for comparing performance between different locales and seeing how they evolve over time.

For instance, let’s take a look at this measure for the past six months in the UK, our longest-supported market, where we’ve spent a lot of time optimizing the site. Our results are very strong, yet there is still a very small number of user queries that we cannot recover from – for example, a user may be searching for a location that doesn’t have an airport, or attempt to search for a flight to ‘Frankfart’ rather than ‘Frankfurt’ (always amusing).

Auto-suggest and non-Latin scripts

It’s not quite so easy when optimizing for newer Skyscanner markets where non-Latin scripts are used. There are some great tools out there that have really helped us make fantastic improvements; in Japan, we’ve used the wonderful Kuromoji library to convert these queries into the various Japanese character types. We’ve made similar enhancements for other languages such as Korean, which again has resulted in real progress.

Alternative auto-suggest KPIs

The zero-result rate provides us with a good idea of where to steer our efforts, but it is pretty coarse and we are looking for new and better KPIs. Here are some of the ideas we came up with:
• How many characters did the user have to type before s/he was able to click on the result s/he was looking for? This metric has a direct relationship to the usability of the site. We could also count every backspace character, since those give an indication that we are not sufficiently resilient to typing errors.

• Whenever a result is selected, what was its position in the suggestion list? We should aim for the clicked result to always be the first one. Today, the search ranking is already dependent on the selected market. For instance, a user who searches for ‘san’ in the USA will be returned results such as San Francisco and San Diego first. The same query typed in Spain, however, will produce higher rankings for Santander and San Sebastián. Other improvements might include storing an individual’s search history and providing easier access to the queries that a user types most often.

• How many users started typing a query, but never actually selected a result (the ‘abandonment rate’)? In this case it is not only important to know how often that happens, but also why it happened. It might indicate that a street name was changed somewhere, and needs to be updated in our database.
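All three of these are straightforward to derive from interaction logs. A sketch (the event shape is hypothetical):

```javascript
// Hypothetical event shape: { typed: 'edinb', selectedIndex: 2 }
// selectedIndex is null when the user abandoned the query.
function autosuggestKpis(events) {
  var selections = events.filter(function (e) { return e.selectedIndex !== null; });
  var sum = function (xs, f) { return xs.reduce(function (s, x) { return s + f(x); }, 0); };
  return {
    meanKeystrokesBeforeClick: selections.length ? sum(selections, function (e) { return e.typed.length; }) / selections.length : 0,
    meanClickedPosition: selections.length ? sum(selections, function (e) { return e.selectedIndex; }) / selections.length : 0,
    abandonmentRate: events.length ? (events.length - selections.length) / events.length : 0
  };
}

console.log(autosuggestKpis([
  { typed: 'san', selectedIndex: 0 },
  { typed: 'frankf', selectedIndex: null }
])); // { meanKeystrokesBeforeClick: 3, meanClickedPosition: 0, abandonmentRate: 0.5 }
```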

Surely this list is not complete. Do you have thoughts on this, or other ideas on how to measure and improve our auto-suggest results? Please let us know in the comments, because we would love to hear them.

 




One picture is worth a thousand words. So, how does it scale to a million pictures?

Posted by Jacek Wojdel

Well…it probably depends on whether they are all the same or not.

We always knew we wanted our hotel product to be very visual. Booking a hotel isn’t the same as booking a flight; photography really helps bring the hotel experience to life, which is why, on average, when a traveller looks at a hotel on the Skyscanner site, they’ll see around a dozen photos to help them make a decision on where to stay.
However, collecting these images is another matter. Every time we present a piece of information on our webpage, it is in fact a consolidated view derived from tens of different sources. We partner with over a hundred providers, and each of them, for each hotel, will give us the hotel’s details (name, street address, type of accommodation, rating etc). It’s then the Hotels Data team’s job to decide which data to use to present it in the best way to our users. The automated process of doing this is what we call ‘Data Release’, so in essence:

release

If you just thought ‘deduplication’ or ‘entity resolution’, you’re on the right track. An integral part of the data provided to us is the images of the hotels. Our team is tasked with downloading all of them (literally millions) from our partners and figuring out which ones to present on our webpage. Again, this all happens automatically, in the ‘Image Release’ process.

release2

About a year ago, this process was running in one of our data centres, took about three days, and could be initiated roughly once a month. Since then, we have moved to the cloud, it has become a continuously running process, and it is synchronised weekly with the rest of Data Release. As part of this task, we had to figure out how to de-duplicate images in a way that would be fast and suitable for our needs. Here’s how we did it.

Image deduplication
You might wonder what the deal is here. Couldn’t we just take all the pictures from the providers and display them on our website? Well… the result would probably look more or less like this:

(a grid of near-identical hotel images from different providers)

Not exactly helpful, and certainly not the kind of experience we want travellers to have on our site. As you can see, most of the images from different providers are in fact all the same. Just to make things a bit more complicated, they might also be resized, recolored, trimmed, watermarked etc. Effectively, we had to create a system that would automatically tell that:
The following two images are the same, and we should use the bigger one:

photo example1

The following two are not:

hotel example2

The following are the same for our purposes:

hotel example

The left is cropped, and the right is better:

hotel example 4

The process of finding these near-duplicate images is best done by calculating a so-called image hash and comparing the hashes of all of the images we have downloaded. There is a multitude of possible hashes – aHash, dHash, pHash and other perceptual hashes – and each comparison can be done at varying levels of accuracy… so how do we know which one to choose?
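To make the idea concrete, here is a sketch of one of the simplest of these, dHash, operating on an image that has already been decoded and resized to 9x8 grayscale (the decoding step is assumed):

```javascript
// Difference hash (dHash): one bit per horizontally adjacent pixel pair.
// pixels is 8 rows x 9 columns of grayscale intensities (0-255).
function dHash(pixels) {
  var bits = '';
  for (var row = 0; row < 8; row++) {
    for (var col = 0; col < 8; col++) {
      bits += pixels[row][col] < pixels[row][col + 1] ? '1' : '0';
    }
  }
  return bits; // 64-bit hash as a bit string
}

// Two images are near-duplicates when the Hamming distance between their
// hashes is below a threshold; the threshold sets the comparison accuracy.
function hammingDistance(a, b) {
  var distance = 0;
  for (var i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) { distance++; }
  }
  return distance;
}
```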

download

Of course, we need to measure. Which brings us finally to the Image Release Corpus.

Image Release Corpus
A corpus is a set of data with accompanying manual labels attached to them. In our case, the corpus comprises about 1,200 images grouped manually into 500 groups, each with identical visual content. These were grouped in a tedious process involving an HDTV and a small custom script for quick pre-grouping, browsing and labelling of images. Let me tell you: I do not ever want to see a hotel in Dubai again.

Once this work is done, we can run any algorithm for image deduplication on all of the images and measure its performance against the human decisions.
There are several measures that can be used for evaluation of performance:
• Purity – how many generated groups contain only a single manual label
• Completeness – how many generated groups contain all images of the same manual label
• Duplicates – how many of the same images are we likely to show to the end-user
In all of the possible approaches, one always has to balance between being too strict about image comparison (which leads to a higher number of duplicates shown to the end user) and being too lenient (which leads to grouping different images together, and an effective loss of images).
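A sketch of how the first two measures can be computed, given generated groups of image ids and the manual label for each image (the data shapes are illustrative):

```javascript
// groups: array of arrays of image ids; labels: map of image id -> manual label.
function purity(groups, labels) {
  // A group is pure if every image in it carries the same manual label.
  var pure = groups.filter(function (group) {
    return group.every(function (id) { return labels[id] === labels[group[0]]; });
  });
  return pure.length / groups.length;
}

function completeness(groups, labels) {
  // A label is complete if all of its images ended up in a single group.
  var byLabel = {};
  Object.keys(labels).forEach(function (id) {
    (byLabel[labels[id]] = byLabel[labels[id]] || []).push(id);
  });
  var labelNames = Object.keys(byLabel);
  var complete = labelNames.filter(function (label) {
    return groups.some(function (group) {
      return byLabel[label].every(function (id) { return group.indexOf(id) !== -1; });
    });
  });
  return complete.length / labelNames.length;
}

// Two generated groups against three manually labelled images:
console.log(purity([['a', 'b'], ['c']], { a: 'x', b: 'x', c: 'y' }));       // 1
console.log(completeness([['a', 'b'], ['c']], { a: 'x', b: 'x', c: 'y' })); // 1
```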

Of course, one of the cool things about being a developer is that you can write tools that will help you write tools for the task at hand. With the tools of your choice.

make tools

So, after a bit of fiddling with Jenkins, Django and AngularJS, we came up with a small dashboard that is updated on every push to our code repository and evaluates all of the measures for the current Image Release deduplication process.

corpus

In this way, we could quickly evaluate all the available image hashing methods and play with different accuracies for comparison. Additionally, for debugging purposes, we can dig further to actually see what kind of mistakes the algorithm made on each group of images.

corpus dashboard 2

And we can even look into the specifics of image to image comparison.

corpus dashboard 3

Doing so has allowed us to quickly evaluate our approach and choose one that not only worked faster and more reliably than what we started with, but also allowed us to bring in more than 20% of the images that were previously discarded due to incorrect deduplication – while keeping the probability of showing a duplicate image to the end-user at the same level.

Simple image deduplication is just the beginning. The potential for image analysis is certainly there, and we already have quite a lot of data to work with. We might, one day, revisit it.




Gareth’s Start-up ‘Laws’

Posted by Gareth Williams

As one of Skyscanner’s co-founders, I’m often asked for my thoughts on entrepreneurship. Here are my three self-styled ‘Start-up Laws’; a collection of things I’ve learned along the way and my own personal beliefs in terms of how a fast-growth business should operate.

Gareth’s Law 1 : Advertising is the way to solve the problem of revenue outstripping costs.

I say the above with tongue firmly in cheek. All companies seek a way in which to increase awareness of their service – that’s natural. But the easiest path is to spend money on acquiring new users to the point of marginal profitability (and beyond) via over-dependence on advertising. Especially for replacement visitors.

As a start-up, your resources may be better spent making your product 10x better. Focus mainly on product improvements, retention and virality.  Yes, they require greater skill, but surely a better product is worth more to your users than a churning user base?

At Skyscanner we started by sharing two salaries between three co-founders and our first external funding was fully six years after the first prototype. We only had resources to build product.  Our first marketing was PR (see Paul Graham) and our second was SEO. The interesting thing in retrospect is that they are both fixed costs (cost in time and money was not a per user cost like an ad).

Nowadays, as a more mature company, advertising brings us great value as part of our acquisition/activation/retention pipeline – but as start-up, I’d recommend making your product as good as it can be first.

 

Gareth’s Law 2: The size of an email footer is inversely proportional to the growth prospects of that company.

I once heard a website homepage described as something that represents the scars and battles of the departments in a company. I think the same can be said about the size of your startup’s email footer.

With an email footer you might see a logo, a fax number (still), links to company apps, a legal disclaimer, an event or new product plug and so much more. Of course there is also the counter-productive and passive-aggressive ‘think twice before printing this email’. These can be symptomatic of box-ticking and an aspiration to come across as ‘professional’. The vast majority of start-ups need to change and adapt quickly and they require great flexibility. They also, by and large, require a singular vision and interest. A long email footer, in contrast, suggests conflicting interests and bloat rather than a simple, widely-embraced aim. Sadly for start-ups, the inverse of this ‘law’ is not guaranteed.

Gareth’s Law 3: Internet Economy success at scale is converging in all sectors on being an AI / machine-learning problem.

As so many aspects of internet economy success become shared knowledge sitting atop open source software, the ‘last’ race for online services is trending towards solving complex data and personalisation problems. Doing so will delight users across every sector – and will increasingly become a prerequisite for doing so.

Take Facebook’s news feed which, very far from being a manual curation problem, is a machine learning or AI one. Mass personalisation appears to be an oxymoron.  But at the very least this requires ever-more complex heuristics.  Increasingly, the way to win as an Internet Economy start-up with traction is to look to AI/machine learning to achieve that magical experience for the user. Think of Google Now cards, Netflix channel curation or in online travel solving the ultimate challenge – “Where should I go on holiday?”




Configuration as a Service

Posted by Raymond Davies

Configuration as a Service: Moving Quickly


As a web-scale tech organisation, it’s important that we can move quickly, at scale, within Skyscanner.

While we’re always tackling this – for example, by moving ever closer to continuous deployment, or by removing barriers which can slow teams down, like infrastructure provisioning (by moving towards AWS) – sometimes it would be nice to make changes without deploying any code at all.

This is where Configuration as a Service comes in – the ability to change the behavior of our software systems on the fly without the need to make code changes. Recently the squad I work in released Skyscanner’s first iteration of Configuration as a Service.

Our main motivation behind the system was to enable anyone in the business to safely make changes to our production systems, with the changes backed by A/B tests and associated metrics & reporting. Another motivation was that the system allows us to gracefully bypass a service which is experiencing an unexpected problem. Having this flexibility means we can continue to deliver the core experience which people come to Skyscanner for, even if something goes wrong behind the scenes.
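Conceptually, a consumer of the service reads its behavior from remote configuration with a safe local default. A sketch of the idea (the names and in-memory store are hypothetical, not Skyscanner’s actual API):

```javascript
// Hypothetical stand-in for the remote configuration store.
var remoteConfig = { 'hotels.reviews.enabled': false };

// Read a value with a safe local default, so a configuration problem
// can never take the consuming system down with it.
function getConfig(key, defaultValue) {
  var value = remoteConfig[key];
  return value === undefined ? defaultValue : value;
}

// Example: gracefully bypass a component that is having problems.
if (getConfig('hotels.reviews.enabled', true)) {
  console.log('render the reviews component');
} else {
  console.log('render the page without reviews'); // degraded, but working
}
```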

 

Letting anyone in the business make a production change… isn’t that dangerous?

no_brain

I hear you and I guess the answer is, potentially yes. You aren’t the first to ask. This type of question was brought up more than once when we were originally pitching the idea to other squads.

To mitigate this issue, squads have total control over which aspects of their systems they expose via Configuration.  They can also require changes to go through an approval process (with a comprehensive audit trail) which gives them an opportunity to preview changes before they go live. Finally, and as yet not implemented, we’ll be adding a progression for changes from our pre-production environments through to production, which gives an additional gateway for sanity checking.

It’s also worth bearing in mind that these changes are initially launched to the public behind an A/B test which has metrics and monitoring attached to it, so it’s actually quite difficult to really mess things up (at least without us noticing very quickly!)

 

Metrics & Monitoring

download

How do you know you are making changes which positively impact your users’ experience of your product? Or perhaps even more crucially, how do you know when something is broken?

These questions are being asked more and more of development teams as the ‘devops’ culture becomes increasingly popular.  They’re important questions at any scale, but they become incredibly important when, like Skyscanner, you’re dealing with hundreds of engineers, in teams all over the world, releasing code at their own heartbeat. When you extend this ability to anyone within the organisation and remove the barrier of shipping a release, it becomes an absolute necessity.

Our Configuration service leverages Skyscanner’s internal experimentation platform ‘Dr Jekyll’, which provides us with an A/B testing framework and the automatic ramp up of successful changes.

We can then track how users in the A and B buckets behave using tools such as Mixpanel. This is one of the places where we monitor how people move through our product funnels and whether they exit to our partners or bounce at a certain point in the process. If you’ve just made a Configuration change to how our search controls work, for example, and we see that users with your change seem to be having trouble finding the flight they’re looking for, we’ll probably review the change and assess the impact it’s having on users.

Similarly, it’s a requirement within Skyscanner that systems have monitoring and alerts configured against agreed KPIs and machine-level metrics. We use systems like Seyren, Graphite and VictorOps for monitoring and alerts. This means that we can quickly identify abnormal behavior and, ideally, pull the system using Configuration while any problem is rectified.

 

Exciting Times

image

Squads are already coming forward with some really interesting use-cases for the system, including some we’d not even thought of when dreaming it up, which is awesome. One such idea would see us able to serve part of our flights funnel while removing all of the servers which currently host that part of the system. Absolutely not the use case we’d imagined for Configuration as a Service, but really interesting nonetheless.

While it’s early days for Configuration as a Service at Skyscanner, I’m really excited about where it’ll take us and the interesting ideas that people, who might not otherwise have had the ability to make changes, will bring to the table.

 

 




Hardcore! From Seed To Apple Watch App In Five Weeks

Posted by Balint Orosz

applewatch

Back in May, we launched an Apple Watch app, which we had created in just five weeks. As any engineer will understand, those five weeks were somewhat stressful. A little caffeine-fuelled. Against-the-clock speed was needed, while at the same time we knew we couldn’t compromise on quality. No biggie, right?

Some of you might have unwrapped an Apple Watch as a gift over the festive season, so we figured this might be an ideal time to revisit quite how we went from a seed of an idea to a fully functioning app in five weeks.

A utility concept with no back-up data
Back then, the Apple Watch was a completely new product for iOS users. Therefore the first challenge we encountered was in shaping the utility concept with no back-up data to support it. Unable to refer to industry data on smart-watches, we instead scoped out an initial idea without any technical limitations, which actually turned out to be a refreshing way of working.

First we asked ourselves: what might a traveller need that can be provided by an ‘on the go’ technology that’s also in line with our existing Hotels app? A clear concept came to the fore: a ‘find your way back’ style app.

Say you go out for dinner, or simply a walk in a new city. Often, you can find yourself a little lost while trying to figure out how to get back to your hotel. We’ve all been there, and it’s all the easier to become confused and disorientated when street signs are in a foreign language or a completely different script. Even if you’ve got access to online maps, map search still has limited functionality for finding hotels, especially if you don’t know the exact name or address.

Therefore our idea was to create a simple, easy-to-use app for the new Apple Watch that helps travellers get back to their accommodation. We named it ‘Find Your Way’.

Five days of intensive research and design
The rapid pace required for the research and design in a short time-frame presented a fun (if exhausting) challenge, since the end result had to be a working product, not just a prototype. As such, continuous feasibility checks with our software developer teammates were vital, especially as the Watch was a completely new tool for them too.

Two things that shaped our journey:
• To sync the app with the Watch, we needed to build on its current capabilities, so we decided to go with the existing ‘Favourite’ feature
• Since we didn’t have an Apple Watch to hand, we relied on Apple’s well-defined, standard guidelines for the UI

In the first two days of the research and design sprint we explored the flow of the app in sketches and drawings, combined with ongoing discussions with developers on the feasibility of our proposed features. Working so closely with our developers was one of the biggest learning outcomes of the whole process: we learned how to think from a developer’s perspective.

Day three: testing
Day three, and we had an initial design and even a tester. Of course, that also threw up a pretty basic but crucial conundrum: how on earth were we going to test the app without an actual Apple Watch device? So we went old-school. We printed the watch on paper, got crafty with the scissors and put it on the wrists of our tester volunteers.

apple2

As the Watch app is about getting back to your hotel, we stepped away from the usual user testing methods and went out to the streets, walking around and talking with our testers: how did they feel about the concept, what did they think about this particular aspect? Another added challenge — like us, our testers had never used the Apple Watch either, and most hadn’t even set eyes on one. Usually, even if an app is new to someone, testers know how to start to explore it because they’ve handled the device (say, an iPhone) before — but of course, this wasn’t the case with the hotly-anticipated and closely guarded Apple Watch.

Our solution was to take testers through the two paper prototypes (which represented the two key screens of the app), and, given the constraints above, we also talked them through key features, rather than explaining what they could do on the screen or how they could control it.

Such limitations did make it tricky to agree on final learnings and takeaways. However, plenty of UI variations later, our technical requirements were taking shape. The biggest area of debate surrounded readability, which is crucial on a device like the Apple Watch. Without the real thing to hand, we had to make do with mobile handsets, experimenting with how the designed content might look on a small screen.

apple3

Finally: the real thing!
Drum-roll please: we were like kids in a candy store come the Apple Watch Lab event, where we could finally test the app for the first time, on a real, live device. We discarded our paper cut outs, consumed a frankly unhealthy amount of caffeine and energy drinks, and really got down to the nitty-gritty, rapidly making changes and amends until we had final approval.

From seed to Apple app in five weeks: it was a whirlwind, but we’re delighted with the results (even if we do say so ourselves!). Our first users were from the US, UK, Germany, Australia and Japan. You can see the outcome yourself: we’ve created a guide to using the ‘Find Your Way’ feature here.




Are you a Polyglot Technologist?

Posted by Richard Lennox

web programming concept

Originally posted on Feb 26, 2015 on medium.com at https://medium.com/@richardlennox/are-you-a-polyglot-technologist-fccd767bd421#.2jmjyoece

As software engineers and architects, we solve problems. We don’t just write code. The problems we solve are to improve the product that we offer our users. Effective system architecture is finding the right balance between those users’ objectives for the system, the technology applied to deliver the solution and the people building and operating the system. The necessary design decisions require a detachment from the technology we use. To that end we are less blinkered about our technology choices and must focus on the right approach to deliver the most compelling systems.

It is always interesting to hear general debates on technology or programming language choice flowing through wider communities, both here within Skyscanner and beyond in the industry as a whole. Some engineers are nervous and raise questions as to what it may mean for them as specialists as our technology continues to evolve. In my experience, this conversation recurs regularly – something we experience several times over our careers. Generally it is healthy.

This debate, though, is sometimes, if not often, seen as a competition between cool and uncool, modern versus old. The premise of that argument is wrong. The discussion shouldn’t be wasted on the technology or specific language of choice — these are simply the tools we use in the application of our solutions. As engineers the discussion should primarily be about the fundamental engineering skills, with the technology/language/framework relegated to a secondary consideration. In fact, most of us have been polyglot technologists for longer than we can remember; we simply don’t think about it in that way. We do all kinds of things with C#, Python or Java, with Rabbit MQ, MSMQ, Kafka or Couchbase, with SQL Server or MySQL, with Selenium, load balancing or CDN configurations. The wide range of technologies we are exposed to and master is not simply one specific technology or language. We need to recognise that this takes a certain collection of skills. It means we are already polyglot technologists. We need to recognise it and use it to our advantage to effectively enable the delivery of our solutions.

From Monoglot to Polyglot

Prior to many years of focus on .Net technologies and C# particularly, I was a Java developer — doing part-time support for a small application that was not doing an awful lot for anybody. I found a start-up web development company that was moving forwards with the .Net 1.1 technology stack and had built a SaaS CMS on it. They were partnering with another start-up, a hotel gift voucher company, to build a SaaS e-commerce application for them and their hotel clients. Sharing the costs, the two companies were looking for a ‘cheap’ graduate in order to progress the development of that application. The opportunity the role brought about — the chance to have real influence as an early employee in not one, but two internet and web focussed start-ups — was exciting (and, many years later, that same decision process led me to joining Skyscanner). The Microsoft stack was the technical direction of that company. I had a few months of limited professional Java experience, so at the time the decision to re-train into .Net was a relatively easy one. I was also a recent graduate, with a graduate’s bulletproof self-belief and blissful ignorance of my complete lack of abilities, such that I didn’t get the new-language fear you might expect.

I bought a book (it was 10 years ago!) to get up to speed with the basics of ASP.Net before the interview process. I remember the interview exercise I had to do — a small application with a contact address book of suppliers and some sales. I built it in Java with a Spring-based UI. It was simple enough and while I did consider trying it in C# & ASP.net, I didn’t want to risk it. Luckily it was good enough. While training was laid on — videos, exercises, books etc. — I hit the application code to see if I could start to make sense of what was there and didn’t really look at the support materials again. The switch to C# from Java turned out to be a matter of a few minor syntax differences. I hadn’t intended to do anything serious, but in that first week I found myself doing a bunch of small bug fixes. These went to production manually inside the first month and things snowballed from there. In that first month, I learned one of the most important software engineering and internet economy lessons, one that I try to apply every day:

This Internet Economy moves so fast, it’s impossible to keep up to date on everything. You are doing very well if you just keep moving forward. And it is forward motion that will lead to making a significant impact.

This first software engineering job was also my first foray into mastering a wide range of skills and being, what I now see as, a polyglot technologist. I was programming in C# mainly, but equally I was designing databases and developing complex SQL queries and stored procedures for sales reporting and forecasting. I was doing overall system design and architecture. I was coding JavaScript, HTML*, CSS, XML-based build scripts and DOS deployment scripts, and doing web server administration, database server administration, traffic prediction and capacity planning. I started unit testing and wrote some basic automated integration tests while designing the manual regression test scripts as well as basic deployment automation. All of which added to my baseline software development skill set.

It is these fundamental skills and the ability to keep moving them forward that I believe are at the core of any good software engineer. Good software creation follows tried and tested paradigms openly but not blindly. Object-orientation or functional programming, tiered architectures, SOLID principles, baked-in quality, simple use of design patterns and more recent approaches like Continuous Delivery — these fundamentals have been around a long time. You can find them referenced in seminal software engineering books such as Code Complete, Clean Code and The Pragmatic Programmer, and none are technology-specific. New patterns appear over time, but those that cross technology boundaries are the key foundations on which to base our decisions, not the technology stack. Understanding that they are the key transferable skills between languages should give you the confidence to at least consider the step beyond your current platform of speciality.

While having the fundamental skills is the first key to effective polyglot-ism, having the right working environment is also key. Working in an environment with multiple technologies allows you to constantly try something new. It gives you the freedom to leverage those skills and learn something you didn’t know yesterday. A software company that is focussed on Ruby on Rails and nothing else is blinkered and always provides a Rails-oriented solution. The team never gets the chance to explore the alternative options, however likely it is that something more suitable exists. Having a mind-set that supports adoption of the right tool for the right job — irrespective of underlying technology — enables better design and better solutions. The environment to make that choice has to exist. It can’t and shouldn’t be a free-for-all — for reasons of on-going support, maintenance and accidental costs — but a freedom to experiment should be the norm.

The language itself is always a secondary consideration. If you get the foundations right, you should have every confidence in your ability to pick up any new language or technology, understand its core principles and move on with it. Often at times it may be a different paradigm or a new approach (functional or parallelism paradigms) but at its core the fundamentals stand us in good stead for whatever is next. If as engineers we get this at the fundamental level, we can have the right approach to problem solving without constraints.

I am still most familiar with the .Net technology stack, the result of a decade in that environment and language, and of seeing it change over time. It can become habitual. I am lucky enough that today there is a growing variety of choices I can make. I am more likely to try out a new technology or language; most recently Go, while Objective-C is next on my list.

I have always been a polyglot technologist. Haven’t you?

In this internet based economy every Software Engineer is a polyglot by necessity. We’re required to switch technologies efficiently, relying on our fundamental engineering skill-set to progress. That’s not to say we should all be generalists, specialists are also needed, but the core and fundamental skills of a specialist can be applied in many technologies. Since joining Skyscanner I continue to work with many different technology stacks. I have enhanced my Python skills. I have moved my Javascript skills forward. I understand more about web applications at scale — load balancing, CDN technology, cloud etc. I am constantly adapting and building on the core skillset. I am not just a .Net engineer. I never really have been. I have always been a polyglot technologist. Haven’t you?

* I can hear the scoffs at seeing HTML in this. Creating clean semantic HTML underpins web applications, both for usability when combined with high-quality CSS and JS, and for the art of Accessibility, SEO and performance. It takes more than a little bit of craftsmanship too. If you would build a definition list with a bunch of <div> tags — then perhaps you should consider it further?

