Measuring Autosuggest Quality
Posted by Ben Torfs
Greetings Code Voyagers, from the Free Text Search squad.
We power most of Skyscanner's auto-suggest search boxes, such as the one where you select the airport you want to fly to, or the city in which you need to find a hotel. More generally, you could say that our mission is to map user input to user intention, using as few keystrokes as possible.
Autosuggest: speed, relevancy and the ‘zero-result rate’
These search results need to appear very fast (preferably in under 200ms), but above all, they need to be relevant. This is especially true in the mobile market, where typing characters can be a bit of a hassle and screen real estate is too scarce to display long lists of results.
Our current service is working well, and we are proud of the speed and accuracy of our results (even when the user includes some challenging typos). As always though, there is room for improvement, particularly in markets using non-Latin scripts. Measuring the quality of our service is tremendously important in identifying areas of improvement as well as enabling better A/B testing in the future.
Today, the most important metric we use is the rate of queries returning no results at all (the ‘zero-result rate’). At first it seems like an overly simplistic metric, but it is actually quite useful for comparing performance between different locales, and for tracking how each locale evolves over time.
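The post doesn't show how this metric is computed, but it boils down to simple counting. Here is a minimal sketch (the log format and locale names are assumptions for illustration, not our actual pipeline):

```python
from collections import defaultdict

def zero_result_rate(query_log):
    """Fraction of queries per locale that returned no suggestions.

    query_log: iterable of (locale, result_count) pairs, one per query.
    """
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for locale, result_count in query_log:
        totals[locale] += 1
        if result_count == 0:
            zeros[locale] += 1
    # One rate per locale, so different markets can be compared directly.
    return {locale: zeros[locale] / totals[locale] for locale in totals}

log = [("en-GB", 5), ("en-GB", 0), ("ja-JP", 0), ("ja-JP", 0)]
print(zero_result_rate(log))  # {'en-GB': 0.5, 'ja-JP': 1.0}
```

Tracking this per locale and per day is what lets us see both the gap between markets and the trend within one market.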
For instance, let’s take a look at this measure for the past six months in the UK, our longest-supported market, where we’ve spent a lot of time optimizing the site. Our results are very strong, yet there is still a small fraction of queries we cannot recover from – for example, when a user searches for a location that doesn’t have an airport, or types ‘Frankfart’ rather than ‘Frankfurt’ (always amusing).
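We don't describe our actual matcher here, but to illustrate why a typo like ‘Frankfart’ is recoverable at all, a toy typo-tolerant lookup can be sketched with Python's standard-library difflib (the candidate list and cutoff are assumptions for the example):

```python
import difflib

AIRPORTS = ["Frankfurt", "Edinburgh", "Barcelona", "Amsterdam"]

def suggest(query, candidates=AIRPORTS, n=3):
    """Return up to n suggestions: exact prefix matches first,
    falling back to fuzzy matching when the user made a typo."""
    query_lower = query.lower()
    prefix_hits = [c for c in candidates if c.lower().startswith(query_lower)]
    if prefix_hits:
        return prefix_hits[:n]
    # No prefix match: look for near-misses by string similarity.
    return difflib.get_close_matches(query, candidates, n=n, cutoff=0.6)

print(suggest("Frankfart"))  # ['Frankfurt']
```

A production service would use an indexed structure rather than a linear scan, but the principle – fall back from exact matching to similarity matching – is the same.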
Auto-suggest and non-Latin scripts
It’s not quite so easy when optimizing for Skyscanner’s newer markets, where non-Latin scripts are used. There are some great tools out there that have really helped us make fantastic improvements; in Japan, we’ve used the wonderful Kuromoji library to convert queries between the various Japanese character types. We’ve made similar enhancements for other languages such as Korean, which again has resulted in real progress.
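Kuromoji does full morphological analysis, which is well beyond a blog snippet, but one simple building block of Japanese query normalization (shown here as an illustration, not our actual pipeline) is folding katakana and hiragana spellings together so both hit the same index entry. The two kana blocks are offset by a fixed amount in Unicode:

```python
def katakana_to_hiragana(text):
    """Fold katakana into hiragana so both spellings of a query match.

    Katakana U+30A1..U+30F6 maps onto hiragana U+3041..U+3096
    by subtracting 0x60 from each code point; other characters pass through.
    """
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

print(katakana_to_hiragana("トウキョウ"))  # とうきょう
```

Normalizing both the indexed place names and the incoming query this way means a user can type in either script and still land on the same suggestion.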
Alternative auto-suggest KPIs
The zero-result rate provides us with a good idea of where to steer our efforts, but it is pretty coarse and we are looking for new and better KPIs. Here are some of the ideas we came up with:
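As one illustration of a finer-grained KPI (my example here, not necessarily one of the squad's), you could record the rank of the suggestion the user actually selects and summarize it as mean reciprocal rank: a score of 1.0 means every user picked the top suggestion, and it degrades as the right answer sits further down the list or is never picked at all.

```python
def mean_reciprocal_rank(selected_ranks):
    """selected_ranks: 1-based rank of the suggestion each user picked,
    or None when the session ended without a selection."""
    reciprocal = [1.0 / rank if rank else 0.0 for rank in selected_ranks]
    return sum(reciprocal) / len(reciprocal)

# Four sessions: top pick, second pick, no pick, top pick.
print(mean_reciprocal_rank([1, 2, None, 1]))  # 0.625
```

Unlike the zero-result rate, a metric like this rewards putting the right answer first, not just returning something.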
This list is surely not complete. Do you have thoughts on it, or other ideas on how to measure and improve our auto-suggest results? Please let us know in the comments – we would love to hear them.