Dixie: turning chaos to your advantage
Posted by Balint Orosz
What do you do if:
… your app crashes on unreliable networks?
… you want to make sure your app can withstand edge cases?
We’ve got a simple formula: create a chaos generator to simulate worst-case scenarios and deliberately try to make your app fail. And if you’re successful? Congratulations: modify your code, increase your app’s fault tolerance, and repeat.
We call it Dixie. It’s an open-source project to help developers find an effective answer to stability issues and worst-case scenarios, and we’ve shared it on GitHub.
Interested? Here’s how Dixie came to life.
We all know that today’s development teams have to create increasingly complex software in reduced time-frames. It’s no different here at Skyscanner, where we create mobile apps across multiple platforms. We, like many of you, believe that being able to react immediately to constantly changing requirements without sacrificing the perfect user experience is a key element in the product development cycle.
As our mobile app team continues to expand, it’s ever-more difficult for one developer to understand the entire codebase and visualise the impact a new modification could cause. With this comes the risk of unexpected side effects and mysterious crashes. We recognized that what we really needed was to build apps that can handle unexpected situations, including those that developers don’t tend to anticipate when first writing the code.
We came up with a few possible solutions. A high level of unit test coverage (approaching 100%) on a code base wasn’t a bad shout, but it would greatly slow our reaction time and risked too much damage to the development cycle (plus, 100% code coverage can, in some cases, be almost impossible to achieve). We also considered identifying the most critical parts of the code and testing these incredibly thoroughly, which has the upside of adding huge value to development efforts. However, the final solution came from Peter Adam Wiesner (lead iOS developer at Skyscanner), who was inspired by an article about something called Chaos Monkey, created by Netflix backend developers.
Not familiar with Chaos Monkey?
Chaos Monkey is a service which identifies groups of systems and randomly terminates one of the systems in a group.
Basically, Chaos Monkey allows developers to attack their own system to an extent which helps highlight potential weaknesses. Knowing about, and reacting to, these weaknesses helps increase long-term quality and builds confidence in the stability of the system.
Dixie: the Solution
We thought a tool like this could be just as useful for our existing projects. However, rather than a system of servers, our tool targets code components and modifies their behaviour.
Consider first that an application written in an object-oriented language can be visualized as similar to a network of servers communicating with each other. Just as a server can go down, a component of code can also start behaving incorrectly, which can then affect all the other components which are reliant upon it. This component could have all kinds of responsibilities, such as handling network communication, providing location information, managing user data or loading files from the file system.
A generally acceptable result in this scenario would be for the system to degrade gracefully, and recover from the error while minimizing the amount of harm to user experience. Ideally, the system would not continue to propagate errors, and should certainly not crash completely.
This is where Dixie comes in. Like Chaos Monkey, it can be thought of as a chaos generator, which can cause specified components to function differently and help simulate worst-case scenarios. A developer using this tool can deliberately attempt to break the app and cause it to fail. If they are successful, and the app does not handle the breakage gracefully, then it is a clear sign to the developer that the code requires modifications to increase its fault tolerance.
The idea of changing an object’s behaviour is not new; developers already use mocking libraries in unit tests. These libraries help to gain control over the dependencies of the tested components. Most of them focus on mocking instances, so they require the target component to receive its dependencies as injected objects (i.e. to provide an interface where they can be set, or to be ready for use with IoC libraries). A well-designed architecture supports all of the above, although testing more complex applications can still be a problem. Writing tests at higher levels of abstraction (integration, system) requires more and more work to assemble the correct environment.
Dixie takes a different approach, by allowing changes to the behaviour of interior components. By applying some chaos to the methods of chosen objects, the program flow can be steered into edge cases, and those edge cases can be spread across multiple components to test their robustness. For a concrete implementation we chose the Objective-C language, where replacing behaviours is easier thanks to its runtime. Instead of using an NSProxy object (which would also require injectable dependencies), we chose to work with the technique of method swizzling.
Method swizzling is based on calling the appropriate runtime APIs to replace a method’s implementation with a new one. Working with these APIs requires the developer to be familiar with low-level C functions and object representations, and to provide the correct method environment information. Dixie takes care of all of this and hides the low-level logic, so developers can focus on creating new configurations.
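Dixie does this through the Objective-C runtime. As a language-neutral illustration of the underlying idea only, here is a minimal Python sketch that swaps a method’s implementation on a class and then restores it. The `LocationProvider` class and the `swizzle` helper are hypothetical names invented for this example; they are not part of Dixie’s API.

```python
class LocationProvider:
    """Hypothetical component that other code depends on."""

    def current_city(self):
        return "Budapest"


def swizzle(cls, method_name, new_impl):
    """Replace cls.method_name with new_impl and return the original implementation."""
    original = getattr(cls, method_name)
    setattr(cls, method_name, new_impl)
    return original


def chaotic_city(self):
    # Stand-in for a misbehaving component: no location is available.
    return None


provider = LocationProvider()  # instance created before the swap
original = swizzle(LocationProvider, "current_city", chaotic_city)
during = provider.current_city()  # existing instances pick up the new behaviour

setattr(LocationProvider, "current_city", original)  # put the original back
after = provider.current_city()
```

Because the replacement happens on the class rather than on an injected instance, code that creates or holds its own `LocationProvider` is affected too, which is the property that sets this approach apart from instance mocking.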
The developer can specify the objects and methods that should be changed and choose how they should be changed. This creates the ‘profile’. The profile can then be applied in a single line of code, which causes Dixie to rewire the internal structure of the application. Changes can be reverted at any time, which gives developers control over how and where they apply Dixie.
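To make the apply-and-revert workflow concrete, here is a rough Python sketch of what a profile could look like conceptually. It is an analogy under stated assumptions, not Dixie’s Objective-C interface: `ChaosProfile`, `NetworkClient`, `Clock` and `load_greeting` are all hypothetical names made up for the illustration.

```python
class ChaosProfile:
    """Hypothetical profile: a list of (class, method name, replacement) entries."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._originals = []

    def apply(self):
        for cls, name, replacement in self._entries:
            # Remember the implementation defined on the class so it can be restored.
            self._originals.append((cls, name, cls.__dict__[name]))
            setattr(cls, name, replacement)

    def revert(self):
        # Restore in reverse order of application.
        while self._originals:
            cls, name, original = self._originals.pop()
            setattr(cls, name, original)


class NetworkClient:
    def fetch(self, url):
        return {"status": 200, "body": "ok"}


class Clock:
    def now_hour(self):
        return 12


def failing_fetch(self, url):
    raise ConnectionError("simulated network failure")


def midnight(self):
    return 0


def load_greeting(client, clock):
    """Caller under test: it should degrade gracefully when the network fails."""
    try:
        body = client.fetch("https://example.com/greeting")["body"]
    except ConnectionError:
        body = "offline"
    return f"{clock.now_hour():02d}h {body}"


profile = ChaosProfile([
    (NetworkClient, "fetch", failing_fetch),
    (Clock, "now_hour", midnight),
])

client, clock = NetworkClient(), Clock()
profile.apply()                          # one call rewires both components
chaotic = load_greeting(client, clock)
profile.revert()                         # changes can be undone at any time
normal = load_greeting(client, clock)
```

Here the caller survives the simulated failure by falling back to an "offline" value; if it had let the `ConnectionError` escape, that would be the signal to improve its fault tolerance.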
The possibilities are limitless; Dixie allows you to create your own tools, from the simplest custom profiles to complex patterns of behaviours. We’ve created an example app to demonstrate how easily a developer might implement chaos:
• Altering the GPS coordinates returned by the operating system (Location example app)
• Altering dates or times returned by the operating system (Date example app)
• Changing network response (Network example app)
Or, why not use Dixie to:
• Replace localization strings to test label truncations without polluting production code
• Simulate properties in your data objects
• Change operating system information such as battery level and device hardware details
The first version of Dixie was implemented back in October 2014, with a second version released this summer by Skyscanner’s Budapest team (Peter Adam Wiesner, Zsolt Varnai, Phillip Wheatley, Tamas Flamich, Zsombor Fuszenecker, Csaba Szabo). What’s different? Well, we’ve cut the unstable proof-of-concept parts from the codebase and gone through every source file to refactor and clean it up, making the code more usable for the community.
Because we focused on the essentials, Dixie currently only supports replacing methods that take objects as parameters and return either an object or void. In the future, we want to add support for primitive types too. There is plenty to implement in the long term, both horizontally (extending the current tool) and vertically (implementing new tools).
Here’s what might be next:
• implementing a unit test tool that can fuzz a specific method of a class (or all of them), detecting input assertions and creating unit tests for the failed cases
• undertaking code analysis to find weak spots in methods and automatically suggest behaviour changes
• detecting the application’s dependency graph at runtime and using this information to create more efficient chaos
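As a rough idea of what the first item might involve, here is a small Python sketch of fuzzing a single method and collecting the inputs that trip its assertions. It is purely illustrative and does not describe any existing Dixie feature; `Booking`, `fuzz_method` and `random_count` are hypothetical names.

```python
import random


class Booking:
    """Hypothetical class under test."""

    def set_passengers(self, count):
        assert isinstance(count, int), "count must be an integer"
        assert 1 <= count <= 9, "count must be between 1 and 9"
        self.passengers = count


def fuzz_method(obj, method_name, make_input, runs=200, seed=0):
    """Call obj.method_name with generated inputs; collect those that trip an assertion."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    method = getattr(obj, method_name)
    failures = []
    for _ in range(runs):
        value = make_input(rng)
        try:
            method(value)
        except AssertionError as error:
            failures.append((value, str(error)))
    return failures


def random_count(rng):
    # Mix plausible integers with values of the wrong type.
    return rng.choice([rng.randint(-5, 15), None, "3", 2.5])


failures = fuzz_method(Booking(), "set_passengers", random_count)
failing_inputs = sorted({repr(value) for value, _ in failures})
```

Each distinct entry in `failing_inputs` is a candidate for a generated regression test.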
We hope that Dixie can help in solving complex issues in a much more productive and effective way. You can find Dixie on GitHub; let us know what you think in the comments below, and get involved there.