Deflaking flaky tests

Flaky automated tests are a real drag. They’re worse than no tests at all. That being said, you can take measures to eradicate them.

Where I work, folks have (finally) become test-infected, which means developers who aren’t dedicated testers are writing unit tests, and even functional tests, for themselves. Because they lack testing experience, many of them write a test, run it once, and bask in the green. The test happens to pass the first time it runs, they commit it to master, and all’s well, right?

Wrong. For example, any time the application under test exhibits asynchronous behavior, we are likely to get nondeterministic test results. These kinds of results are costly and time-consuming to debug. Let’s say your test fails 5% of the time and each test run takes five minutes. On average you would need 20 runs, or about 100 minutes, just to see the failure once. That’s a lot of time typing commands at the terminal and waiting.
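
To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes each run fails independently with the same probability, which real flaky tests only approximate, and the function names are mine, made up for illustration.

```python
import math


def expected_runs_until_failure(p_fail):
    """Mean number of runs up to and including the first failure."""
    return 1.0 / p_fail


def prob_failure_within(p_fail, runs):
    """Probability that at least one of `runs` independent runs fails."""
    return 1.0 - (1.0 - p_fail) ** runs


def runs_needed(p_fail, confidence):
    """Smallest number of runs that shows a failure with at least `confidence` probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


p_fail = 0.05          # the test fails 5% of the time
minutes_per_run = 5    # each run takes five minutes

print(expected_runs_until_failure(p_fail) * minutes_per_run)  # average minutes until a failure shows up
print(prob_failure_within(p_fail, 10))                        # chance of catching it in 10 runs
print(runs_needed(p_fail, 0.95))                              # runs needed to be 95% sure of catching it
```

Under these assumptions, ten runs catch the failure only about 40% of the time, and you need 59 runs to be 95% sure of seeing it. Nobody wants to do that by hand.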

That’s why, as a web developer at NCBI, I recently open-sourced a simple but very useful Python package called deflake. To install it, run:

$ pip install deflake

You then get a command called deflake on your PATH that you can use to run a program (like a suite of tests) repeatedly until it returns a non-zero exit status. When it is safe to run several copies of the program at once (for example, when they don’t compete for a shared port or database), you can increase the pool-size (default is 1) to run them in multiple processes and speed up the debugging. You can also raise the max-runs (default is 10) for programs that rarely fail. Here’s an example deflake run using the default options:

$ deflake "python myprogram arg1 arg2"
PASS
PASS
PASS
PASS
PASS
PASS
FAIL (run 7)
$
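
If you are curious about the general idea, the core loop can be sketched in a few lines of Python. This is not the deflake source, just a minimal illustration of the technique under my own assumptions: the function name run_until_failure and its max_runs parameter are hypothetical, and the sketch runs everything in a single process.

```python
import subprocess


def run_until_failure(command, max_runs=10):
    """Run `command` in a shell up to `max_runs` times.

    Returns the 1-based number of the first run that exits with a
    non-zero status, or None if every run passed.
    """
    for run in range(1, max_runs + 1):
        # Discard the program's own output so only PASS/FAIL lines are shown.
        result = subprocess.run(
            command,
            shell=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            print(f"FAIL (run {run})")
            return run
        print("PASS")
    return None
```

Calling run_until_failure("python myprogram arg1 arg2") would print one PASS per successful run and stop at the first failure, much like the session above.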

You can even import the Deflake class into another Python program and run it programmatically. See the README for more details.

I hope the effort I put into this will help you eradicate the scourge of flaky tests. And remember, this package can help you debug any flaky program, not just flaky tests. Contact me if you find any issues or have ideas for improvements. Or better yet, open a pull request on GitHub.
