FlakeRepro: Automated and Efficient Reproduction of Concurrency-Related Flaky Tests
Flaky tests, which can non-deterministically pass or fail on the same code,
impose a significant burden on developers by providing misleading signals during
regression testing. Microsoft developers consider flaky tests one of the top
two reasons for slowed software development. To debug the
root cause of flaky behavior, a developer often needs to first reliably
reproduce a failed execution. Unfortunately, this is non-trivial. For example,
most of the flakiness in unit tests is caused by concurrency, and reproducing
such failures requires a specific thread interleaving. To address this
challenge, we introduce FlakeRepro, which helps developers reproduce a failed
execution of a flaky test caused by concurrency. FlakeRepro combines static
and dynamic analysis to quickly identify an interleaving that makes a flaky
test fail with the same error message as the original failure. FlakeRepro is efficient: it
can reproduce a failed execution after exploring a few tens of interleavings.
FlakeRepro integrates well with existing systems: it automatically instruments
test binaries, which can then run on existing, unmodified test pipelines.
We have implemented FlakeRepro for .NET and used it at Microsoft. In an
experiment with 22 Microsoft projects, FlakeRepro reproduced 26 of the
31 concurrency-related flaky tests, after exploring fewer than 7 interleavings and
within 6 minutes on average.
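
To make concrete why reproducing a concurrency-related failure requires a specific thread interleaving, the following is a minimal Python sketch. It is not FlakeRepro's algorithm or its .NET instrumentation. All names (`Counter`, `run_test`, `explore`, the hook functions) are hypothetical, and the barrier-based hook merely stands in for whatever an instrumentation tool would inject between a shared read and a shared write. The sketch tries a small list of candidate schedules until the test produces an unexpected result.

```python
import threading


class Counter:
    """Shared counter with an unsynchronized read-modify-write."""

    def __init__(self, hook=None):
        self.value = 0
        # hook(thread_name) is a hypothetical instrumentation point.
        self.hook = hook if hook is not None else (lambda name: None)

    def increment(self, name):
        current = self.value      # read shared state
        self.hook(name)           # injected between the read and the write
        self.value = current + 1  # write back (may overwrite another update)


def run_test(hook=None):
    """Run the 'test': two threads increment once each; it expects 2."""
    counter = Counter(hook)
    threads = [
        threading.Thread(target=counter.increment, args=(name,))
        for name in ("A", "B")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


def no_delay():
    """Candidate schedule that injects nothing."""
    return lambda name: None


def both_read_first():
    """Candidate schedule that holds each thread after its read until
    both threads have read, so both see the same stale value."""
    barrier = threading.Barrier(2, timeout=5)
    return lambda name: barrier.wait()


def explore(candidates, expected=2):
    """Try candidate schedules in order; return the label of the first one
    that makes the test fail and how many schedules were explored."""
    for explored, (label, make_hook) in enumerate(candidates, start=1):
        if run_test(make_hook()) != expected:
            return label, explored
    return None, len(candidates)


CANDIDATES = [("no_delay", no_delay), ("both_read_first", both_read_first)]
```

Under the `both_read_first` schedule, both threads read 0 before either writes, so `run_test(both_read_first())` returns 1 instead of the expected 2. Without an injected hook the lost update is possible but rare, which is what makes such a test flaky and hard to reproduce by simple reruns.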