Flaky tests - tests that can both pass and fail despite no changes having been made to the code - can cost developers significant amounts of time and energy, since exposing them requires repeated test runs. As such, identifying and repairing flaky tests can be costly.
Researchers Owain Parry and Phil McMinn aim to produce techniques to identify, debug and repair flaky tests that could ultimately be implemented as automated tools for developers. These tools could save developers time, money and energy.
During the development of a software project, the code needs to be tested throughout to make sure the application does what it’s designed to do. The software should either work properly every time and pass the test, or not work correctly and fail the test. Generally, if the software fails the test, it means there is a bug somewhere in the program that needs to be fixed. However, in reality, this is not always the case. Seemingly at random, the same test of the same code can produce different results, causing such tests to be labelled as “flaky” and, hence, unreliable.
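To make this concrete, here is a contrived illustration (not taken from the research) of one common cause of flakiness: a race condition. The test below may pass or fail depending on how the operating system schedules the threads, even though neither the test nor the code under test changes between runs.

```python
import threading
import unittest

class Counter:
    """A counter with a deliberate race condition: the read and the
    write are separate steps, so concurrent increments can be lost."""

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value       # read
        self.value = current + 1   # write (thread may be paused in between)

class TestCounter(unittest.TestCase):
    def test_concurrent_increments(self):
        counter = Counter()

        def worker():
            for _ in range(100_000):
                counter.increment()

        threads = [threading.Thread(target=worker) for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Passes when the threads happen not to interleave between the
        # read and the write, fails when they do - the same test on the
        # same code yields different results from run to run.
        self.assertEqual(counter.value, 400_000)

if __name__ == "__main__":
    unittest.main()
```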
It is often assumed that failing tests are due to bugs in the program, but in reality the failure could be down to an issue with the test itself. At face value, there is no way to distinguish between the two.
“This means a great deal of time and energy can be spent looking for a bug that isn’t there.”
Owain Parry
PhD student and member of the Department’s Testing Research Group
Detecting flaky tests by running tests over and over, particularly in larger projects, can take impractical amounts of time and resources, meaning an issue with a test itself often goes undetected. This can lead to problems with the software much further down the line.
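The brute-force approach described here amounts to rerunning a test and checking whether its outcomes disagree. A minimal sketch, assuming a pytest-based project (the test identifier and rerun budget below are illustrative, not the researchers' actual tooling):

```python
import subprocess

def rerun_detect(test_id: str, reruns: int = 30) -> bool:
    """Rerun a single pytest test `reruns` times and report whether it
    is flaky, i.e. it both passed and failed at least once.

    This is exactly what makes detection so expensive: the cost grows
    with the rerun budget and the size of the test suite.
    """
    outcomes = set()
    for _ in range(reruns):
        result = subprocess.run(
            ["pytest", "-q", test_id],
            capture_output=True,
        )
        outcomes.add(result.returncode == 0)  # True = pass, False = fail
        if len(outcomes) > 1:
            return True   # both outcomes observed: definitely flaky
    return False  # consistent over the budget (but not proven stable)

if __name__ == "__main__":
    print(rerun_detect("tests/test_counter.py::TestCounter::test_concurrent_increments"))
```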
Using a machine learning model, the team is developing a tool that can predict, with a reasonable degree of accuracy, whether a test is flaky or not, without having to run it repeatedly. This dramatically reduces the number of test executions needed, saving time, money and energy. Moving forward, the team is hoping to create a tool that will not only detect flaky tests, but also identify their root causes.
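The article does not detail the model itself, but published work on flaky-test prediction commonly trains a classifier on static features of a test's source code, such as the tokens it contains. The sketch below follows that general idea using scikit-learn; the feature choice, training data and identifiers are all assumptions for illustration, not the team's actual pipeline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical training data: test source code labelled by rerunning
# each test many times (1 = observed flaky, 0 = consistently stable).
test_sources = [
    "def test_api(): assert fetch('https://example.com').ok",    # network-dependent
    "def test_sum(): assert add(2, 2) == 4",                     # pure, deterministic
    "def test_race(): spawn_threads(); assert counter == 4000",  # concurrency
    "def test_parse(): assert parse('a,b') == ['a', 'b']",       # pure, deterministic
]
labels = [1, 0, 1, 0]

# Vocabulary-based model: tokens such as 'fetch' or 'spawn_threads'
# tend to correlate with flakiness-prone behaviour.
model = make_pipeline(
    CountVectorizer(token_pattern=r"[A-Za-z_]+"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(test_sources, labels)

# Predict instead of rerunning: one static pass over the test suite
# replaces many expensive test executions.
print(model.predict(["def test_download(): assert fetch(url).ok"]))
```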
“There is a danger that if you give a developer a list of flaky tests, they will mark them in some way so they don’t fail the software build, with the intention of fixing them later,” added Owain.
“Then you end up with fewer tests, which means it’s more likely bugs will be missed - and this can have a significant impact.
“If we can build something that’s relatively straightforward to set up and use, and it can give developers more information than just whether a test is flaky or not, that can ultimately lead to better software.
“What we’re doing may not look very glamorous, but it could have a significant benefit to developers and translate into real-world impact.”