Improving Regression Testing in Continuous Integration Development Environments (Keynote)
In continuous integration development environments, software engineers frequently integrate new or changed code with the mainline codebase. Merged code is then regression tested to help ensure that the codebase remains stable and that continuing engineering efforts can be performed more reliably. Continuous integration is advantageous because it can reduce the amount of code rework that is needed in later phases of development, and speed up overall development time. From a testing standpoint, however, continuous integration raises several challenges.
Chief among these challenges are the costs, in terms of time and resources, associated with handling a constant flow of requests to execute tests. To help with this, organizations often use farms of servers to run tests in parallel, or execute tests "in the cloud", but even then, test suites tend to expand to consume all available resources, and then continue to expand beyond that.
We have been investigating strategies for applying regression testing in continuous integration development environments more cost-effectively. Our strategies are based on two well-researched techniques for improving the cost-effectiveness of regression testing – regression test selection (RTS) and test case prioritization (TCP). In the continuous integration context, however, traditional RTS and TCP techniques are difficult to apply, because these techniques rely on instrumentation and analyses that cannot easily be applied to fast-arriving streams of test suites.
We have thus created new forms of RTS and TCP techniques that use relatively lightweight analyses and can cope with the volume of test requests. To evaluate our techniques, we have conducted an empirical study on several large data sets. In this talk, I describe our techniques and the empirical results we have obtained in studying them.
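As a rough illustration of what a lightweight analysis could look like, the sketch below selects and prioritizes tests using only recorded execution history, with no code instrumentation. It is not the technique presented in the talk: the `HistoryRecord` type, the two window parameters, and the selection rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryRecord:
    """Hypothetical per-test history; run indices count CI test requests."""
    name: str
    last_failed: Optional[int]  # index of the most recent failing run, if any
    last_run: Optional[int]     # index of the most recent execution, if any


def failed_recently(rec: HistoryRecord, now: int, fail_window: int) -> bool:
    return rec.last_failed is not None and now - rec.last_failed <= fail_window


def select_tests(records: List[HistoryRecord], now: int,
                 fail_window: int, run_window: int) -> List[HistoryRecord]:
    """Assumed RTS rule: keep tests that are new, failed recently,
    or have not been executed within the last run_window requests."""
    selected = []
    for rec in records:
        is_new = rec.last_run is None
        is_stale = rec.last_run is not None and now - rec.last_run > run_window
        if is_new or is_stale or failed_recently(rec, now, fail_window):
            selected.append(rec)
    return selected


def prioritize_tests(records: List[HistoryRecord], now: int,
                     fail_window: int) -> List[HistoryRecord]:
    """Assumed TCP rule: recently failing tests first, then by name."""
    return sorted(
        records,
        key=lambda rec: (0 if failed_recently(rec, now, fail_window) else 1,
                         rec.name),
    )
```

Both functions make a single pass over stored outcomes, so their cost depends on the number of tests rather than on the size of the code under test.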