Early prediction of merged code changes to prioritize reviewing tasks (ESEC/FSE 2018 - Journal-First)

Sun 4 - Fri 9 November 2018 Lake Buena Vista, Florida, United States

Who

Yuanrui Fan, Xin Xia, David Lo, Shanping Li

Track

ESEC/FSE 2018 Journal-First

When

Thu 8 Nov 2018 10:30 - 10:52 at Horizons 5 - Estimation and Prediction. Chair(s): Jim Herbsleb

Abstract

Modern Code Review (MCR) has been widely adopted by open source and proprietary software projects. Inspecting code changes costs reviewers considerable time and effort, since they need to comprehend patches and are often assigned many code changes to review. Moreover, a code change may eventually be abandoned, in which case the time and effort spent reviewing it are wasted. A tool that predicts early on whether a code change will be merged can therefore help developers prioritize the changes they inspect, accomplish more on a tight schedule, and avoid wasting review effort on low-quality changes. Motivated by these needs, in this paper we build a tool that predicts whether a code change will be merged. Our approach first extracts 34 features from code changes, grouped into 5 dimensions: code, file history, owner experience, collaboration network, and text. We then use machine learning techniques such as random forest to build a prediction model. To evaluate our approach, we conduct experiments on three open source projects (Eclipse, LibreOffice, and OpenStack) containing a total of 166,215 code changes. Across the three datasets, our approach achieves statistically significant improvements over random guess classifiers and over two prediction models proposed by Jeong et al. (2009) and Gousios et al. (2014) in terms of several evaluation metrics. We also study the important features that distinguish merged code changes from abandoned ones.
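
As a rough illustration of how such predictions could be used to prioritize a review queue, the sketch below orders pending changes by a predicted merge probability. It is not the authors' tool. The feature names (owner_prior_merge_ratio, lines_changed), the CodeChange type, and the toy_model scoring function are all hypothetical stand-ins. The abstract does not list the 34 features, and a real system would replace toy_model with a trained classifier such as a random forest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CodeChange:
    change_id: str
    # Hypothetical feature name -> value; the real 34 features are not listed here.
    features: Dict[str, float]


def prioritize(
    changes: List[CodeChange],
    predict_merge_probability: Callable[[CodeChange], float],
) -> List[CodeChange]:
    """Return changes ordered from most to least likely to be merged."""
    scored = [(predict_merge_probability(c), c) for c in changes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


def toy_model(change: CodeChange) -> float:
    """Stand-in for a trained classifier (assumption, not the paper's model).

    Starts from the owner's past merge ratio and subtracts a capped
    penalty for large changes, then clamps the result to [0, 1].
    """
    prior = change.features.get("owner_prior_merge_ratio", 0.5)
    size_penalty = min(change.features.get("lines_changed", 0) / 1000.0, 0.4)
    return max(0.0, min(1.0, prior - size_penalty))


# Made-up review queue for illustration only.
queue = [
    CodeChange("A", {"owner_prior_merge_ratio": 0.9, "lines_changed": 50}),
    CodeChange("B", {"owner_prior_merge_ratio": 0.4, "lines_changed": 800}),
    CodeChange("C", {"owner_prior_merge_ratio": 0.7, "lines_changed": 10}),
]
ordered = [c.change_id for c in prioritize(queue, toy_model)]
```

Passing the model in as a function keeps the ordering logic independent of how probabilities are produced, so any classifier that returns a probability per change could be substituted for the stand-in.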

DOI

https://doi.org/10.1007/s10664-018-9602-0

Yuanrui Fan

Xin Xia

Monash University

Australia

David Lo

Singapore Management University

Singapore

Shanping Li