Wed 7 Nov 2018 14:15 - 14:37 at Horizons 10-11 - Software Maintenance I Chair(s): Christian Bird

Issue tracking data have been used extensively to aid in predicting or recommending software development practices. Issue attributes typically change over time, but users may take the attribute values as they stood when the data were collected rather than as they stood at the time of their application scenarios. We therefore investigate data leakage, which results from ignoring the chronological order in which the data were produced. Information leaked from the "future" makes prediction models misleadingly optimistic. We examine existing literature to confirm the existence of data leakage, and we reproduce three typical studies (detecting duplicate issues, localizing issues, and predicting issue-fix time) with time-appropriate data to quantify the impact of the leakage. We confirm that 11 out of 58 studies have a leakage problem, while 44 are suspected of having one. We observe that data leakage biases the results, although the extent of the bias is not striking. The summary, component, and assignee attributes have the largest impact on the results. Our findings suggest that data users are often unaware of the context in which the data were produced. We recommend that researchers and practitioners who use issue tracking data to address software development problems fully understand their application scenarios, the origin and change of the data, and the influential issue attributes in order to manage data leakage risks.
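To make the leakage described in the abstract concrete, the following minimal Python sketch rebuilds an issue's attributes as they stood at a chosen moment by undoing later changes. It is an illustration under stated assumptions, not the authors' method: the function `attributes_as_of`, the change-record keys (`at`, `field`, `old`, `new`), and the sample issue are all hypothetical.

```python
from datetime import datetime

def attributes_as_of(snapshot, history, when):
    """Return the issue attributes as they stood at time `when`.

    `snapshot` is the issue as exported at collection time; `history` is a
    list of change records. Both layouts are assumptions for illustration.
    """
    state = dict(snapshot)
    # Walk the changes newest-first and undo every change made after `when`.
    for change in sorted(history, key=lambda c: c["at"], reverse=True):
        if change["at"] > when:
            state[change["field"]] = change["old"]
    return state

# Hypothetical issue: the exported snapshot already reflects later edits.
snapshot = {"summary": "Crash on save", "component": "IO", "assignee": "alice"}
history = [
    {"at": datetime(2018, 1, 2), "field": "component", "old": "UI", "new": "IO"},
    {"at": datetime(2018, 1, 5), "field": "assignee", "old": None, "new": "alice"},
]

# A model applied when the issue is reported should see only these values.
report_time = datetime(2018, 1, 1)
features = attributes_as_of(snapshot, history, report_time)
```

Training or evaluating on `snapshot` directly would expose the model to the corrected component and the eventual assignee, neither of which existed at report time.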

Wed 7 Nov

Displayed time zone: Guadalajara, Mexico City, Monterrey

13:30 - 15:00
Software Maintenance I
Research Papers / Journal-First at Horizons 10-11
Chair(s): Christian Bird Microsoft Research
13:30
22m
Talk
Use and Misuse of Continuous Integration Features: An Empirical Study of Projects that (mis)use Travis CI
Journal-First
Keheliya Gallaba McGill University, Shane McIntosh McGill University
13:52
22m
Research paper
One Size Does Not Fit All: An Empirical Study of Containerized Continuous Deployment Workflows
Research Papers
Yang Zhang National University of Defense Technology, China, Bogdan Vasilescu Carnegie Mellon University, Huaimin Wang, Vladimir Filkov University of California at Davis, USA
14:15
22m
Talk
Be Careful of When: An Empirical Study on Time-Related Misuse of Issue Tracking Data
Research Papers
Feifei Tu Peking University, China, Jiaxin Zhu Institute of Software at Chinese Academy of Sciences, China, Qimu Zheng Peking University, China, Minghui Zhou Peking University
14:37
22m
Talk
Do the Dependency Conflicts in My Project Matter?
Research Papers
Ying Wang Northeastern University, China, Ming Wen The Hong Kong University of Science and Technology, Zhenwei Liu Northeastern University, China, Rongxin Wu Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Rui Wang Northeastern University, China, Bo Yang Northeastern University, China, Hai Yu Northeastern University, China, Zhiliang Zhu Northeastern University, China, Shing-Chi Cheung Department of Computer Science and Engineering, The Hong Kong University of Science and Technology