Be Careful of When: An Empirical Study on Time-Related Misuse of Issue Tracking Data
Issue tracking data have been used extensively to aid in predicting or recommending software development practices. Issue attributes typically change over time, but users may use the attribute values as of the time of data collection rather than as of the time of their application scenarios. We therefore investigate data leakage, which results from ignoring the chronological order in which the data were produced. Information leaked from the "future" makes prediction models misleadingly optimistic. We examine existing literature to confirm the existence of data leakage and reproduce three typical studies (detecting duplicate issues, localizing issues, and predicting issue-fix time), adjusted to use appropriate data, to quantify the impact of the leakage. We confirm that 11 out of 58 studies have a leakage problem, while 44 are suspected. We observe biased results caused by data leakage, although the extent is not striking. The summary, component, and assignee attributes have the largest impact on the results. Our findings suggest that data users are often unaware of the context in which the data were produced. We recommend that researchers and practitioners who use issue tracking data to address software development problems fully understand their application scenarios, the origin and change of the data, and the influential issue attributes in order to manage data leakage risks.
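To make the leakage described in the abstract concrete, the following is a minimal sketch, not the authors' method, of how one might rebuild an issue's attributes as they stood at a chosen moment instead of using the values present at data collection. The record layout (`initial`, `history` as `(timestamp, field, new_value)` tuples), the function name `attributes_as_of`, and the sample values are all hypothetical and assume the tracker exposes the creation-time values plus a change log.

```python
from datetime import datetime


def attributes_as_of(initial, history, when):
    """Return the issue attributes as they stood at time `when`.

    initial: dict of attribute values at issue creation (hypothetical schema).
    history: iterable of (timestamp, field, new_value) change records.
    """
    snapshot = dict(initial)
    # Replay changes in chronological order and stop at the cut-off time.
    for ts, field, new_value in sorted(history, key=lambda rec: rec[0]):
        if ts > when:
            break  # later changes are "future" information and are ignored
        snapshot[field] = new_value
    return snapshot


# Hypothetical issue: values at creation and the edits made afterwards.
initial = {"summary": "Crash on start", "component": "General", "assignee": None}
history = [
    (datetime(2018, 1, 3), "component", "Startup"),
    (datetime(2018, 1, 10), "assignee", "dev_a"),
    (datetime(2018, 2, 1), "summary", "Crash on startup when profile is missing"),
]

# What a model could legitimately see shortly after the report was filed.
at_prediction = attributes_as_of(initial, history, datetime(2018, 1, 5))
# What a dataset collected later would contain.
at_collection = attributes_as_of(initial, history, datetime(2018, 3, 1))
```

In this sketch, `at_collection` already carries the final summary and the assignee, neither of which existed on 5 January. Training or evaluating on it for a task that is supposed to run at report time would leak that later information.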
Wed 7 Nov. Displayed time zone: Guadalajara, Mexico City, Monterrey
13:30 - 15:00 | Software Maintenance I | Research Papers / Journal-First at Horizons 10-11 | Chair(s): Christian Bird (Microsoft Research)
13:30 22m Talk | Use and Misuse of Continuous Integration Features: An Empirical Study of Projects that (mis)use Travis CI | Journal-First
13:52 22m Research paper | One Size Does Not Fit All: An Empirical Study of Containerized Continuous Deployment Workflows | Research Papers | Yang Zhang (National University of Defense Technology, China); Bogdan Vasilescu (Carnegie Mellon University); Huaimin Wang; Vladimir Filkov (University of California at Davis, USA)
14:15 22m Talk | Be Careful of When: An Empirical Study on Time-Related Misuse of Issue Tracking Data | Research Papers | Feifei Tu (Peking University, China); Jiaxin Zhu (Institute of Software at Chinese Academy of Sciences, China); Qimu Zheng (Peking University, China); Minghui Zhou (Peking University)
14:37 22m Talk | Do the Dependency Conflicts in My Project Matter? | Research Papers | Ying Wang (Northeastern University, China); Ming Wen (The Hong Kong University of Science and Technology); Zhenwei Liu (Northeastern University, China); Rongxin Wu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology); Rui Wang (Northeastern University, China); Bo Yang (Northeastern University, China); Hai Yu (Northeastern University, China); Zhiliang Zhu (Northeastern University, China); Shing-Chi Cheung (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology)