An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems
In large-scale distributed systems, node crashes are inevitable and can happen at any time. As such, distributed systems are usually designed to be resilient to these node crashes via various crash recovery mechanisms, such as write-ahead logging in HBase and hinted handoffs in Cassandra. However, faults in crash recovery mechanisms and their implementations can introduce intricate crash recovery bugs and lead to severe consequences.
In this paper, we present CREB, the most comprehensive study of Crash REcovery Bugs to date, covering 103 bugs from four popular open-source distributed systems: ZooKeeper, Hadoop MapReduce, Cassandra, and HBase. For all the studied bugs, we analyze their root causes, triggering conditions, impacts, and fixes. Through this study, we obtain many interesting findings that can open up new research directions for combating crash recovery bugs.
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing
Thu 8 Nov
|13:30 - 13:52|
Meng Yan, Xin Xia (Monash University), Emad Shihab (Concordia University), David Lo (Singapore Management University), Jianwei Yin, Xiaohu Yang
|13:52 - 14:15|
|14:15 - 14:37|
Yu Gao (Institute of Software, Chinese Academy of Sciences), Wensheng Dou (Institute of Software, Chinese Academy of Sciences), Feng Qin (Ohio State University, USA), Chushu Gao (Institute of Software, Chinese Academy of Sciences), Dong Wang (Institute of Software at Chinese Academy of Sciences, China), Jun Wei (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing), Ruirui Huang (Alibaba Group, China), Li Zhou (Alibaba Group, China), Yongming Wu (Alibaba Group, China)
|14:37 - 15:00|
Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks