Identifying Impactful Service System Problems via Log Analysis
Logs are often used for troubleshooting in large-scale software systems.
For a cloud-based online system that provides 24/7 service, a
huge number of logs can be generated every day. However, these
logs are highly imbalanced in general, because most logs indicate
normal system operations, and only a small percentage of logs
reveal impactful problems. Problems that lead to a decline in system
KPIs (Key Performance Indicators) are impactful and should be
fixed by engineers with high priority. Furthermore, there are various
types of system problems, which are hard to distinguish
manually. In this paper, we propose Log3C, a novel clustering-based
approach to promptly and precisely identify impactful system problems
by utilizing both log sequences (each a sequence of log events)
and system KPIs. More specifically, we design a novel cascading
clustering algorithm, which greatly reduces clustering time
while maintaining high accuracy by iteratively sampling, clustering,
and matching log sequences. We then identify the impactful problems
by correlating the clusters of log sequences with system KPIs.
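The cascading clustering idea (sample, cluster, match, then repeat only on what did not match) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. Log sequences are represented as plain event-count vectors and compared with cosine distance. A simple greedy clustering step (the hypothetical `greedy_cluster`) stands in for whatever clustering method Log3C actually uses. All function names, the `threshold`, and the `sample_size` are illustrative. Only the sampled subset is clustered in each round, which is where the time saving comes from; everything else is merely matched against the resulting cluster centroids.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def event_count_vector(sequence: Sequence[str], vocabulary: Sequence[str]) -> Vector:
    """Assumed representation: how often each log event occurs in the sequence."""
    counts = Counter(sequence)
    return tuple(float(counts[e]) for e in vocabulary)


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat empty sequences as maximally distant
    return 1.0 - dot / (na * nb)


def centroid(vectors: List[Vector]) -> Vector:
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def greedy_cluster(indices: List[int], vectors: List[Vector], threshold: float) -> List[List[int]]:
    """Hypothetical stand-in clustering: join the first cluster whose centroid
    is within `threshold`, otherwise open a new cluster."""
    clusters: List[List[int]] = []
    for i in indices:
        for members in clusters:
            if cosine_distance(vectors[i], centroid([vectors[j] for j in members])) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cascading_clustering(vectors: List[Vector], sample_size: int, threshold: float,
                         seed: int = 0) -> Tuple[List[int], List[Vector]]:
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    rng = random.Random(seed)
    labels = [-1] * len(vectors)
    patterns: List[Vector] = []
    remaining = list(range(len(vectors)))
    while remaining:
        # 1. Sample: cluster only a small subset of the still-unlabeled sequences.
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        base = len(patterns)
        new_patterns: List[Vector] = []
        for k, members in enumerate(greedy_cluster(sample, vectors, threshold)):
            new_patterns.append(centroid([vectors[j] for j in members]))
            for j in members:
                labels[j] = base + k  # sampled items keep their own cluster
        patterns.extend(new_patterns)
        # 2. Match: assign each other unlabeled sequence to its nearest new pattern.
        unmatched = []
        for i in remaining:
            if labels[i] != -1:
                continue
            dists = [cosine_distance(vectors[i], p) for p in new_patterns]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] <= threshold:
                labels[i] = base + k
            else:
                unmatched.append(i)
        # 3. Cascade: only sequences that matched no pattern go to the next round.
        remaining = unmatched
    return labels, patterns


# Usage on toy data: three groups of identical log sequences.
vocab = ["E1", "E2", "E3"]
seqs = [["E1", "E2"]] * 5 + [["E3"]] * 5 + [["E2", "E2"]] * 2
labels, patterns = cascading_clustering(
    [event_count_vector(s, vocab) for s in seqs], sample_size=3, threshold=0.1)
```

Each round labels at least the sampled sequences, so the loop always terminates. A larger `sample_size` means fewer rounds but more clustering work per round.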
Log3C is evaluated on real-world log data collected from an online
service system at Microsoft, and the results confirm its effectiveness
and efficiency. In addition, our approach has been successfully
applied in industrial practice.
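The step that links clusters of log sequences to system KPIs can be pictured with a small sketch. The abstract does not say which statistical model Log3C uses, so this example substitutes plain Pearson correlation. It makes several assumptions, all hypothetical: the KPI is "higher is better" (for example, a success rate), each time interval has one KPI value, and the `-0.8` cutoff and the function names are illustrative. The sketch counts how many sequences of each cluster appear in every interval. It then flags clusters whose counts rise as the KPI falls, meaning a strongly negative correlation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)


def impactful_clusters(labels_per_interval: List[List[int]], kpi: List[float],
                       threshold: float = -0.8) -> Tuple[List[int], Dict[int, float]]:
    """Flag clusters whose per-interval counts correlate negatively with the KPI."""
    if len(labels_per_interval) != len(kpi):
        raise ValueError("need one KPI value per time interval")
    cluster_ids = sorted({c for labels in labels_per_interval for c in labels})
    counts = [Counter(labels) for labels in labels_per_interval]
    scores: Dict[int, float] = {}
    for c in cluster_ids:
        series = [float(cnt[c]) for cnt in counts]  # Counter returns 0 if absent
        scores[c] = pearson(series, kpi)
    flagged = sorted((c for c, r in scores.items() if r <= threshold),
                     key=lambda c: scores[c])
    return flagged, scores


# Usage: cluster labels observed in four time intervals, plus the KPI per interval.
intervals = [[0, 0, 0, 1], [0, 0, 0, 1, 1, 1], [0, 0, 0], [0, 0, 0, 1, 1, 1, 1, 1]]
kpi = [0.99, 0.97, 1.0, 0.95]
flagged, scores = impactful_clusters(intervals, kpi)
```

In this toy data cluster 0 appears at a steady rate and gets no score, while cluster 1 spikes exactly when the KPI drops and is flagged. Correlation alone does not establish causation, so a sketch like this only prioritizes candidates for engineers to inspect.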
Tue 6 Nov (displayed time zone: Guadalajara, Mexico City, Monterrey)
10:30 - 12:00 | Log Mining (Journal-First / Research Papers) at Horizons 6-9F | Chair: Dongyoon Lee, Virginia Tech, USA
11:37 | 22m Talk | Identifying Impactful Service System Problems via Log Analysis (Research Papers) | Shilin He, Chinese University of Hong Kong; Qingwei Lin, Microsoft, China; Jian-Guang Lou, Microsoft Research; Hongyu Zhang, The University of Newcastle; Michael Lyu; Dongmei Zhang, Microsoft Research, China