AutoTSG: Learning and Synthesis for Incident Troubleshooting (ESEC/FSE 2022 - Industry Paper)

Mon 14 - Fri 18 November 2022, Singapore

Who

Manish Shetty, Chetan Bansal, Sai Pramod Upadhyayula, Arjun Radhakrishna, Anurag Gupta

Track

ESEC/FSE 2022 Industry Paper

Abstract

Incident management is a key aspect of operating large-scale cloud services. To enable faster and more efficient resolution of incidents, engineering teams document frequent troubleshooting steps as Troubleshooting Guides (TSGs) for use by on-call engineers (OCEs). However, TSGs are siloed, unstructured, and often incomplete, so developers must manually understand and execute the necessary steps. This leads to issues such as on-call fatigue, reduced productivity, and human error. In this work, we conduct a large-scale empirical study of over 4K TSGs mapped to incidents and find that TSGs are widely used and significantly reduce mitigation effort. We then analyze feedback on TSGs provided by 400+ OCEs and propose a taxonomy of issues that highlights significant gaps in TSG quality. To close these gaps, we investigate TSG automation and propose AutoTSG, a novel framework that automates TSGs into executable workflows by combining machine learning and program synthesis. Our evaluation of AutoTSG on 50 TSGs shows its effectiveness both in identifying TSG statements (accuracy 0.89) and in parsing them for execution (precision 0.94, recall 0.91). Lastly, we survey ten Microsoft engineers and show the importance of TSG automation and the usefulness of AutoTSG.

Link to Preprint

https://arxiv.org/pdf/2205.13457.pdf

DOI

https://doi.org/10.1145/3540250.3558958

Manish Shetty

UC Berkeley

India

Chetan Bansal

Microsoft

United States

Sai Pramod Upadhyayula

Microsoft

United States

Arjun Radhakrishna

Microsoft

United States

Anurag Gupta

Microsoft

United States
