TraceCRL: Contrastive Representation Learning for Microservice Trace Analysis
Due to the large amount and high complexity of trace data, microservice trace analysis tasks such as anomaly detection, fault diagnosis, and tail-based sampling widely adopt machine learning technology. These trace analysis approaches usually use a preprocessing step to map structured features of traces to vector representations in an ad-hoc way. Therefore, they may lose important information such as topological dependencies between service operations. In this paper, we propose TraceCRL, a trace representation learning approach based on contrastive learning and graph neural network, which can incorporate graph structured information in the downstream trace analysis tasks. Given a trace, TraceCRL constructs an operation invocation graph where nodes represent service operations and edges represent operation invocations together with predefined features for invocation status and related metrics. Based on the operation invocation graphs of traces TraceCRL uses a contrastive learning method to train a graph neural network-based model for trace representation. In particular, TraceCRL employs six trace data augmentation strategies to alleviate the problems of class collision and uniformity of representation in contrastive learning. Our experimental studies show that TraceCRL can significantly improve the performance of trace anomaly detection and offline trace sampling. It also confirms the effectiveness of the trace augmentation strategies and the efficiency of TraceCRL.