Constructing a static call graph requires trade-offs between soundness and precision.
Unfortunately, the program analysis techniques used to construct call graphs are usually imprecise: to remain sound, they over-approximate program behavior and report many call edges that can never occur at run time.
To address this problem, researchers have recently proposed machine-learning-based call graph pruning to post-process call graphs constructed by static analysis: structural features are extracted from the call graph and fed to a random forest classifier, which then removes the edges predicted to be false positives. Despite the improvements these models bring, they remain limited: because they do not consider source code semantics, they often cannot effectively distinguish true positives from false positives.
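To make the pruning idea concrete, the sketch below extracts a few structural features for a call-graph edge. The feature set and the toy call graph are illustrative assumptions, not the features used by any particular prior work; a real pruner would feed such vectors to a trained classifier such as a random forest.

```python
# Hypothetical sketch: structural features for call-graph edge pruning.
# The chosen features (out-degree, in-degree, mutual call) are illustrative.

def edge_features(call_graph, caller, callee):
    """Structural features for the edge caller -> callee.

    call_graph maps each function name to the set of functions it calls.
    """
    out_degree = len(call_graph.get(caller, set()))  # fan-out of the caller
    in_degree = sum(1 for cs in call_graph.values() if callee in cs)  # fan-in of the callee
    mutual = caller in call_graph.get(callee, set())  # does the callee call back?
    return [out_degree, in_degree, int(mutual)]

# Toy call graph as an (imagined) static analysis might report it.
cg = {
    "main": {"parse", "run", "log"},
    "run": {"log"},
    "parse": set(),
    "log": set(),
}

print(edge_features(cg, "main", "log"))  # [3, 2, 0]
```

A classifier trained on such vectors, labeled with dynamically observed calls, can then score each edge and drop those it deems spurious.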
In this paper, we present a novel call graph pruning technique, AutoPruner, for eliminating false positives in call graphs via both statistical semantic and structural analysis.
Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus, adapting its learned representations of source code semantics to the pruning task.
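One way to present a caller/callee pair to a Transformer encoder is a BERT-style sentence pair. The special-token layout below is an assumption for illustration, not the paper's exact input format, and `max_tokens` stands in for the encoder's context window.

```python
# Illustrative sketch: packing caller and callee source into one encoder
# input, BERT-style. Whitespace splitting stands in for a real subword
# tokenizer; the [CLS]/[SEP] layout is an assumed convention.

def make_pair_input(caller_src, callee_src, max_tokens=512):
    tokens = (["[CLS]"] + caller_src.split()
              + ["[SEP]"] + callee_src.split() + ["[SEP]"])
    return tokens[:max_tokens]  # truncate to the encoder's context window

pair = make_pair_input("def main(): run()", "def run(): pass")
print(pair)
```

The embedding the encoder produces for the leading `[CLS]` token can then serve as a single semantic feature vector for the edge.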
Next, the model is used to extract semantic features from the functions related to each edge in the call graph. AutoPruner uses these semantic features together with the structural features extracted from the call graph to classify each edge via a feed-forward neural network. Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines, improving F-measure by up to 13% in identifying false-positive edges in a static call graph. Moreover, AutoPruner improves two client analyses, halving the false alarm rate of null pointer analysis and improving monomorphic call-site detection by over 10%. Additionally, our ablation study and qualitative analysis show that the semantic features extracted by AutoPruner capture a remarkable amount of information for distinguishing between true and false positives.
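The final classification step described above can be sketched in a few lines: the semantic and structural feature vectors are concatenated and passed through a small feed-forward network that outputs a probability for the edge. The weights here are random placeholders (a real model learns them), and the layer sizes are illustrative assumptions.

```python
import math
import random

# Minimal sketch of edge classification: concatenate semantic and
# structural features, apply one ReLU hidden layer, then a sigmoid.
# Random weights are placeholders for learned parameters.

def feed_forward(semantic, structural, hidden=4, seed=0):
    x = semantic + structural  # concatenated feature vector
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in x] for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]  # ReLU
    logit = sum(w * hi for w, hi in zip(w2, h))
    return 1.0 / (1.0 + math.exp(-logit))  # P(edge is a true call)

p = feed_forward([0.12, -0.45, 0.88], [3.0, 2.0, 0.0])
print(0.0 <= p <= 1.0)  # True
```

Edges whose predicted probability falls below a chosen threshold would be pruned from the call graph.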
Tue 15 Nov (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)

14:00 - 15:30 | Machine Learning III (Research Papers / Ideas, Visions and Reflections) at SRC Auditorium 2. Chair(s): Xi Zheng (Macquarie University)

14:00 (15m, Talk) | AutoPruner: Transformer-Based Call Graph Pruning (Research Papers). Le-Cong Thanh, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo (Singapore Management University); Xuan-Bach D. Le (University of Melbourne); Quyet Thang Huynh (Hanoi University of Science and Technology). DOI | Pre-print