NeuDep: Neural Binary Memory Dependence Analysis (ESEC/FSE 2022 - Research Papers)

Who

Kexin Pei, Dongdong She, Michael Wang, Scott Geng, Zhou Xuan, Yaniv David, Junfeng Yang, Suman Jana, Baishakhi Ray

Track

ESEC/FSE 2022 Research Papers

Time Zone

The program is currently displayed in (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 16 Nov 2022 11:00 - 11:15 at SRC LT 50 - Program Analysis II Chair(s): Marsha Chechik

Abstract

Determining whether multiple instructions can access the same memory location is a critical task in binary analysis. It is challenging because statically computing precise alias information is undecidable in theory. The problem is aggravated at the binary level by compiler optimizations and the absence of symbols and types. Existing approaches either produce many spurious dependencies due to conservative analysis or scale poorly to complex binaries.

We present a new machine-learning-based approach to predict memory dependencies by exploiting the model's learned knowledge about how binary programs execute. Our approach features (i) a self-supervised procedure that pretrains a neural net to reason over binary code and its dynamic value flows through memory addresses, followed by (ii) supervised finetuning to infer the memory dependencies statically. To facilitate efficient learning, we develop dedicated neural architectures to encode the heterogeneous inputs (i.e., code, data values, and memory addresses from traces) with specific modules and fuse them with a composition learning strategy.

We implement our approach in NeuDep and evaluate it on 41 popular software projects compiled by 2 compilers, 4 optimizations, and 4 obfuscation passes. We demonstrate that NeuDep is more precise (1.5x) and faster (3.5x) than the current state-of-the-art. Extensive probing studies on security-critical reverse engineering tasks suggest that NeuDep understands memory access patterns, learns function signatures, and is able to match indirect calls. All these tasks either assist or benefit from inferring memory dependencies. Notably, NeuDep also outperforms the current state-of-the-art on these tasks.
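To make the notion of a memory dependence concrete, the following is a minimal sketch that derives dependent instruction pairs from a recorded execution trace. It is an illustration only and not NeuDep's method: the trace format, the `dependent_pairs` helper, and the example addresses are all hypothetical. Following the abstract's wording, any two distinct instructions that touch the same address are counted as dependent; access sizes and read/write kinds are ignored for simplicity.

```python
from collections import defaultdict
from itertools import combinations


def dependent_pairs(trace):
    """Return pairs of instruction addresses that accessed the same memory address.

    `trace` is a hypothetical list of (instruction_address, memory_address)
    tuples, one entry per dynamic memory access.
    """
    # Group the instructions by the memory address they touched.
    insns_by_addr = defaultdict(set)
    for insn, addr in trace:
        insns_by_addr[addr].add(insn)

    # Every pair of distinct instructions sharing an address is dependent.
    pairs = set()
    for insns in insns_by_addr.values():
        for a, b in combinations(sorted(insns), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical trace: two stores followed by loads from the same slots,
# plus one load from an address nothing else touches.
trace = [
    (0x401000, 0x7FFC10),
    (0x401004, 0x7FFC18),
    (0x401008, 0x7FFC10),
    (0x40100C, 0x7FFC18),
    (0x401010, 0x7FFC20),
]
pairs = dependent_pairs(trace)
```

A trace-based computation like this only observes dependencies exercised by one particular run; the approach described in the abstract instead aims to infer them statically.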

DOI

https://doi.org/10.1145/3540250.3549147

Kexin Pei, Columbia University, United States
Dongdong She, Columbia University, United States
Michael Wang, Massachusetts Institute of Technology, United States
Scott Geng, Columbia University, United States
Zhou Xuan, Purdue University, United States
Yaniv David, Columbia University, United States
Junfeng Yang, Columbia University, United States
Suman Jana, Columbia University, United States
Baishakhi Ray, Columbia University, United States

Session Program

Wed 16 Nov
Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi

11:00 - 12:30	Program Analysis II (Research Papers / Demonstrations / Ideas, Visions and Reflections) at SRC LT 50 Chair(s): Marsha Chechik, University of Toronto

11:00 15m Talk		NeuDep: Neural Binary Memory Dependence Analysis Research Papers Kexin Pei Columbia University, Dongdong She Columbia University, Michael Wang Massachusetts Institute of Technology, Scott Geng Columbia University, Zhou Xuan Purdue University, Yaniv David Columbia University, Junfeng Yang Columbia University, Suman Jana Columbia University, Baishakhi Ray Columbia University
11:15 15m Talk		DynaPyt: A Dynamic Analysis Framework for Python Research Papers Aryaz Eghbali University of Stuttgart, Michael Pradel University of Stuttgart
11:30 15m Talk		Language-Agnostic Dynamic Analysis of Multilingual Code: Promises, Pitfalls, and Prospects Ideas, Visions and Reflections Haoran Yang Washington State University, Wen Li Washington State University, Haipeng Cai Washington State University
11:45 15m Talk		Cross-Language Android Permission Specification Research Papers Chaoran Li Swinburne University of Technology, Xiao Chen Monash University, Ruoxi Sun The University of Adelaide, Minhui (Jason) Xue University of Adelaide, Sheng Wen Swinburne University of Technology, Muhammad Ejaz Ahmed Data61, CSIRO, Seyit Camtepe CSIRO Data61, Yang Xiang Digital Research & Innovation Capability Platform, Swinburne University of Technology
12:00 15m Talk		Peahen: Fast and Precise Static Deadlock Detection via Context Reduction Research Papers Yuandao Cai Hong Kong University of Science and Technology, Chengfeng Ye Hong Kong University of Science and Technology, Qingkai Shi Purdue University, Charles Zhang Hong Kong University of Science and Technology
12:15 7m Talk		FIM: Fault Injection and Mutation for Simulink Demonstrations Ezio Bartocci TU Wien, Leonardo Mariani University of Milano-Bicocca, Dejan Nickovic Austrian Institute of Technology, Drishti Yadav Technische Universität Wien
12:23 7m Talk		JSIMutate: Understanding Performance Results through Mutations Demonstrations Thomas Laurent Lero & University College Dublin, Paolo Arcaini National Institute of Informatics, Catia Trubiani Gran Sasso Science Institute, Anthony Ventresque University College Dublin & Lero, Ireland