Determining whether multiple instructions can access the same memory location is a critical task in binary analysis. It is challenging because statically computing precise alias information is undecidable in general. The problem is aggravated at the binary level by compiler optimizations and the absence of symbols and types. Existing approaches either report many spurious dependencies because their analysis is conservative, or scale poorly to complex binaries.
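To make the prediction target concrete: two instructions are memory-dependent when they can touch overlapping bytes and at least one of them writes. The sketch below is an illustration, not NeuDep's implementation. It computes only the dependencies *observed* in a single execution trace, and the `Access` record, trace format, and instruction identifiers are hypothetical. One run can miss dependencies on paths it never executes, which is why a static answer is still needed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    insn: str       # hypothetical instruction identifier (e.g., its address)
    is_write: bool  # True for a store, False for a load
    addr: int       # first byte address touched
    size: int       # number of bytes touched


def overlaps(a: Access, b: Access) -> bool:
    """True if the half-open byte ranges [addr, addr + size) intersect."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


def observed_dependencies(trace: list[Access]) -> set[tuple[str, str]]:
    """Unordered pairs of distinct instructions that touched overlapping
    memory in this trace, with at least one of the two accesses a write."""
    deps = set()
    for a, b in combinations(trace, 2):
        if a.insn != b.insn and (a.is_write or b.is_write) and overlaps(a, b):
            deps.add(tuple(sorted((a.insn, b.insn))))
    return deps


# Hypothetical trace with rbp = 0x7ffd0020.
trace = [
    Access("0x401000", True, 0x7ffd0010, 8),   # mov [rbp-0x10], rax
    Access("0x401008", False, 0x7ffd0014, 4),  # mov eax, [rbp-0xc]
    Access("0x401010", False, 0x7ffd0020, 8),  # mov rcx, [rbp]
]
print(observed_dependencies(trace))  # {('0x401000', '0x401008')}
```

The 4-byte load at `rbp-0xc` reads the upper half of the 8-byte store at `rbp-0x10`. A purely syntactic comparison of the two operands would miss this overlap.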
We present a new machine-learning-based approach that predicts memory dependencies by exploiting a model's learned knowledge of how binary programs execute. Our approach features (i) a self-supervised procedure that pretrains a neural network to reason over binary code and its dynamic value flows through memory addresses, followed by (ii) supervised finetuning to infer memory dependencies statically. To make learning efficient, we develop dedicated neural architectures that encode the heterogeneous inputs (i.e., code, data values, and memory addresses from traces) with separate modules and fuse them with a composition learning strategy.
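The following minimal sketch shows the input side under stated assumptions. It assumes each trace position pairs a code token with an optional observed value and an optional memory address. A masked-value objective stands in for the self-supervised pretraining step. The per-modality embeddings are simply summed, a common fusion choice that is not necessarily NeuDep's composition learning strategy. All names (`build_inputs`, `mask_values`, `fuse`) are hypothetical, and the network that consumes the fused vectors is omitted.

```python
import random

MASK = "<MASK>"
NONE = "<NONE>"  # placeholder for positions that carry no value or address


def build_inputs(steps):
    """Split one trace into three aligned sequences, one per modality.

    steps: list of (code_token, value or None, address or None).
    """
    code = [tok for tok, _, _ in steps]
    vals = [NONE if v is None else f"{v:#x}" for _, v, _ in steps]
    addrs = [NONE if a is None else f"{a:#x}" for _, _, a in steps]
    return code, vals, addrs


def mask_values(vals, rate, rng):
    """Pretraining targets: hide a fraction of the observed values.

    Returns the masked sequence and {position: original value} to predict.
    """
    masked, targets = list(vals), {}
    for i, v in enumerate(vals):
        if v != NONE and rng.random() < rate:
            masked[i], targets[i] = MASK, v
    return masked, targets


def fuse(code, vals, addrs, tables, dim, rng):
    """Sum one embedding vector per modality at every position."""
    def emb(table, tok):
        if tok not in table:  # lazily create a random vector for a new token
            table[tok] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        return table[tok]

    return [
        [x + y + z for x, y, z in zip(emb(tables["code"], c),
                                      emb(tables["value"], v),
                                      emb(tables["addr"], a))]
        for c, v, a in zip(code, vals, addrs)
    ]


rng = random.Random(0)
# Hypothetical tokens for "mov [rbp-0x10], rax" after the store executes.
steps = [("mov", None, None), ("[rbp-0x10]", 0x2A, 0x7FFD0010), ("rax", 0x2A, None)]
code, vals, addrs = build_inputs(steps)
masked, targets = mask_values(vals, rate=1.0, rng=rng)
print(masked)   # ['<NONE>', '<MASK>', '<MASK>']
print(targets)  # {1: '0x2a', 2: '0x2a'}
tables = {"code": {}, "value": {}, "addr": {}}
fused = fuse(code, masked, addrs, tables, dim=4, rng=rng)
print(len(fused), len(fused[0]))  # 3 4
```

Summing keeps the fused sequence aligned with the code tokens, so a standard sequence encoder could consume it unchanged. In a setup like this, finetuning could swap the value-prediction head for a classifier over pairs of memory-accessing instructions.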
We implement our approach in NeuDep and evaluate it on 41 popular software projects compiled with 2 compilers, 4 optimization levels, and 4 obfuscation passes. We demonstrate that NeuDep is more precise (1.5x) and faster (3.5x) than the current state of the art. Extensive probing studies on security-critical reverse engineering tasks suggest that NeuDep understands memory access patterns, learns function signatures, and is able to match indirect calls. All these tasks either assist in or benefit from inferring memory dependencies. Notably, NeuDep also outperforms the current state of the art on these tasks.
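"More precise" can be read in the usual sense for dependence analyses: of the instruction pairs an analysis reports as dependent, the fraction that truly are. The sketch below computes precision and recall over predicted pairs. It assumes pairs are unordered and that a ground-truth set is available (for example, collected from traces). It is a generic metric, not a description of NeuDep's evaluation protocol.

```python
def precision_recall(predicted, truth):
    """Precision and recall of predicted dependent pairs vs. ground truth.

    Pairs are treated as unordered, so ("a", "b") equals ("b", "a").
    """
    norm = lambda pairs: {tuple(sorted(p)) for p in pairs}
    predicted, truth = norm(predicted), norm(truth)
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


truth = {("a", "b"), ("c", "d")}
# A conservative analysis that also reports two spurious pairs.
conservative = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
print(precision_recall(conservative, truth))  # (0.5, 1.0)
```

The conservative result keeps full recall but halves precision. This is the cost of spurious dependencies that the abstract attributes to existing conservative analyses.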
Wed 16 Nov (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
11:00 - 12:30 | Program Analysis II (Research Papers / Demonstrations / Ideas, Visions and Reflections) at SRC LT 50 | Chair: Marsha Chechik (University of Toronto)
11:00 | 15m Talk | NeuDep: Neural Binary Memory Dependence Analysis (Research Papers) | Kexin Pei (Columbia University), Dongdong She (Columbia University), Michael Wang (Massachusetts Institute of Technology), Scott Geng (Columbia University), Zhou Xuan (Purdue University), Yaniv David (Columbia University), Junfeng Yang (Columbia University), Suman Jana (Columbia University), Baishakhi Ray (Columbia University)
11:15 | 15m Talk | DynaPyt: A Dynamic Analysis Framework for Python (Research Papers)
11:30 | 15m Talk | Language-Agnostic Dynamic Analysis of Multilingual Code: Promises, Pitfalls, and Prospects (Ideas, Visions and Reflections) | Haoran Yang (Washington State University), Wen Li (Washington State University), Haipeng Cai (Washington State University)
11:45 | 15m Talk | Cross-Language Android Permission Specification (Research Papers) | Chaoran Li (Swinburne University of Technology), Xiao Chen (Monash University), Ruoxi Sun (The University of Adelaide), Minhui (Jason) Xue (University of Adelaide), Sheng Wen (Swinburne University of Technology), Muhammad Ejaz Ahmed (Data61, CSIRO), Seyit Camtepe (CSIRO Data61), Yang Xiang (Digital Research & Innovation Capability Platform, Swinburne University of Technology)
12:00 | 15m Talk | Peahen: Fast and Precise Static Deadlock Detection via Context Reduction (Research Papers) | Yuandao Cai (Hong Kong University of Science and Technology), Chengfeng Ye (Hong Kong University of Science and Technology), Qingkai Shi (Purdue University), Charles Zhang (Hong Kong University of Science and Technology)
12:15 | 7m Talk | FIM: Fault Injection and Mutation for Simulink (Demonstrations) | Ezio Bartocci (TU Wien), Leonardo Mariani (University of Milano-Bicocca), Dejan Nickovic (Austrian Institute of Technology), Drishti Yadav (Technische Universität Wien)
12:23 | 7m Talk | JSIMutate: Understanding Performance Results through Mutations (Demonstrations) | Thomas Laurent (Lero & University College Dublin), Paolo Arcaini (National Institute of Informatics), Catia Trubiani (Gran Sasso Science Institute), Anthony Ventresque (University College Dublin & Lero, Ireland)