How to Better Utilize Code Graphs in Semantic Code Search?
Semantic code search greatly facilitates software reuse by enabling users to find code snippets that closely match their natural language queries. Because code graphs (e.g., control-flow graphs and program dependency graphs) are highly expressive, both mainstream lines of research (i.e., multi-modal models and pre-trained models) have attempted to incorporate code graphs into code modelling. However, these approaches still have two limitations. First, their search effectiveness leaves much room for improvement. Second, they do not fully consider the unique features of code graphs.
In this paper, we propose a Graph-to-Sequence Converter, namely $G2SC$. By converting code graphs into lossless sequences, $G2SC$ makes it possible to address the problem of learning on small graphs with sequence feature learning, and to capture the attribute information of both the nodes and the edges of code graphs. As a result, it can greatly improve the effectiveness of code search. Specifically, $G2SC$ first converts the code graph into a unique corresponding node sequence using a specific graph traversal strategy. It then obtains a statement sequence by replacing each node with its corresponding statement. A set of carefully designed graph traversal strategies guarantees that this process is one-to-one and reversible. $G2SC$ captures rich semantic relationships (i.e., control flow, data flow, and node/relationship properties) and provides a data transformation that is friendly to learning models. It can be flexibly integrated with existing models to make better use of code graphs. As a proof-of-concept, we present two $G2SC$-enabled models: \textit{GSMM} ($G2SC$-enabled multi-modal model) and \textit{GSCodeBERT} ($G2SC$-enabled \textit{CodeBERT} model). Extensive experimental results on two real-world large-scale datasets show that \textit{GSMM} and \textit{GSCodeBERT} outperform the state-of-the-art models \textit{MMAN} and \textit{GraphCodeBERT} by $92\%$ and $22\%$ on R@1, and by $63\%$ and $11.5\%$ on MRR, respectively.
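The abstract does not spell out G2SC's traversal strategies. As a rough illustration of the general idea (a deterministic traversal that turns an edge-labeled code graph into a sequence from which the graph can be rebuilt exactly), here is a minimal Python sketch. It is not G2SC's algorithm: the sorted depth-first order, the `<pop>` backtrack marker, the adjacency-dict representation, the `CF`/`DF` edge labels, and the function names `graph_to_sequence`, `sequence_to_graph`, and `to_statements` are all hypothetical, and the sketch assumes every node is reachable from the entry node.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation (not from the paper): node id -> outgoing edges,
# where each edge is an (edge_label, target_node_id) pair, e.g. ("CF", 3).
Graph = Dict[int, List[Tuple[str, int]]]
Token = Union[int, str]
POP = "<pop>"  # hypothetical marker: "return to the source of the last edge"


def graph_to_sequence(graph: Graph, entry: int) -> List[Token]:
    """Deterministic DFS; each edge is emitted exactly once as [label, target, POP]."""
    seq: List[Token] = [entry]
    visited = {entry}

    def dfs(u: int) -> None:
        # Sorting the outgoing edges fixes the visiting order, so a given
        # graph and entry node always yield the same sequence.
        for label, v in sorted(graph.get(u, [])):
            seq.extend([label, v])
            if v not in visited:  # expand a node's own edges only once
                visited.add(v)
                dfs(v)
            seq.append(POP)

    dfs(entry)
    return seq


def sequence_to_graph(seq: List[Token]) -> Graph:
    """Inverse of graph_to_sequence: replay the walk with an explicit stack."""
    graph: Graph = {}
    stack = [int(seq[0])]  # the current source node is always on top
    i = 1
    while i < len(seq):
        if seq[i] == POP:
            stack.pop()
            i += 1
        else:
            label, target = str(seq[i]), int(seq[i + 1])
            graph.setdefault(stack[-1], []).append((label, target))
            stack.append(target)
            i += 2
    return graph


def to_statements(seq: List[Token], stmts: Dict[int, str]) -> List[str]:
    """Replace each node id with its statement; edge labels and POP are kept."""
    return [stmts[t] if isinstance(t, int) else t for t in seq]


if __name__ == "__main__":
    # Toy function: 0: x = read()  1: if x > 0  2: y = x  3: y = -x  4: return y
    stmts = {0: "x = read()", 1: "if x > 0", 2: "y = x", 3: "y = -x", 4: "return y"}
    g: Graph = {
        0: [("CF", 1), ("DF", 1), ("DF", 2), ("DF", 3)],  # CF = control flow
        1: [("CF", 2), ("CF", 3)],                         # DF = data flow
        2: [("CF", 4), ("DF", 4)],
        3: [("CF", 4), ("DF", 4)],
    }
    seq = graph_to_sequence(g, entry=0)
    # g's edge lists are already in sorted order, so plain == checks the round trip.
    assert sequence_to_graph(seq) == g
    print(to_statements(seq, stmts))
```

Because every edge contributes exactly the three tokens `label, target, <pop>`, a graph with E edges reachable from the entry node yields a sequence of length 1 + 3E, and nodes unreachable from the entry are dropped, so the round trip is lossless only under the reachability assumption. The round trip is also exact only at the node-id level: the statement sequence from `to_statements` can be inverted only if the id-to-statement mapping is kept, since two nodes may share the same text. The recursive DFS favors readability; very deep graphs would need an iterative version to stay within Python's recursion limit.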
Wed 16 Nov | Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi
11:00 - 12:30 | Mining Software Repositories | Research Papers / Demonstrations at SRC Auditorium 2 | Chair(s): Timofey Bryksin (JetBrains Research)
11:00 | 15m Talk | An Exploratory Study on the Predominant Programming Paradigms in Python Code | Research Papers
11:15 | 15m Talk | An Empirical Study of Blockchain System Vulnerabilities: Modules, Types, and Patterns | Research Papers | Xiao Yi (Chinese University of Hong Kong), Daoyuan Wu (Chinese University of Hong Kong), Lingxiao Jiang (Singapore Management University), Yuzhou Fang (Chinese University of Hong Kong), Kehuan Zhang (Chinese University of Hong Kong), Wei Zhang (Nanjing University of Posts and Telecommunications)
11:30 | 15m Talk | How to Better Utilize Code Graphs in Semantic Code Search? | Research Papers | Yucen Shi (Northeastern University), Ying Yin (Northeastern University), Zhengkui Wang (Singapore Institute of Technology), David Lo (Singapore Management University), Tao Zhang (Macau University of Science and Technology), Xin Xia (Huawei), Yuhai Zhao (Northeastern University), Bowen Xu (Singapore Management University)
11:45 | 15m Talk | 23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software | Research Papers | David OBrien (Iowa State University), Sumon Biswas (Carnegie Mellon University), Sayem Mohammad Imtiaz (Iowa State University), Rabe Abdalkareem (Carleton University), Emad Shihab (Concordia University), Hridesh Rajan (Iowa State University)
12:00 | 7m Talk | WikiDoMiner: Wikipedia Domain-specific Miner | Demonstrations | Saad Ezzini (University of Luxembourg), Sallam Abualhaija (University of Luxembourg), Mehrdad Sabetzadeh (University of Ottawa)
12:08 | 7m Talk | RegMiner: Mining Replicable Regression Dataset from Code Repositories | Demonstrations | Xuezhi Song (Fudan University), Yun Lin (Shanghai Jiao Tong University; National University of Singapore), Yijian Wu (Fudan University), Yifan Zhang (National University of Singapore), Siang Hwee Ng (National University of Singapore), Xin Peng (Fudan University), Jin Song Dong (National University of Singapore), Hong Mei (Peking University)