CodeMatcher: A Tool for Large-Scale Code Search Based on Query Semantics Matching (ESEC/FSE 2022 - Demonstrations)

Mon 14 - Fri 18 November 2022 Singapore

Who

Chao Liu, Xuanlin Bao, Xin Xia, Meng Yan, David Lo, Ting Zhang

Track

ESEC/FSE 2022 Demonstrations

Time Zone

The program is currently displayed in (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 14 Nov 2022 15:08 - 15:15 at SRC LT 51 - Community Chair(s): Dirk Riehle

Abstract

With the emergence of large-scale codebases such as GitHub and Gitee, searching for and reusing existing code can substantially improve developer productivity. Over the years, many code search tools have been developed. Early tools used information retrieval (IR) techniques to search frequently changing, large-scale codebases efficiently. However, their search accuracy was low because of the semantic mismatch between query and code. In recent years, many tools have applied deep learning (DL) techniques to address this issue, but DL-based tools are slow and their search accuracy is unstable.

In this paper, we present CodeMatcher, an IR-based tool that inherits the advantages of DL-based tools in query semantics matching. CodeMatcher first builds an index for a large-scale codebase to shorten search response time. For a given search query, it handles irrelevant and noisy words in the query, then retrieves candidate code from the indexed codebase via iterative fuzzy search, and finally reranks the candidates using two purpose-built measures of semantic matching between the query and each candidate. We implemented CodeMatcher as a search engine website. To verify its effectiveness, we evaluated CodeMatcher on 41k+ open-source Java repositories. Experimental results show that CodeMatcher achieves an industrial-level response time (0.3s) on a commodity server with an Intel i7 CPU. In terms of search accuracy, CodeMatcher significantly outperforms three state-of-the-art tools (DeepCS, UNIF, and CodeHow) and two online search engines (GitHub search and Google search).
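The pipeline described in the abstract (index, query cleaning, iterative fuzzy retrieval, reranking) can be illustrated with a toy sketch. This is not the authors' implementation: the stop-word list, the "drop the last query word" relaxation rule, the two coverage scores, and all function names are assumptions chosen only to make the four steps concrete on a three-method in-memory codebase.

```python
import re
from collections import defaultdict

# Hypothetical noise-word list; the real tool's filtering rules are not given here.
STOPWORDS = {"how", "to", "a", "an", "the", "in", "of", "do", "i", "with"}


def tokenize(text):
    """Split on camelCase and non-alphanumeric boundaries; return lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", spaced)]


def build_index(methods):
    """Inverted index: token -> ids of methods whose name or body contains it."""
    index = defaultdict(set)
    for mid, (name, body) in enumerate(methods):
        for tok in tokenize(name) + tokenize(body):
            index[tok].add(mid)
    return index


def clean_query(query):
    """Drop noise words from the query."""
    return [t for t in tokenize(query) if t not in STOPWORDS]


def retrieve(index, tokens, want=2):
    """Require all tokens first, then relax by dropping the last token each round."""
    found = []
    for n in range(len(tokens), 0, -1):
        hits = set.intersection(*[index.get(t, set()) for t in tokens[:n]])
        for mid in sorted(hits):
            if mid not in found:
                found.append(mid)
        if len(found) >= want:
            break
    return found


def rerank(methods, tokens, candidates):
    """Order candidates by two stand-in measures: name coverage, then body coverage."""
    def score(mid):
        name, body = methods[mid]
        name_toks, body_toks = set(tokenize(name)), set(tokenize(body))
        name_cov = sum(t in name_toks for t in tokens) / len(tokens)
        body_cov = sum(t in body_toks for t in tokens) / len(tokens)
        return (name_cov, body_cov)
    return sorted(candidates, key=score, reverse=True)


def search(methods, index, query, want=2):
    """Run the full pipeline and return the names of the ranked methods."""
    tokens = clean_query(query)
    if not tokens:
        return []
    return [methods[m][0] for m in rerank(methods, tokens, retrieve(index, tokens, want))]


# Tiny made-up codebase of (method name, Java body) pairs.
methods = [
    ("readFile", "return new String(Files.readAllBytes(path));"),
    ("writeFile", "Files.write(path, data.getBytes());"),
    ("parseInt", "return Integer.parseInt(text);"),
]
index = build_index(methods)
print(search(methods, index, "how to read a file"))  # prints ['readFile']
```

Building the index once up front is what keeps each query cheap: a search only intersects a few posting sets instead of scanning every method, which is the general idea behind the fast response time the abstract reports.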

Chao Liu

Chongqing University

Xuanlin Bao

Chongqing University

Xin Xia

Huawei

China

Meng Yan

Chongqing University

China

David Lo

Singapore Management University

Singapore

Ting Zhang

Singapore Management University

Singapore

Session Program

Mon 14 Nov
Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi

14:00 - 15:30	Community (Research Papers / Ideas, Visions and Reflections / Demonstrations / Industry Paper) at SRC LT 51 Chair(s): Dirk Riehle University of Bavaria, Erlangen

14:00 15m Talk		In War and Peace: The Impact of World Politics on Software Ecosystems Ideas, Visions and Reflections Raula Gaikovina Kula Nara Institute of Science and Technology, Christoph Treude University of Melbourne
14:15 15m Talk		A Retrospective Study of One Decade of Artifact Evaluations Research Papers Stefan Winter LMU Munich, Christopher Steven Timperley Carnegie Mellon University, Ben Hermann TU Dortmund, Jürgen Cito TU Wien, Jonathan Bell Northeastern University, Michael Hilton Carnegie Mellon University, Dirk Beyer LMU Munich
14:30 15m Talk		Understanding Skills for OSS Communities on GitHub Research Papers Jenny T. Liang University of Washington, Thomas Zimmermann Microsoft Research, Denae Ford Microsoft Research
14:45 15m Talk		Achievement Unlocked: A Case Study on Gamifying DevOps Practices in Industry Industry Paper Patrick Ayoup Concordia University, Diego Elias Costa Concordia University, Canada, Emad Shihab Concordia University
15:00 7m Talk		iTiger: An Automatic Issue Title Generation Tool Demonstrations Ting Zhang Singapore Management University, Ivana Clairine Irsan Singapore Management University, Ferdian Thung Singapore Management University, DongGyun Han Royal Holloway, University of London, David Lo Singapore Management University, Lingxiao Jiang Singapore Management University
15:08 7m Talk		CodeMatcher: A Tool for Large-Scale Code Search Based on Query Semantics Matching Demonstrations Chao Liu Chongqing University, Xuanlin Bao Chongqing University, Xin Xia Huawei, Meng Yan Chongqing University, David Lo Singapore Management University, Ting Zhang Singapore Management University
15:15 15m Talk		Generating Realistic Vulnerabilities via Neural Code Editing: An Empirical Study Research Papers Yu Nong Washington State University, Yuzhe Ou University of Texas at Dallas, Michael Pradel University of Stuttgart, Feng Chen University of Texas at Dallas, Haipeng Cai Washington State University