API + Code = Better Code Summary? - Insights from an Exploratory Study
Automatic code summarization techniques aid program comprehension by attempting to generate a human-level, natural-language summary of source code. Recent research in this area has progressed from basic Seq2Seq models to various flavors of Transformer models that encode the structural components of the source code through different input representations. Apart from the source code itself, other information, such as API knowledge, has previously proved helpful for code summarization with recurrent neural networks (RNNs), because it conveys crucial information about the code’s functionality. In this article, we therefore explore, alongside the source code structure, the importance of API knowledge in code summarization and investigate whether it helps improve the generated summaries. Our model uses a Transformer-based architecture with two encoders for the two input modules, source code and API sequences, and a joint decoder that combines the outputs of the two encoders to generate summaries. We evaluated the proposed model on a dataset of Java projects collected from GitHub containing around 87K <Java Method, API Sequence, Comment> triplets. The experiments show that our model outperforms most existing RNN-based approaches, but its overall performance does not improve on the state-of-the-art Transformer-based approach. Thus, the results indicate that although API information is helpful for code summarization, better methods are still needed to extract the valuable information contained in API sequences.
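The abstract describes the architecture only at a high level: two Transformer encoders (one for code tokens, one for API sequences) and a joint decoder over both. The sketch below is one plausible reading of that description, written in PyTorch; the fusion strategy (concatenating the two encoder memories before decoder cross-attention), the hyperparameters, and the omission of positional encodings are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class DualEncoderSummarizer(nn.Module):
    """Hypothetical sketch: two encoders (code, API) + one joint decoder."""

    def __init__(self, code_vocab, api_vocab, summary_vocab,
                 d_model=512, nhead=8, num_layers=6):
        super().__init__()
        # Separate embeddings for each input modality and for the summary tokens.
        # Positional encodings are omitted here for brevity.
        self.code_emb = nn.Embedding(code_vocab, d_model)
        self.api_emb = nn.Embedding(api_vocab, d_model)
        self.tgt_emb = nn.Embedding(summary_vocab, d_model)

        # Two independent Transformer encoders, one per input module.
        self.code_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.api_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)

        # Joint decoder that cross-attends to both encoder outputs.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.out_proj = nn.Linear(d_model, summary_vocab)

    def forward(self, code_ids, api_ids, summary_ids):
        # Encode the two modalities independently.
        code_mem = self.code_encoder(self.code_emb(code_ids))
        api_mem = self.api_encoder(self.api_emb(api_ids))
        # Assumed fusion: concatenate the memories along the sequence dimension.
        memory = torch.cat([code_mem, api_mem], dim=1)
        # Causal mask so each summary position only attends to earlier positions.
        tgt_len = summary_ids.size(1)
        causal_mask = torch.triu(
            torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)
        dec_out = self.decoder(self.tgt_emb(summary_ids), memory, tgt_mask=causal_mask)
        return self.out_proj(dec_out)  # logits over the summary vocabulary


# Toy forward pass with random token ids (batch of 2); vocabulary sizes are made up.
model = DualEncoderSummarizer(code_vocab=5000, api_vocab=1000, summary_vocab=8000)
logits = model(torch.randint(0, 5000, (2, 60)),
               torch.randint(0, 1000, (2, 12)),
               torch.randint(0, 8000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 8000])
```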
Fri 18 Nov (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
16:00 - 17:30
16:00 (20m) | Research paper | Measuring design compliance using neural language models – an automotive case study (PROMISE) | Dhasarathy Parthasarathy (Volvo AB), Cecilia Ekelin (Volvo AB), Anjali Karri (Volvo AB), Jiapeng Sun (Volvo AB), Panagiotis Moraitis (Volvo AB)
16:20 (20m) | Research paper | API + Code = Better Code Summary? - Insights from an Exploratory Study (PROMISE)
16:40 (20m) | Research paper | LOGI: An Empirical Model of Heat-Induced Disk Drive Data Loss and its Implications for Data Recovery (PROMISE) | Hammad Ahmad (University of Michigan), Colton Holoday, Ian Bertram (University of Michigan), Kevin Angstadt, Zohreh Sharafi (Polytechnique Montréal), Westley Weimer (University of Michigan)
17:00 (10m) | Day closing | Farewell PROMISE 2022 (PROMISE) | Shane McIntosh (University of Waterloo), Weiyi Shang (Concordia University), Gema Rodríguez-Pérez (University of British Columbia (UBC))