A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments

Bibliographic Details
Main Authors: Xuejian Gao, Xue Jiang, Qiong Wu, Xiao Wang, Lei Lyu, Chen Lyu
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/9343307/
Description
Summary: Code fragment natural language description generation, also known as code summarization, refers to producing a natural language sequence that describes the functionality of a given code fragment. It is broadly agreed that applying code summarization in production can significantly improve the efficiency of software development and maintenance. In recent years, syntactic analysis (SA) technology and Latent Dirichlet Allocation (LDA) have been widely used in code summarization and have achieved good results. However, most existing techniques focus on core code statements, so the summaries they generate lack a logical description of the code fragment's holistic information. To address this, we propose a multi-module code summarization method that generates a natural language description for each code statement by constructing a new type of natural language template. Meanwhile, to exploit the code fragment's holistic information, we adopt code statement partition rules and a cosine similarity measure to rank the overall information of the code fragment and optimize its weights, and finally generate a holistic natural language description of the code fragment. Experimental results demonstrate that our method generates more concise and logical natural language descriptions than existing models.
ISSN: 2169-3536
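The summary names a cosine similarity measure for ranking the information in a code fragment but gives no further detail. As a rough illustration only, the Python sketch below ranks the statements of a fragment by the cosine similarity between each statement's bag-of-sub-token vector and the vector of the whole fragment. The tokenizer, the bag-of-words representation and the helper names (`tokenize`, `cosine`, `rank_statements`) are hypothetical choices made for this example; they are not the authors' partition rules or weighting scheme.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers are split into camelCase/snake_case sub-words.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_SUBWORD = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")


def tokenize(statement: str) -> list[str]:
    """Return lowercase sub-word tokens for every identifier in a statement."""
    tokens = []
    for ident in _IDENT.findall(statement):
        tokens.extend(word.lower() for word in _SUBWORD.findall(ident))
    return tokens


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_statements(statements: list[str]) -> list[tuple[float, str]]:
    """Rank statements by similarity to the whole fragment (assumed bag-of-words model)."""
    vectors = [Counter(tokenize(s)) for s in statements]
    whole = Counter()
    for vec in vectors:
        whole.update(vec)  # the fragment vector is the sum of all statement vectors
    scored = [(cosine(vec, whole), s) for vec, s in zip(vectors, statements)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy fragment used only for demonstration.
fragment = [
    "total = 0",
    "for price in prices:",
    "    total += price",
    "return total",
]

for score, stmt in rank_statements(fragment):
    print(f"{score:.3f}  {stmt.strip()}")
```

On this toy fragment, `total += price` ranks first because it shares the two most frequent identifiers (`total` and `price`) with the fragment as a whole. A fuller system would presumably combine such scores with statement-type weights derived from partition rules, which this sketch deliberately omits.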