A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments

Code fragment natural language description generation, also known as code summarization, refers to obtaining a natural language sequence describing a given code fragment's functionality. It is broadly agreed that applying code summarization in production can significantly improve the efficien...

Bibliographic Details
Main Authors: Xuejian Gao, Xue Jiang, Qiong Wu, Xiao Wang, Lei Lyu, Chen Lyu
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Source code summarization; program comprehension; program description
Online Access: https://ieeexplore.ieee.org/document/9343307/
id doaj-7578168be7d04104b6d772a9258a0281
record_format Article
spelling doaj-7578168be7d04104b6d772a9258a0281
2021-03-30T15:15:23Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
Volume 9, pages 21579-21592
DOI 10.1109/ACCESS.2021.3055955
Article number 9343307
A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments
Xuejian Gao, https://orcid.org/0000-0001-6842-3411
Xue Jiang, https://orcid.org/0000-0002-5317-865X
Qiong Wu, https://orcid.org/0000-0001-8142-4419
Xiao Wang, https://orcid.org/0000-0002-8328-9852
Lei Lyu, https://orcid.org/0000-0001-9521-6039
Chen Lyu, https://orcid.org/0000-0002-5044-1459
All six authors: School of Information Science and Engineering, Shandong Normal University, Jinan, China
collection DOAJ
language English
format Article
sources DOAJ
author Xuejian Gao
Xue Jiang
Qiong Wu
Xiao Wang
Lei Lyu
Chen Lyu
title A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Code fragment natural language description generation, also known as code summarization, refers to obtaining a natural language sequence that describes a given code fragment's functionality. It is broadly agreed that applying code summarization in production can significantly improve the efficiency of software development and maintenance. In recent years, syntactic analysis (SA) technology and Latent Dirichlet Allocation (LDA) have been widely used in code summarization and have achieved good results. However, most existing techniques focus on core code statements, so the summaries they generate lack a logical description of the code fragment's holistic information. To this end, we propose a multi-module code summarization method that generates natural language for each code statement by constructing a new type of natural language template. Meanwhile, to utilize the code fragment's holistic information, we adopt code statement partition rules and a cosine similarity measure to rank and optimize the weight of the overall information of the code fragment, and finally generate a holistic natural language description of the code fragment. The experimental results demonstrate that our method can generate more concise and logical natural language descriptions than existing models.
topic Source code summarization
program comprehension
program description
url https://ieeexplore.ieee.org/document/9343307/