A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments
Code fragment natural language description generation, also known as code summarization, refers to obtaining a natural language sequence describing a given code fragment's functionality. It is broadly agreed that applying code summarization into production can significantly improve the efficien...
Main Authors: | Xuejian Gao, Xue Jiang, Qiong Wu, Xiao Wang, Lei Lyu, Chen Lyu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
Subjects: | Source code summarization; program comprehension; program description |
Online Access: | https://ieeexplore.ieee.org/document/9343307/ |
id |
doaj-7578168be7d04104b6d772a9258a0281 |
---|---|
record_format |
Article |
spelling |
doaj-7578168be7d04104b6d772a9258a0281; 2021-03-30T15:15:23Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 21579; 21592; 10.1109/ACCESS.2021.3055955; 9343307; A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments; Xuejian Gao [0] https://orcid.org/0000-0001-6842-3411; Xue Jiang [1] https://orcid.org/0000-0002-5317-865X; Qiong Wu [2] https://orcid.org/0000-0001-8142-4419; Xiao Wang [3] https://orcid.org/0000-0002-8328-9852; Lei Lyu [4] https://orcid.org/0000-0001-9521-6039; Chen Lyu [5] https://orcid.org/0000-0002-5044-1459; School of Information Science and Engineering, Shandong Normal University, Jinan, China; School of Information Science and Engineering, Shandong Normal University, Jinan, China; School of Information Science and Engineering, Shandong Normal University, Jinan, China; School of Information Science and Engineering, Shandong Normal University, Jinan, China; School of Information Science and Engineering, Shandong Normal University, Jinan, China; School of Information Science and Engineering, Shandong Normal University, Jinan, China; Code fragment natural language description generation, also known as code summarization, refers to obtaining a natural language sequence describing a given code fragment's functionality. It is broadly agreed that applying code summarization in production can significantly improve the efficiency of software development and maintenance. In recent years, syntactic analysis (SA) technology and Latent Dirichlet Allocation (LDA) have been widely used in code summarization and have achieved good results. However, most existing techniques focus on core code statements, so the summaries they generate lack a logical description of the code fragment's holistic information. To this end, we propose a multi-module code summarization method that generates a natural language description for each code statement by constructing a new type of natural language template. Meanwhile, to utilize the code fragment's holistic information, we adopt code statement partition rules and a cosine similarity measure to rank and optimize the weight of the overall information of the code fragment, and finally generate a holistic natural language description of the code fragment. The experimental results demonstrate that our method can generate more concise and logical natural language descriptions than existing models.; https://ieeexplore.ieee.org/document/9343307/; Source code summarization; program comprehension; program description |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xuejian Gao, Xue Jiang, Qiong Wu, Xiao Wang, Lei Lyu, Chen Lyu |
author_sort |
Xuejian Gao |
title |
A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments |
title_sort |
multi-module based method for generating natural language descriptions of code fragments |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
Code fragment natural language description generation, also known as code summarization, refers to obtaining a natural language sequence describing a given code fragment's functionality. It is broadly agreed that applying code summarization in production can significantly improve the efficiency of software development and maintenance. In recent years, syntactic analysis (SA) technology and Latent Dirichlet Allocation (LDA) have been widely used in code summarization and have achieved good results. However, most existing techniques focus on core code statements, so the summaries they generate lack a logical description of the code fragment's holistic information. To this end, we propose a multi-module code summarization method that generates a natural language description for each code statement by constructing a new type of natural language template. Meanwhile, to utilize the code fragment's holistic information, we adopt code statement partition rules and a cosine similarity measure to rank and optimize the weight of the overall information of the code fragment, and finally generate a holistic natural language description of the code fragment. The experimental results demonstrate that our method can generate more concise and logical natural language descriptions than existing models. |
topic |
Source code summarization; program comprehension; program description |
url |
https://ieeexplore.ieee.org/document/9343307/ |
_version_ |
1724179763849854976 |