Recent Advances in Intelligent Source Code Generation: A Survey on Natural Language Based Studies

Bibliographic Details
Main Authors: Chen Yang, Yan Liu, Changqing Yin
Format: Article
Language: English
Published: MDPI AG 2021-09-01
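Series: Entropy
Subjects: natural language-based source code generation; systematic literature review; machine learning application
Online Access: https://www.mdpi.com/1099-4300/23/9/1174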
id doaj-e04d7f7e51174d85ac2e8bd45511f6f4
record_format Article
spelling doaj-e04d7f7e51174d85ac2e8bd45511f6f4
2021-09-26T00:06:55Z
eng
MDPI AG
Entropy
1099-4300
2021-09-01
23
1174
1174
10.3390/e23091174
Recent Advances in Intelligent Source Code Generation: A Survey on Natural Language Based Studies
Chen Yang, School of Software Engineering, Tongji University, Shanghai 201804, China
Yan Liu, School of Software Engineering, Tongji University, Shanghai 201804, China
Changqing Yin, School of Software Engineering, Tongji University, Shanghai 201804, China
https://www.mdpi.com/1099-4300/23/9/1174
natural language-based source code generation
systematic literature review
machine learning application
collection DOAJ
language English
format Article
sources DOAJ
author Chen Yang
Yan Liu
Changqing Yin
title Recent Advances in Intelligent Source Code Generation: A Survey on Natural Language Based Studies
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2021-09-01
description Source Code Generation (SCG) is a prevalent research field in automated software engineering that maps specific descriptions to various sorts of executable code. Alongside numerous intensive studies, diverse SCG types that integrate different scenarios and contexts continue to emerge. As the ultimate goal of SCG, Natural Language-based Source Code Generation (NLSCG) is growing into an attractive and challenging field because of the expressiveness and extremely high level of abstraction of its input. The rapidly growing large-scale datasets generated from open-source code repositories and Q&A resources, innovations in machine learning algorithms, and advances in computing capacity make the NLSCG field promising and create more opportunities for implementing and refining models. In addition, we observed a recent surge of interest in NLSCG-related studies, which follow quite varied technical approaches. However, many studies are bound to specific datasets with customization issues, producing occasional successful solutions with tentative technical methods. No systematic study has yet explored and promoted the further development of this field. We carried out a systematic literature survey and tool research to find potential directions for improvement. First, we position NLSCG among the various SCG genres and specify the generation context empirically, drawing on software development domain knowledge and programming experience. Second, we explore the selected studies, collected through a thoughtfully designed snowballing process, to clarify the NLSCG field and understand the NLSCG problem, which lays a foundation for our subsequent investigation. Third, we model the research problems in terms of technical focus and adaptive challenges, and elaborate on insights gained from the NLSCG research backlog. Finally, we summarize the latest technology landscape around the transformation model and describe the critical tactics used in its essential components and their correlations. This research addresses the challenges of bridging the gap between natural language processing and source code analytics, outlines different dimensions of NLSCG research concerns and technical utilities, and presents a bounded technical context for NLSCG to facilitate future studies in this promising area.
topic natural language-based source code generation
systematic literature review
machine learning application
url https://www.mdpi.com/1099-4300/23/9/1174