N-Grams as a Measure of Naturalness and Complexity

Bibliographic Details
Main Author:	Randák, Richard
Format:	Others
Language:	English
Published:	Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) 2019
Subjects:	language model; language processing; ngram; naturalness; java; code complexity; software quality; static analysis; code metrics; Software Engineering; Programvaruteknik; Computer Sciences; Datavetenskap (datalogi)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-90006

id	ndltd-UPSALLA1-oai-DiVA.org-lnu-90006
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-lnu-90006; 2019-11-12T22:36:20Z; N-Grams as a Measure of Naturalness and Complexity; eng; Randák, Richard; Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM); 2019; language model; language processing; ngram; naturalness; java; code complexity; software quality; static analysis; code metrics; Software Engineering; Programvaruteknik; Computer Sciences; Datavetenskap (datalogi); Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-90006; application/pdf; info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	language model; language processing; ngram; naturalness; java; code complexity; software quality; static analysis; code metrics; Software Engineering; Programvaruteknik; Computer Sciences; Datavetenskap (datalogi)
description	We live in a time when software is used everywhere. It is even used to create other software, helping developers write or generate new code. To do this properly, software quality metrics are used to evaluate the resulting code. However, they are sometimes too costly to compute, or simply don't have the expected effect. Therefore, new and better ways of evaluating software are needed. In this research, we investigate the use of statistical approaches common in natural language processing (NLP). In order to introduce and evaluate new metrics, a Java N-gram language model is built from a large corpus of Java code. Naturalness, a method-level metric, is introduced and calculated for selected projects. Its correlations with well-known software complexity metrics are calculated and discussed. The results, however, show that the metric, in the form in which we have defined it, is not suitable for evaluating software complexity, since it is highly correlated with a well-known metric (token count) that is much easier to compute. A different definition of the metric is suggested, which could be the subject of future study and research.
author	Randák, Richard
title	N-Grams as a Measure of Naturalness and Complexity
publisher	Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM)
publishDate	2019
url	http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-90006
