Clustering Classes in Packages for Program Comprehension

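During software maintenance and evolution, one of the important tasks developers face is understanding a system quickly and accurately. As an evolving system grows in size and complexity, program comprehension becomes increasingly difficult. Given a target system, developers may first focus on comprehending its packages, which vary in size. Small packages are easy to comprehend, but large packages are difficult to understand. In this article, we focus on these large packages and propose a novel program comprehension approach that uses the Latent Dirichlet Allocation (LDA) model to cluster the classes in them. Each large package is thus separated into small clusters, which are easier for developers to comprehend. Empirical studies on four real-world software projects demonstrate the effectiveness of our approach. The results show that our approach is more effective than clustering approaches based on Latent Semantic Indexing (LSI) and Probabilistic Latent Semantic Analysis (PLSA). In addition, we find that the topic that labels each cluster is useful for program comprehension.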

Bibliographic Details
Main Authors: Xiaobing Sun, Xiangyue Liu, Bin Li, Bixin Li, David Lo, Lingzhi Liao
Format: Article
Language: English
Published: Hindawi Limited 2017-01-01
Series: Scientific Programming
Online Access: http://dx.doi.org/10.1155/2017/3787053
id doaj-e6f041937f874a71bd1633b1a0c7ae2b
record_format Article
spelling doaj-e6f041937f874a71bd1633b1a0c7ae2b 2021-07-02T05:42:26Z eng Hindawi Limited Scientific Programming 1058-9244 1875-919X 2017-01-01 2017 10.1155/2017/3787053 3787053 Clustering Classes in Packages for Program Comprehension
Xiaobing Sun (0) Xiangyue Liu (1) Bin Li (2) Bixin Li (3) David Lo (4) Lingzhi Liao (5)
(0) School of Information Engineering, Yangzhou University, Yangzhou, China
(1) School of Information Engineering, Yangzhou University, Yangzhou, China
(2) School of Information Engineering, Yangzhou University, Yangzhou, China
(3) School of Computer Science and Engineering, Southeast University, Nanjing, China
(4) School of Information Systems, Singapore Management University, Singapore
(5) Nanjing University of Information Science & Technology, Nanjing, China
http://dx.doi.org/10.1155/2017/3787053
collection DOAJ
language English
format Article
sources DOAJ
author Xiaobing Sun
Xiangyue Liu
Bin Li
Bixin Li
David Lo
Lingzhi Liao
title Clustering Classes in Packages for Program Comprehension
publisher Hindawi Limited
series Scientific Programming
issn 1058-9244
1875-919X
publishDate 2017-01-01
description During software maintenance and evolution, one of the important tasks developers face is understanding a system quickly and accurately. As an evolving system grows in size and complexity, program comprehension becomes increasingly difficult. Given a target system, developers may first focus on comprehending its packages, which vary in size. Small packages are easy to comprehend, but large packages are difficult to understand. In this article, we focus on these large packages and propose a novel program comprehension approach that uses the Latent Dirichlet Allocation (LDA) model to cluster the classes in them. Each large package is thus separated into small clusters, which are easier for developers to comprehend. Empirical studies on four real-world software projects demonstrate the effectiveness of our approach. The results show that our approach is more effective than clustering approaches based on Latent Semantic Indexing (LSI) and Probabilistic Latent Semantic Analysis (PLSA). In addition, we find that the topic that labels each cluster is useful for program comprehension.
url http://dx.doi.org/10.1155/2017/3787053