Using Topic Models to Support Software Maintenance

Latent topic models are statistical structures in which a "latent topic" describes some relationship between parts of the data. Co-maintenance is defined as an observable property of software systems under source control in which source code fragments are modified together in some time fr...

Bibliographic Details
Main Author: Grant, Scott
Other Authors: Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.))
Language: en
Published: 2012
Subjects: latent topic models; software engineering
Online Access: http://hdl.handle.net/1974/7169
id ndltd-LACETR-oai-collectionscanada.gc.ca-OKQ.1974-7169
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-OKQ.1974-7169 | 2013-12-20T03:40:30Z | Using Topic Models to Support Software Maintenance | Grant, Scott | latent topic models | software engineering | Thesis (Ph.D, Computing) -- Queen's University, 2012-04-30 18:16:04.05 | Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.)) | 2012-04-30 09:06:30.9 | 2012-04-30 18:16:04.05 | 2012-04-30T22:53:06Z | 2012-04-30T22:53:06Z | 2012-04-30 | Thesis | http://hdl.handle.net/1974/7169 | en | Canadian theses | This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
collection NDLTD
language en
sources NDLTD
topic latent topic models
software engineering
description Latent topic models are statistical structures in which a "latent topic" describes some relationship between parts of the data. Co-maintenance is defined as an observable property of software systems under source control in which source code fragments are modified together in some time frame. When topic models are applied to software systems, latent topics emerge from code fragments. However, it is not yet known what these latent topics mean. In this research, we analyse software maintenance history, and show that latent topics often correspond to code fragments that are maintained together. Moreover, we show that latent topic models can identify such co-maintenance relationships even with no supervision. We can use this correlation both to categorize and understand maintenance history, and to predict future co-maintenance in practice. The relationship between co-maintenance and topics is directly analysed within changelists, with respect to both local pairwise code fragment similarity and global system-wide fragment similarity. This analysis is used to evaluate topic models used with a domain-specific programming language for web service similarity detection, and to estimate appropriate topic counts for modelling source code. Thesis (Ph.D, Computing) -- Queen's University, 2012-04-30 18:16:04.05
author2 Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.))
author Grant, Scott
title Using Topic Models to Support Software Maintenance
publishDate 2012
url http://hdl.handle.net/1974/7169