Improving software remodularisation



Bibliographic Details
Main Author: Hall, Mathew J.
Other Authors: McMinn, Philip
Published: University of Sheffield 2013
Subjects: 005
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.577421
Description
Summary: Maintenance is estimated to be the most expensive stage of the software development lifecycle. While documentation is widely considered essential to reduce the cost of maintaining software, it is commonly neglected. Automated reverse engineering tools present a potential solution to this problem by allowing documentation, in the form of models, to be produced cheaply. State machines, module dependency graphs (MDGs), and other software models may be extracted automatically from software using reverse engineering tools. However, the models are typically large and complex due to a lack of abstraction. Solutions to this problem use transformations (for state machines) or “remodularisation” (for MDGs) to enrich the diagram with a hierarchy that uncovers the system’s structure. This task is complicated by the subjectivity of the problem. Automated techniques aim to optimise the structure, either through design quality metrics or by grouping elements by the limited number of available features. Both of these approaches can lead to a mismatch between the algorithm’s output and the developer’s intentions.

This thesis addresses the problem from two perspectives: first, improving automated hierarchy generation as far as possible, and then augmenting it with additional expert knowledge in a refinement process. Investigation begins with the application of remodularisation to the state machine hierarchy generation problem, which is shown to be feasible due to the common underlying graph structure present in both MDGs and state machines. Following this success, genetic programming is investigated as a means to improve upon this result, and is found to produce hierarchies that better optimise a quality metric at higher levels. The disparity between metric-maximising performance and human-acceptable performance is then examined, resulting in the SUMO algorithm, which incorporates domain knowledge to interactively refine a modularisation.
The thesis concludes with an empirical user study conducted with 35 participants. It shows that, although SUMO’s performance is highly dependent on the individual user, SUMO allows a modularisation of a 122-file component to be refined in a short period of time (within an hour for most participants).
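The summary mentions automated techniques that optimise an MDG's structure through design quality metrics, but it does not name the metric used. As an illustration only, the sketch below computes TurboMQ, the modularisation quality measure associated with the Bunch clustering tool; treating it as representative of the thesis's metric is an assumption. Each module's cluster factor rewards dependencies kept inside the module (intra-edges) and penalises dependencies that cross module boundaries. The function name `turbo_mq` and the toy graph are hypothetical.

```python
from collections import defaultdict


def turbo_mq(edges, module_of):
    """Compute TurboMQ for a partition of an unweighted module dependency graph.

    Assumption: this is the TurboMQ formulation used by the Bunch tool,
    shown for illustration; it is not necessarily the metric used in the thesis.

    edges: iterable of (source, target) dependency pairs.
    module_of: dict mapping each node to its module (cluster) label.
    """
    intra = defaultdict(int)  # mu_i: edges with both endpoints in module i
    inter = defaultdict(int)  # edges leaving module i plus edges entering it
    for src, dst in edges:
        a, b = module_of[src], module_of[dst]
        if a == b:
            intra[a] += 1
        else:
            inter[a] += 1  # counted as outgoing from module a
            inter[b] += 1  # and as incoming to module b
    mq = 0.0
    for m in set(module_of.values()):
        mu = intra[m]
        if mu > 0:  # a module with no internal edges contributes 0
            mq += (2 * mu) / (2 * mu + inter[m])
    return mq


# Hypothetical MDG: a, b, c form a cycle; d depends on e; c depends on d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("c", "d")]
split = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
merged = {n: 1 for n in split}

print(turbo_mq(edges, split) > turbo_mq(edges, merged))  # prints True
```

Here the two-module partition scores 6/7 + 2/3 ≈ 1.52, whereas putting everything in one module scores 1.0. A search-based remodularisation tool would explore partitions to maximise such a score, which is exactly where the summary notes a possible mismatch with what developers consider a sensible structure.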