Qualitative repository analysis with RepoGrams

Bibliographic Details
Main Author: Rozenberg, Daniel
Language: English
Published: University of British Columbia 2015
Online Access: http://hdl.handle.net/2429/54495
Description
Summary: The availability of open source software projects has created an enormous opportunity for empirical evaluations in software engineering research. However, this availability requires that researchers judiciously select an appropriate set of evaluation targets and properly document the rationale for their selection. This selection process is often critical, as it can be used to argue for the generalizability of the evaluated tool or method. To understand the selection criteria that researchers use in their work, we systematically read 55 research papers appearing in six major software engineering conferences. Using a grounded theory approach, we iteratively developed a codebook and coded these papers along five different dimensions, all of which relate to how the authors select evaluation targets in their work. Our results indicate that most authors relied on qualitative and subjective features to select their evaluation targets. Building on these results, we developed a tool called RepoGrams, which supports researchers in comparing and contrasting source code repositories of multiple software projects and helps them select appropriate evaluation targets for their studies. We describe RepoGrams's design and implementation, and evaluate it in two user studies with 74 undergraduate students and 14 software engineering researchers who used RepoGrams to understand, compare, and contrast various metrics on source code repositories. For example, a researcher interested in evaluating a tool might want to show that it is useful both for software projects that are written in a single programming language and for ones that are written in dozens of programming languages. RepoGrams allows the researcher to find a set of software projects that are diverse with respect to this metric. We also evaluate the amount of effort required by researchers to extend RepoGrams for their own research projects in a case study with two researchers.
We find that RepoGrams helps software engineering researchers understand and compare characteristics of a project's source repository and that RepoGrams can be used by non-expert users to investigate project histories. The tool is designed primarily for software engineering researchers who are interested in analyzing and comparing source code repositories across multiple dimensions.
Science, Faculty of; Computer Science, Department of; Graduate