Software reliability prediction


Bibliographic Details
Main Author: Wright, David R.
Published: City University London 2001
Subjects: 005
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390943
Description
Summary: This thesis presents some extensions to existing methods of software reliability estimation and prediction.

Firstly, we examine a technique called 'recalibration', by means of which many existing software reliability prediction algorithms assess past predictive performance in order to improve the accuracy of current reliability predictions. This existing technique for forecasting future failure times of software is already quite general: whenever predictions are produced in the form of time-to-failure distributions, successively as more actual failure times are observed, recalibration can be applied irrespective of which probabilistic software reliability model and which statistical inference technique are being used (a small illustrative sketch of the basic step is given below). In the current work we further generalise the recalibration method to those situations where empirical failure data take the form of failure counts rather than precise inter-failure times. We then briefly explore how the reasoning used in this extension of recalibration to the prediction of failure-count sequences might extend further, to the recalibration of other representations of predicted reliability.

Secondly, the thesis contains a theoretical discussion of some modelling possibilities for improving software reliability predictions by the incorporation of disparate sources of data. There are well-established techniques for forecasting the reliability of a particular software product using as data only the past failure behaviour of that software under statistically representative operational testing. However, there may sometimes be reasons for seeking improved predictive accuracy by using data of other kinds too, rather than relying on this single source of empirical evidence. Notable among these is the economic impracticability, in many cases, of obtaining sufficient representative failure vs. time data (from execution of the particular product in question) to determine, by inference applied to software reliability growth models, whether or not a high reliability requirement has been achieved, prior to extensive operational use of the software. This problem arises in particular for safety-critical systems, whose required reliability is often extremely high, and for which an accurate reliability assessment is often required in advance of a decision whether to release the software for actual use in the field. Another argument for seeking out other usable data sources for software reliability prediction is the value that would attach to rigorous empirical confirmation or refutation of any of the many existing theories and claims about what the factors of software reliability are, and how these factors may interact, in a given context. In those cases, such as some safety-critical systems, in which assessment of a high reliability level is required at an early stage, the necessary assessment is in practice often carried out rather informally, and often does claim to take account of many different types of evidence (experience of previous, similar systems; evidence of the efficacy of the development process; expert judgement; etc.) to supplement the limited available data on past failure vs. time behaviour emanating from testing of the software within a realistic usage environment. Ideally, we would like this assessment to allow all such evidence to be combined into a final numerical measure of reliability in a scientifically more rigorous way.
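As a concrete illustration of the recalibration step mentioned above, the following is a minimal Python sketch in the style of u-plot recalibration. The function names, and the use of a simple empirical distribution function for the recalibrating transformation G, are assumptions made for illustration, not the thesis's exact formulation.

    # Illustrative sketch of one-step-ahead recalibration (assumed simplified
    # form).  If F_i is the predictive time-to-failure CDF issued before the
    # i-th failure, and t_i the failure time actually observed, the values
    # u_i = F_i(t_i) should resemble an i.i.d. uniform(0,1) sample when the
    # predictions are accurate.  A recalibrated prediction passes the raw
    # predictive CDF through an estimate G of the distribution of past
    # u-values: F*(t) = G(F(t)).

    import bisect

    def u_values(predictive_cdfs, observed_times):
        """u_i = F_i(t_i) for each past prediction/observation pair."""
        return [F(t) for F, t in zip(predictive_cdfs, observed_times)]

    def empirical_G(us):
        """Step-function estimate of the distribution of past u-values."""
        sorted_us = sorted(us)
        n = len(sorted_us)
        def G(u):
            return bisect.bisect_right(sorted_us, u) / n
        return G

    def recalibrate(next_cdf, past_cdfs, past_times):
        """Return the recalibrated predictive CDF F*(t) = G(F(t))."""
        G = empirical_G(u_values(past_cdfs, past_times))
        return lambda t: G(next_cdf(t))

The thesis's first contribution generalises this kind of step from predicted inter-failure-time distributions to predicted failure-count distributions.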
To address these problems, we first examine some candidate general statistical regression models used in other fields, such as medicine and insurance, and discuss how these might be applied to the prediction of software reliability. We have termed these models 'explanatory variables regression models'. The goal would be to investigate statistically how differences in software failure behaviour may be explained in terms of differences in other measured characteristics of a number of different statistical 'individuals', or 'experimental units'. We discuss the interpretation, within the software reliability context, of this statistical concept of an 'individual', our favoured interpretation being that a single statistical reliability regression model would be used to model simultaneously a family of parallel series of inter-failure times emanating from measurably different software products, or from measurably different installations of a single software product. In statistical regression terms, each of these distinct failure vs. time histories would be the 'response variable' corresponding to one of these 'individuals'; the other measurable differences between the individuals would be captured in the model as explanatory variable values differing from one individual to another. (A sketch of such a shared regression model appears below.)

Following this discussion, we then leave general regression models to examine a slightly different theoretical approach, to essentially the same question of how to incorporate diverse data within our predictions, through an examination of models for 'unexplained' differences between individuals' failure behaviours. Here, rather than assuming the availability of putative explanatory variables to distinguish our statistical individuals and 'explain' the way that their reliabilities differ, we instead use randomness alone to model their differences in reliability. We have termed the class of models produced by this approach 'similar products models', meaning models in which we regard the individuals' different likely failure vs. time behaviours as initially (i.e. a priori) indistinguishable to us. Here, either we cannot, or we choose not to attempt with a formal model to, explain the differences between individuals' reliabilities in terms of other metrics applied to those individuals, but we do still expect that the 'similar products'' (i.e. the individuals') reliabilities will differ from each other. We postulate the existence of a single probability distribution from which our individuals' true, unknown reliabilities may all be assumed to have been drawn independently at random. We present some mathematical consequences, showing how, within such a modelling framework, prior belief about the distribution of reliabilities assumes great importance for the model's consequences. We also present some illustrative numerical results suggesting that experience from previous products or environments, so represented within the model, can only modestly improve our confidence in the reliability of a new product, or of an existing product when transferred to a new environment, even where very high operational dependability has been achieved in those previous cases. (A toy numerical illustration of this effect follows below.)
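To make the regression idea more tangible, here is a minimal Python sketch under simplifying assumptions: exponential inter-failure times and a log-linear link between covariates and failure rate. The covariate values, failure data and names are invented for illustration and are not taken from the thesis.

    # Illustrative 'explanatory variables' regression sketch (assumptions:
    # exponential inter-failure times, log-linear link).  Several products
    # ('individuals'), each with its own covariate vector x_j and its own
    # series of inter-failure times, share ONE regression model with
    # product-specific failure rate  rate_j = exp(beta . x_j).

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(beta, covariates, failure_series):
        nll = 0.0
        for x_j, times in zip(covariates, failure_series):
            rate = np.exp(np.dot(beta, x_j))   # this individual's rate
            # exponential log-density: log(rate) - rate * t, summed over times
            nll -= np.sum(np.log(rate) - rate * np.asarray(times))
        return nll

    # Hypothetical data: 3 products, 2 covariates each (e.g. an intercept
    # and a process-quality score), with short inter-failure-time series.
    covariates = [np.array([1.0, 0.2]), np.array([1.0, 0.8]),
                  np.array([1.0, 0.5])]
    failure_series = [[12.0, 7.5, 20.1], [55.0, 41.2], [18.3, 25.0, 30.7]]

    fit = minimize(neg_log_likelihood, x0=np.zeros(2),
                   args=(covariates, failure_series))
    print("estimated regression coefficients:", fit.x)

Each product's failure history is the response for one 'individual', while the coefficient vector beta is shared, which is what lets data from several products inform predictions for each of them.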
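Finally, a toy numerical illustration, under assumptions chosen purely for exposition (a two-point common prior over a per-demand failure probability, with invented numbers), of why evidence from previous 'similar products' can raise confidence in a new product only up to a ceiling fixed by the prior. The thesis's own models and figures differ.

    # Toy 'similar products' illustration (not the thesis's computation).
    # Each product's per-demand failure probability is an independent draw
    # from a common two-point distribution: 'good' (p = 1e-6) with prior
    # weight w, otherwise 'poor' (p = 1e-3).  Failure-free operation of past
    # products raises w, but confidence in a NEW product saturates near
    # (1 - p_good)**n: the past evidence informs the common distribution,
    # not the new product's own draw.

    p_good, p_poor = 1e-6, 1e-3
    w = 0.5                    # prior probability that a product is 'good'
    n_demands = 10_000         # failure-free demands seen per past product

    for k in range(4):         # after 0..3 failure-free previous products
        surv_good = (1 - p_good) ** n_demands
        surv_poor = (1 - p_poor) ** n_demands
        predictive = w * surv_good + (1 - w) * surv_poor
        print(f"{k} past products seen: w = {w:.4f}, "
              f"P(new product survives {n_demands} demands) = {predictive:.4f}")
        # Bayes update of w on one more failure-free past product:
        w = w * surv_good / (w * surv_good + (1 - w) * surv_poor)

In this toy the predictive probability climbs quickly from about 0.50 and then flattens just below 0.99 no matter how many further failure-free products are observed, echoing the abstract's point that such evidence yields only a modest, bounded improvement in confidence about a new product or environment.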