Summary: | With processors tending toward more processing cores instead of higher clock speeds, developers are increasingly forced to grapple with concurrent paradigms to maximally exploit new CPU designs. Embracing concurrent paradigms entails the poten- tial risk of encountering concurrent software faults. Concurrent faults result from unforeseen timings of interactions between concurrent components, as opposed to traditional software faults that arise from functional failures. As a result, concurrent faults have a higher probability of surviving the software development process, potentially causing a catastrophic failure of high cost. As the complexity of software and hardware systems increases they become increasingly difficult to test. One measure of com- plexity is the number of potential execution paths a system can follow, with a higher complexity attributed to a greater number of paths. In concurrent software, the number of execution paths in a system typically increases exponentially as the number of concurrent components increase. Testing complex concurrent soft- ware is therefore difficult, with state of the art static and dynamic analysis techniques yielding only false positives or exhausted resources. This problem is likely only to be exacerbated given the trends highlighted above. Stochastic metaheuristic search techniques can often triumph where deterministic or analytical techniques fail. Methods such as Genetic Algorithms and Ant Colony Optimisation have shown great strength on hard problems, including testing concurrent software. Metaheuristic techniques often trade a perfect solution for good enough solutions, and merely accurately detecting a concurrent fault is better than allowing a fault to survive to a production system. Whilst metaheuristic techniques have had some success in this domain, the state of the art still struggles for a high success rate in some circumstances. There are a few metaheuristic search techniques that have yet to be tried in this area, and this thesis presents a study on one such technique. This thesis presents a literature review, detailing the state of the art in detecting concurrent faults in software and hardware systems. Following a review of metaheuristic techniques applied to finding concurrent faults, I set out a hypothesis asserting that a particular subclass of metaheuristic techniques, Estimation of Distribution Algorithms, are effective in detecting and debugging concurrent faults. To investigate the hypothesis, I first make an algorithmic proposal based on a particular EDA to search the state space of concurrent systems. I then demonstrate through experimentation the ability of the algorithm to detect faults and to optimise the quality of faults found in systems derived from industrial scenarios. I also outline methods of using features unique to EDAs to scale to large systems. Finally, I complete the thesis with a conclusion examining the hypothesis with respect to the evidence collected from empirical work, highlighting the novel aspects of the thesis and outlining future paths of research.
|