Summary: | Faults are common-place and inevitable in complex applications. Hence, automated techniques are necessary to analyze failed executions and debug the application to locate the fault. For locating faults in programs, dynamic slices have been shown to be very effective in reducing the effort of debugging. The user needs to inspect only a small subset of program statements to get to the root cause of the fault. While prior work has primarily focussed on single-threaded programs, this dissertation shows how dynamic slicing can be used for fault location in multithreaded programs. This dissertation also shows that dynamic slices can be used to track down faults due to data races in multithreaded programs by incorporating additional data dependences that arise in the presence of many threads. In order to construct the dynamic slices, dependence traces are collected and processed. However, program runs generate traces in the order of Gigabytes in a few seconds. Hence, for multithreaded program runs that are long-running, the process of collecting and storing these traces poses a significant challenge. This dissertation proposes two techniques to overcome this challenge. Experiments indicate that the techniques combined can reduce the size of the traces by 3 orders of magnitude. For applications that are critical and for which down time is highly detrimental, techniques for surviving software failures and letting the execution continue are desired. This dissertation proposes one such technique to recover applications from a class of faults that are caused by the execution environment and prevent the fault in future runs. This technique has been successfully used to avoid faults in a variety of applications caused due to thread scheduling, heap overflow, and malformed user requests. Case studies indicate that, for most environment bugs, the point in the execution where the environment modification is necessary can be clearly pin-pointed by using the proposed system and the fault can be avoided in the first attempt. The case studies also show that the patches needed to prevent the different faults are simple and the overhead induced by the system during the normal run of the application is less than 10 \%, on average.
|