A Method and Tool for Finding Concurrency Bugs Involving Multiple Variables with Application to Modern Distributed Systems
Concurrency bugs are extremely hard to detect due to huge interleaving space. They are happening in the real world more often because of the prevalence of multi-threaded programs taking advantage of multi-core hardware, and microservice based distributed systems moving more and more applications to...
Main Author: | |
---|---|
Format: | Others |
Published: |
FIU Digital Commons
2018
|
Subjects: | |
Online Access: | https://digitalcommons.fiu.edu/etd/3896 https://digitalcommons.fiu.edu/cgi/viewcontent.cgi?article=5139&context=etd |
Summary: | Concurrency bugs are extremely hard to detect due to huge interleaving space. They are happening in the real world more often because of the prevalence of multi-threaded programs taking advantage of multi-core hardware, and microservice based distributed systems moving more and more applications to the cloud. As the most common non-deadlock concurrency bugs, atomicity violations are studied in many recent works, however, those methods are applicable only to single-variable atomicity violation, and don't consider the specific challenge in distributed systems that have both pessimistic and optimistic concurrency control. This dissertation presents a tool using model checking to predict atomicity violation concurrency bugs involving two shared variables or shared resources. We developed a unique method inferring correlation between shared variables in multi-threaded programs and shared resources in microservice based distributed systems, that is based on dynamic analysis and is able to detect the correlation that would be missed by static analysis. For multi-threaded programs, we use a binary instrumentation tool to capture runtime information about shared variables and synchronization events, and for microservice based distributed systems, we use a web proxy to capture HTTP based traffic about API calls and the shared resources they access including distributed locks. Based on the detected correlation and runtime trace, the tool is powerful and can explore a vast interleaving space of a multi-threaded program or a microservice based distributed system given a small set of captured test runs. It is applicable to large real-world systems and can predict atomicity violations missed by other related works for multi-threaded programs and a couple of previous unknown atomicity violation in real world open source microservice based systems. A limitation is that redundant model checking may be performed if two recorded interleaved traces yield the same partial order model. |
---|