Abstraction Recovery for Scalable Static Binary Analysis

Bibliographic Details
Main Author: Schwartz, Edward J.
Format: Others
Published: Research Showcase @ CMU 2014
Online Access: http://repository.cmu.edu/dissertations/336
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1336&context=dissertations
Description
Summary: Many source code tools help software programmers analyze programs as they are being developed, but such tools can no longer be applied once the final programs are shipped to the user. This greatly limits users, security experts, and anyone other than the programmer who wishes to perform additional testing and program analysis. This dissertation is concerned with the development of scalable techniques for statically analyzing binary programs, which can be employed by anyone who has access to the binary. Unfortunately, static binary analysis is often more difficult than static source code analysis because the abstractions that are the basis of source code programs, such as variables, types, functions, and control flow structure, are not explicitly present in binary programs. Previous approaches work around the lack of abstractions by reasoning about the program at a lower level, but this approach has not scaled as well as equivalent source code techniques that use abstractions. This dissertation investigates an alternative approach to static binary analysis called abstraction recovery. The premise of abstraction recovery is that since many binaries are actually compiled from an abstract source language which is more suitable for analysis, the first step of static binary analysis should be to recover such abstractions. Abstraction recovery is shown to be feasible in two real-world applications. First, C abstractions are recovered by a newly developed decompiler. The second application recovers gadget abstractions to automatically generate return-oriented programming (ROP) attacks. Experiments using the decompiler demonstrate that recovering C abstractions improves scalability over low-level analysis, with applications such as verification and detection of buffer overflows seeing an average of 17× improvement. Similarly, gadget abstractions speed up automated ROP attacks by 99×.
Though some binary analysis problems do not lend themselves to abstraction recovery because they reason about low-level or syntactic details, abstraction recovery is an attractive alternative to conventional low-level analysis when users are interested in the behavior of the original abstract program from which a binary was compiled, which is often the case.