WYSINWYX: What You See Is Not What You Execute
Authors
Balakrishnan, Gogul
Type
Technical Report
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Abstract
There is an increasing need for tools to help programmers and security
analysts understand executables. For instance, commercial companies
and the military increasingly use Commercial Off-The-Shelf (COTS)
components to reduce the cost of software development. They are
interested in ensuring that COTS components do not perform malicious
actions (and cannot be forced to perform malicious actions). Viruses and
worms have become ubiquitous. A tool that aids in understanding their
behavior can ensure early dissemination of signatures, and thereby
control the extent of damage caused by them. In both domains, the
questions that need to be answered cannot be answered perfectly, because
the problems are undecidable, but static analysis provides a way to
answer them conservatively.
In recent years, there has been a considerable amount of research
activity to develop analysis tools to find bugs and security
vulnerabilities. However, most of the effort has been on analysis of
source code, and the issue of analyzing executables has largely been
ignored. In the security context, this is particularly unfortunate,
because performing analysis on the source code can fail to detect
certain vulnerabilities due to the WYSINWYX phenomenon: "What You
See Is Not What You eXecute". That is, there can be a mismatch
between what a programmer intends and what is actually executed on the
processor.
Even though the advantages of analyzing executables are appreciated
and well understood, there is a dearth of tools that work on
executables directly. The overall goal of our work is to develop
algorithms for analyzing executables, and to explore their
applications in the context of program understanding and automated bug
hunting. Unlike existing tools, we want to provide useful information
about memory accesses, even in the absence of debugging
information. Specifically, the dissertation focuses on the following
aspects of the problem:
- Developing algorithms to extract intermediate representations (IR)
from executables that are similar to the IR that would be obtained
if we had started from source code. The recovered IR should be
similar to that built by a compiler, consisting of the following
elements: (1) control-flow graphs (with indirect jumps resolved),
(2) a call graph (with indirect calls resolved), (3) the set of
variables, (4) values of pointers, (5) sets of used, killed, and
possibly-killed variables for control-flow graph nodes, (6) data
dependences, and (7) types of variables: base types, pointer
types, structs, and classes.
- Using the recovered IR to develop tools for program understanding
and for finding bugs and security vulnerabilities.
The algorithms described in this dissertation are incorporated in a
tool we built for analyzing Intel x86 executables, called
CodeSurfer/x86.
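
As an illustration only, the seven IR elements listed above could be gathered in a single record. The sketch below is not the data model of CodeSurfer/x86: the class name `RecoveredIR`, its field names, and the helper `resolve_indirect_jump` are hypothetical, and the helper assumes that the possible values of a jump operand, item (4), are already known so that they can be turned into control-flow edges, item (1).

```python
from dataclasses import dataclass, field


@dataclass
class RecoveredIR:
    """Hypothetical container for the seven IR elements named in the abstract."""
    cfg_edges: dict = field(default_factory=dict)        # (1) node -> set of successor nodes
    call_edges: dict = field(default_factory=dict)       # (2) call site -> set of callees
    variables: set = field(default_factory=set)          # (3) variable-like entities
    pointer_values: dict = field(default_factory=dict)   # (4) operand -> set of possible addresses
    used: dict = field(default_factory=dict)             # (5) node -> variables used
    killed: dict = field(default_factory=dict)           # (5) node -> variables killed
    possibly_killed: dict = field(default_factory=dict)  # (5) node -> variables possibly killed
    data_deps: set = field(default_factory=set)          # (6) (defining node, using node) pairs
    types: dict = field(default_factory=dict)            # (7) variable -> type description


def resolve_indirect_jump(ir, jump_node, target_operand):
    """Add one CFG edge per address the jump operand may hold (assumed already computed)."""
    targets = ir.pointer_values.get(target_operand, set())
    ir.cfg_edges.setdefault(jump_node, set()).update(targets)
    return targets


ir = RecoveredIR()
# Assumed analysis result: at the jump, eax holds one of two code addresses.
ir.pointer_values["eax"] = {0x401000, 0x401020}
resolve_indirect_jump(ir, 0x400F00, "eax")
```

The point of the sketch is the dependency between the elements: control-flow and call edges for indirect transfers can only be filled in once pointer values are known.
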
Because executables do not have a notion of variables similar to the
variables in programs for which source code is available, one of the
important aspects of IR recovery is to determine a collection of
variable-like entities for the executable. The quality of the
recovered variables affects the precision of an analysis that gathers
information about memory accesses in an executable, and therefore, it
is desirable to recover a set of variables that closely approximates
the variables of the original source-code program. On average, our
technique correctly identifies over 88% of the local
variables and over 89% of the fields of heap-allocated objects. In
contrast, previous techniques, such as the one used in the IDAPro
disassembler, recovered 83% of the local variables, but 0% of the
fields of heap-allocated objects.
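
To make the idea of "variable-like entities" concrete, here is a minimal sketch of a simple baseline heuristic: treat every stack offset that appears explicitly in the code as the start of a variable, and let each variable extend to the next such offset. This is not the dissertation's algorithm, and whether any particular disassembler works exactly this way is an assumption. The function name `partition_by_offsets` and the sample offsets are hypothetical.

```python
def partition_by_offsets(offsets, end=0):
    """Split a stack frame into (start, size) regions.

    `offsets` are the frame offsets that appear explicitly in instructions
    (all assumed to be less than `end`); each region runs from one such
    offset up to the next one, and the last region runs up to `end`.
    """
    starts = sorted(set(offsets))
    bounds = starts + [end]
    return [(lo, hi - lo) for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical frame: the code mentions offsets -16, -12, and -4 explicitly.
regions = partition_by_offsets([-4, -12, -16, -12])
print(regions)
```

In this example the region that starts at offset -12 is 8 bytes long. It could be one 8-byte variable, two 4-byte variables, or a small array whose second element is only reached through a computed address. A heuristic based purely on explicit offsets cannot tell these cases apart, and it has nothing to work with for heap objects, whose addresses are not known statically.
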
Recovering useful information about heap-allocated storage is another
challenging aspect of IR recovery. We propose an abstraction of
heap-allocated storage called recency-abstraction, which lies
between the extremes of one summary node per malloc site
and complex shape abstractions. We used the recency-abstraction to
resolve virtual-function calls in executables obtained by compiling
C++ programs. The recency-abstraction enabled our tool to discover the
address of the virtual-function table to which the virtual-function
field of a C++ object is initialized in a substantial number of
cases. Using this information, we were able to resolve, on average,
60% of the virtual-function call sites in executables that were
obtained by compiling C++ programs.
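
The following is a minimal sketch of the recency idea, under simplifying assumptions. Each allocation site gets two abstract blocks: one for the most recently allocated block, which stands for a single concrete block and can therefore be overwritten outright, and one summary block for all older blocks from that site. The class `RecencyHeap`, its method names, and the string labels are hypothetical. Field contents are plain Python sets rather than the abstract values a real analysis would use, and uninitialized contents are not modeled.

```python
class RecencyHeap:
    """Two abstract blocks per allocation site: most-recent and older (summary)."""

    def __init__(self):
        self.mrab = {}   # site -> {field: set of values}; models exactly one block
        self.nmrab = {}  # site -> {field: set of values}; summarizes all older blocks

    def allocate(self, site):
        # The previous most-recent block becomes an older block:
        # join its contents into the summary for this site.
        old = self.mrab.get(site, {})
        summary = self.nmrab.setdefault(site, {})
        for fld, values in old.items():
            summary.setdefault(fld, set()).update(values)
        # The new block starts with no recorded field contents.
        self.mrab[site] = {}

    def store_to_mrab(self, site, fld, value):
        # Strong update: the most-recent block is a single block,
        # so the old contents of the field are replaced.
        self.mrab[site][fld] = {value}

    def load(self, site, fld):
        # A pointer that may refer to any block from this site
        # sees the contents of both abstract blocks.
        recent = self.mrab.get(site, {}).get(fld, set())
        older = self.nmrab.get(site, {}).get(fld, set())
        return recent | older


heap = RecencyHeap()
for _ in range(3):  # e.g. a loop that constructs objects at one allocation site
    heap.allocate("site_A")
    heap.store_to_mrab("site_A", "vptr", "vtable_A")

print(heap.load("site_A", "vptr"))
```

Because the constructor's store hits the single most-recent block, the field ends up holding exactly one table address in both abstract blocks. With only one summary node per site, every store would have to be a weak update, so the field's value before initialization could never be ruled out.
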
To assess the usefulness of the recovered IR in the context of bug
hunting, we used CodeSurfer/x86 to analyze device-driver executables
without the benefit of either source code or symbol-table/debugging
information. We were able to find known bugs (that had been discovered
by source-code analysis tools), along with useful error traces, while
maintaining a low false-positive rate.
Citation
TR1603