Gregory J. Duck
Research Assistant Professor
National University of Singapore
Office: COM2-03-33
Email:
Gregory is a Research Assistant Professor at the National University of Singapore.
Gregory received his BSc (Mathematics), BEng (Software), and PhD (Computer Science) from the University of Melbourne, and has worked at the National University of Singapore since 2011.
Gregory's research interests include systems, security, binary rewriting, fuzzing, repair, and programming languages.
Publications:
Abstract:
Memory errors continue to be a critical concern for programs written in low-level programming languages such as C and C++. Many different memory error defenses have been proposed, each with varying trade-offs in terms of overhead, compatibility, and attack resistance. Some defenses are highly compatible but only provide minimal protection, and can be easily bypassed by knowledgeable attackers. On the other end of the spectrum, capability systems offer very strong (unforgeable) protection, but require novel software and hardware implementations that are incompatible by definition. The challenge is to achieve both very strong protection and high compatibility.
In this paper, we propose Fully Randomized Pointers (FRP) as a strong memory error defense that also maintains compatibility with existing binary software. The key idea behind FRP is to design a new pointer encoding scheme that allows for the full randomization of most pointer bits, rendering even brute force attacks impractical. We design an FRP encoding that is: (1) compatible with existing binary code (recompilation not needed); and (2) decoupled from the underlying object layout. FRP is prototyped as: (i) a software implementation (BlueFat) to test security and compatibility; and (ii) a proof-of-concept hardware implementation (GreenFat) to evaluate performance. We show FRP is secure, practical, and compatible at the binary level, while our hardware implementation achieves low performance overheads (<4%).
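To give a flavor of the idea, here is a minimal sketch in C (not the actual BlueFat/GreenFat encoding; the table-based scheme and constants are hypothetical simplifications): a fully randomized pointer can be viewed as an opaque random token, decoupled from the object layout, that the runtime translates back to a real address on use.

    #include <stdint.h>
    #include <stdlib.h>

    #define FRP_TABLE_SIZE (1u << 16)           /* hypothetical */

    static struct { uint64_t tok; void *base; size_t size; } frp_table[FRP_TABLE_SIZE];

    /* Hand out a randomized token instead of the object's address. */
    uint64_t frp_alloc(size_t size)
    {
        uint64_t tok;
        do {
            tok = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
        } while (frp_table[tok % FRP_TABLE_SIZE].base != NULL);
        frp_table[tok % FRP_TABLE_SIZE].tok  = tok;
        frp_table[tok % FRP_TABLE_SIZE].base = malloc(size);
        frp_table[tok % FRP_TABLE_SIZE].size = size;
        return tok;
    }

    /* Translate token+offset back to an address.  A forged or corrupted
     * token fails the exact-match check with overwhelming probability,
     * since nearly all of its bits are random. */
    void *frp_deref(uint64_t tok, size_t offset)
    {
        size_t i = (size_t)(tok % FRP_TABLE_SIZE);
        if (frp_table[i].tok != tok || offset >= frp_table[i].size)
            abort();                            /* memory error detected */
        return (char *)frp_table[i].base + offset;
    }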
Abstract:
Computer programs are not executed in isolation, but rather interact with an execution environment that drives program behavior. Software validation methods thus need to capture the effect of possibly complex environmental interactions. Program environments may come from files, databases, configurations, network sockets, human-user interactions, and more. Conventional approaches for environment capture in symbolic execution and model checking employ environment modeling, which involves manual effort. In this paper, we take a different approach based on an extension of greybox fuzzing. Given a program, we first record all observed environmental interactions at the kernel/usermode boundary in the form of system calls. Next, we replay the program under the original recorded interactions, but this time with selective mutations applied, in order to get the effect of different program environments—all without environment modeling. Via repeated (feedback-driven) mutations over a fuzzing campaign, we can search for program environments that induce crashing behaviors. Our EnvFuzz tool found 33 previously unknown bugs in well-known real-world protocol implementations and GUI applications. Many of these are security vulnerabilities and 16 CVEs were assigned.
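The record step can be pictured with standard Linux tracing machinery (a minimal sketch only; EnvFuzz's actual recorder also captures syscall arguments and output buffers so they can later be replayed with mutations):

    /* Log the syscall numbers and return values of a traced child process
     * at the kernel/usermode boundary (Linux x86_64). */
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;
        pid_t child = fork();
        if (child == 0) {
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execvp(argv[1], argv + 1);          /* run the target program */
            return 1;
        }
        int status;
        waitpid(child, &status, 0);             /* initial exec stop */
        while (!WIFEXITED(status)) {
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to entry */
            waitpid(child, &status, 0);
            if (WIFEXITED(status)) break;
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            long nr = (long)regs.orig_rax;      /* syscall number */
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to exit */
            waitpid(child, &status, 0);
            if (WIFEXITED(status)) break;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            printf("syscall %ld = %lld\n", nr, (long long)regs.rax);
        }
        return 0;
    }

Replaying the same interaction sequence while mutating, say, the bytes returned by read() then simulates a different environment without any modeling.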
Abstract:
Memory errors, such as buffer overflows and use-after-free, remain the root cause of many security vulnerabilities in modern software. The use of closed source software further exacerbates the problem, as source-based memory error mitigation cannot be applied. While many memory error detection tools exist, most are based on a single error detection methodology with resulting known limitations, such as incomplete memory error detection (redzones) or false error detections (low-fat pointers). In this paper we introduce RedFat, a memory error hardening tool for stripped binaries that is fast, practical and scalable. The core idea behind RedFat is to combine complementary error detection methodologies---redzones and low-fat pointers---in order to detect more memory errors than can be detected by each individual methodology alone. However, complementary error detection also inherits the limitations of each approach, such as false error detections from low-fat pointers. To mitigate this, we introduce a profile-based analysis that automatically determines the strongest memory error protection possible without negative side effects. We implement RedFat on top of a scalable binary rewriting framework, and demonstrate low overheads compared to the current state-of-the-art. We show that RedFat is language agnostic on C/C++/Fortran binaries with minimal requirements, and works with stripped binaries for both position-independent and position-dependent code. We also show that the RedFat instrumentation can scale to very large/complex binaries, such as Google Chrome.
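The combination can be sketched as follows (illustrative only; the helper functions are hypothetical stand-ins for the RedFat runtime):

    #include <stdlib.h>

    /* Hypothetical runtime helpers: */
    extern int    is_lowfat(const void *p);     /* p within a low-fat region? */
    extern char  *lowfat_base(const void *p);   /* object base from encoding  */
    extern size_t lowfat_size(const void *p);   /* object size from encoding  */
    extern int    in_redzone(const void *p, size_t len); /* shadow lookup     */

    /* Check inserted before a len-byte access to p: the low-fat check
     * catches far out-of-bounds accesses that jump over redzones, while
     * the redzone check covers objects the allocator could not place in
     * a low-fat region (avoiding low-fat false error detections). */
    static void check_access(const char *p, size_t len)
    {
        if (is_lowfat(p)) {
            char *base = lowfat_base(p);
            if (p < base || p + len > base + lowfat_size(p))
                abort();                        /* bounds error */
        }
        if (in_redzone(p, len))
            abort();                            /* adjacent redzone error */
    }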
Abstract:
Greybox fuzzing is an effective method for software testing. Greybox fuzzers, such as AFL, use instrumentation that collects path coverage information in order to guide the fuzzing process. The instrumentation is usually inserted by a modified compiler toolchain, meaning that the program must be recompiled in order to be compatible with greybox fuzzing. When source code is unavailable, or for projects with complex build systems, recompilation is not always feasible. In this paper, we present E9AFL, a fast and scalable tool that automatically inserts AFL instrumentation into program binaries. E9AFL is built on top of the E9Patch static binary rewriting tool. To combat the overhead caused by binary instrumentation, E9AFL develops a set of optimization strategies. Our evaluation results show that E9AFL outperforms existing binary instrumentation tools and achieves performance comparable to compile-time instrumentation.
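For reference, the coverage instrumentation that must be injected at each basic block follows AFL's well-known edge-counting scheme (simplified from the AFL documentation; names are approximate):

    #include <stdint.h>

    extern uint8_t __afl_area[1 << 16];   /* bitmap shared with the fuzzer */
    static uint16_t prev_loc;             /* thread-local in AFL proper    */

    /* cur_loc is a random id assigned to the block at rewriting time. */
    static inline void afl_log_edge(uint16_t cur_loc)
    {
        __afl_area[cur_loc ^ prev_loc]++;  /* count the (prev, cur) edge   */
        prev_loc = cur_loc >> 1;           /* so A->B and B->A map apart   */
    }

E9AFL's optimizations are largely about keeping this snippet, and the trampoline jump to it, cheap when inserted into a binary rather than by the compiler.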
Abstract:
Static binary rewriting has many important applications in software security and systems, such as hardening, repair, patching, instrumentation, and debugging. While many different static binary rewriting tools have been proposed, most rely on recovering control flow information from the input binary. The recovery step is necessary since the rewriting process may move instructions, meaning that the set of jump targets in the rewritten binary needs to be adjusted accordingly. Since the static recovery of control flow information is a hard problem in general, most tools rely on a set of simplifying heuristics or assumptions, such as specific compilers, specific source languages, or binary file meta information. However, the reliance on assumptions or heuristics tends to scale poorly in practice, and most state-of-the-art static binary rewriting tools cannot handle very large/complex programs such as web browsers. In this paper we present E9Patch, a tool that can statically rewrite x86_64 binaries without any knowledge of control flow information. To do so, E9Patch develops a suite of binary rewriting methodologies---such as instruction punning, padding, and eviction---that can insert jumps to trampolines without the need to move other instructions. Since this preserves the set of jump targets, the need for control flow recovery and related heuristics is eliminated. As such, E9Patch is robust by design, and can scale to very large (>100MB) stripped binaries including the Google Chrome and Firefox web browsers. We also evaluate the effectiveness of E9Patch against realistic applications such as binary instrumentation, hardening and repair.
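The basic patching step can be sketched as follows (a simplification: real patching must also make the code page writable, e.g. via mprotect, and the punning/padding/eviction tactics handle instructions shorter than the 5-byte jump):

    #include <stdint.h>
    #include <string.h>

    /* Overwrite an instruction in place with "jmp rel32" to a trampoline.
     * No other instruction moves, so every jump target stays valid. */
    void patch_jump(uint8_t *insn, size_t insn_len, const uint8_t *trampoline)
    {
        if (insn_len < 5)
            return;     /* needs punning/padding/eviction instead */
        int32_t rel = (int32_t)(trampoline - (insn + 5));
        insn[0] = 0xE9;                         /* jmp rel32 opcode */
        memcpy(insn + 1, &rel, sizeof(rel));
        memset(insn + 5, 0x90, insn_len - 5);   /* nop-pad the rest */
    }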
Abstract:
Malformed data-structures can lead to runtime errors such as arbitrary memory access or corruption. Despite this, reasoning over data-structure properties for low-level heap manipulating programs remains challenging. In this paper we present a constraint-based program analysis that checks data-structure integrity, w.r.t. given target data-structure properties, as the heap is manipulated by the program. Our approach is to automatically generate a solver for properties using the type definitions from the target program. The generated solver is implemented using a Constraint Handling Rules (CHR) extension of built-in heap, integer and equality solvers. A key property of our program analysis is that the target data-structure properties are shape neutral, i.e., the analysis does not check for properties relating to a given data-structure graph shape, such as doubly-linked-lists versus trees. Nevertheless, the analysis can detect errors in a wide range of data-structure manipulating programs, including those that use lists, trees, DAGs, graphs, etc. We present an implementation that uses the Satisfiability Modulo Constraint Handling Rules (SMCHR) system. Experimental results show that our approach works well for real-world C programs.
Abstract:
Low-level programming languages with weak/static type systems, such as C and C++, are vulnerable to errors relating to the misuse of memory at runtime, such as (sub-)object bounds overflows, (re)use-after-free, and type confusion. Such errors account for many security and other undefined behavior bugs for programs written in these languages. In this paper, we introduce the notion of dynamically typed C/C++, which aims to detect such errors by dynamically checking the "effective" type of each object before use at runtime. We also present an implementation of dynamically typed C/C++ in the form of the Effective Type Sanitizer (EffectiveSan). EffectiveSan enforces type and memory safety using a combination of low-fat pointers, type metadata and type/bounds check instrumentation. We evaluate EffectiveSan against the SPEC2006 benchmark suite and the Firefox web browser, and detect several new type and memory errors. We also show that EffectiveSan achieves high compatibility and reasonable overheads for the given error coverage. Finally, we highlight that EffectiveSan is one of only a few tools that can detect sub-object bounds errors, and uses a novel approach (dynamic type checking) to do so.
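Conceptually, the inserted check looks like this (a sketch against a hypothetical runtime API, not EffectiveSan's actual interface):

    #include <stdlib.h>

    typedef struct type_info type_info_t;       /* per-type metadata */

    /* Hypothetical runtime helpers: */
    extern const type_info_t *effective_type(const void *p, size_t *offset);
    extern int type_contains_at(const type_info_t *alloc_type, size_t offset,
                                const type_info_t *use_type);

    /* Instrumented before pointer p is used at type use_type: fetch the
     * allocation's effective type and verify that a use_type (sub-)object
     * really lives at that offset; otherwise this is type confusion or an
     * out-of-bounds (sub-)object access. */
    void *check_typed_access(void *p, const type_info_t *use_type)
    {
        size_t offset;
        const type_info_t *t = effective_type(p, &offset);
        if (t == NULL || !type_contains_at(t, offset, use_type))
            abort();
        return p;
    }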
Abstract:
Object bounds overflow errors are a common source of security vulnerabilities. In principle, bounds check instrumentation eliminates the problem, but is hampered by limited compatibility against un-instrumented code and high overheads. On 64-bit systems, low-fat pointers are a recent scheme for implementing efficient and compatible bounds checking by transparently encoding meta information within the native pointer representation itself. However, low-fat pointers are traditionally used for heap objects only, where the allocator has sufficient control over object location necessary for the encoding. This is a problem for stack allocation, where there exist strong constraints regarding the location of stack objects that are apparently incompatible with low-fat pointers. In this paper, we present an extension of low-fat pointers to stack objects by using a collection of techniques, such as pointer mirroring and memory aliasing, thereby allowing stack objects to enjoy bounds error protection from instrumented code. Our extension is compatible with common special uses of the stack, such as alloca, setjmp and longjmp, exceptions, and multi-threading, which rely on direct manipulation of the stack pointer. Our experiments show that we successfully extend the advantages of the low-fat pointer encoding to stack objects. The end result is competitive bounds checking instrumentation for the stack and heap with low memory and runtime overheads, and high compatibility with un-instrumented legacy code.
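The mirroring idea can be sketched as follows (conceptual only; the region layout and constants are hypothetical): the stack is memory-aliased into one low-fat region per allocation size class, and a stack object's pointer is mirrored into the region matching its size class, after which the ordinary low-fat bounds checks apply unchanged.

    #include <stdint.h>

    #define REGION_SHIFT 35     /* hypothetical: high bits select a region */

    /* Swap the region index in the high bits; the aliased mapping means
     * the mirrored pointer still refers to the same physical memory. */
    void *mirror(void *stack_ptr, uintptr_t size_class_region)
    {
        uintptr_t p = (uintptr_t)stack_ptr;
        p &= ((uintptr_t)1 << REGION_SHIFT) - 1;    /* keep offset bits */
        p |= size_class_region << REGION_SHIFT;     /* select the alias */
        return (void *)p;
    }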
Abstract:
Heap buffer overflow (underflow) errors are a common source of security vulnerabilities. One prevention mechanism is to add object bounds meta information and to instrument the program with explicit bounds checks for all memory access. The so-called fat pointers approach is one method for maintaining and propagating the meta information, where native machine pointers are replaced with fat objects that explicitly store object bounds. Another approach is low fat pointers, which encodes meta information within a native pointer itself, eliminating space overheads and also code compatibility issues. This paper presents a new low fat pointer encoding that is fully compatible with existing libraries (e.g. pre-compiled libraries unaware of the encoding) and standard hardware (e.g. x86_64). We show that our approach has very low memory overhead, and is competitive with existing state-of-the-art bounds instrumentation solutions.
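A simplified power-of-two variant of such an encoding is sketched below (the paper's actual scheme also supports non-power-of-two allocation sizes; constants are hypothetical and range checks are elided):

    #include <stdint.h>

    #define REGION_SHIFT 35     /* hypothetical 32GB regions */

    /* One allocation size per region, so size and base are recomputable
     * from the native pointer alone, with no per-object metadata. */
    static const uintptr_t region_size[] = { 0, 16, 32, 64, 128, 256 };

    static uintptr_t lowfat_size(const void *p)
    {
        return region_size[(uintptr_t)p >> REGION_SHIFT];
    }

    static void *lowfat_base(const void *p)
    {
        uintptr_t size = lowfat_size(p);
        if (size == 0)
            return NULL;                        /* not a low fat pointer */
        return (void *)((uintptr_t)p & ~(size - 1));
    }

A bounds check then reduces to testing that an access stays within [base, base + size), which is why the scheme is fast and leaves the pointer representation seen by legacy code unchanged.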
Abstract:
We consider the application of Constraint Handling Rules (CHR) for the specification of type inference systems, such as that used by Haskell. Confluence of CHR guarantees that the answer provided by type inference is correct and consistent. The standard method for establishing confluence relies on an assumption that the CHR program is terminating. However, many examples in practice give rise to non-terminating CHR programs, rendering this method inapplicable. Despite no guarantee of termination or confluence, the Glasgow Haskell Compiler (GHC) supports options that allow the user to proceed with type inference anyway, e.g. via the use of the UndecidableInstances flag. In this paper we formally identify and verify a set of relaxed criteria, namely range-restrictedness, local confluence, and ground termination, that ensure the consistency of CHR-based type inference that maps to potentially non-terminating CHR programs.
Abstract:
This paper introduces a constraint language H for finite partial maps (a.k.a. heaps) that incorporates the notion of separation from Separation Logic. We use H to build an extension of Hoare Logic for reasoning over heap manipulating programs using (constraint-based) symbolic execution. We present a sound and complete algorithm for solving quantifier-free (QF) H-formulae based on heap element propagation. An implementation of the H-solver has been integrated into a Satisfiability Modulo Theories (SMT) framework. We experimentally evaluate the implementation against Verification Conditions (VCs) generated from symbolic execution of large (heap manipulating) programs. In particular, we mitigate the path explosion problem using subsumption via interpolation -- made possible by the constraint-based encoding.
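For instance (written in standard separation-logic-style notation; the paper's concrete H syntax may differ), a quantifier-free formula such as

    h = h_1 * h_2 \;\land\; h_1 = \{ x \mapsto v \}

entails that \mathrm{dom}(h_1) and \mathrm{dom}(h_2) are disjoint and that h(x) = v; heap element propagation derives exactly such consequences during solving.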
Abstract:
Satisfiability Modulo Constraint Handling Rules (SMCHR) is the integration of the Constraint Handling Rules (CHRs) solver programming language into a Satisfiability Modulo Theories (SMT) solver framework. Constraint solvers are implemented in CHR as a set of high-level rules that specify the simplification (rewriting) and constraint propagation behaviour. The traditional CHR execution algorithm manipulates a global store representing a flat conjunction of constraints. This paper introduces SMCHR: a tight integration of CHR with a modern Boolean Satisfiability (SAT) solver. Unlike CHR, SMCHR can handle (quantifier-free) formulae with an arbitrary propositional structure. SMCHR is essentially a Satisfiability Modulo Theories (SMT) solver where the theory T is implemented in CHR.
Abstract:
Constraint Handling Rules (CHRs) are a high-level rule-based programming language for specification and implementation of constraint solvers. CHR manipulates a global store representing a flat conjunction of constraints. By default, CHR does not support goals with a more complex propositional structure including disjunction, negation, etc., or CHR relies on the host system to provide such features. In this paper we introduce Satisfiability Modulo Constraint Handling Rules (SMCHR): a tight integration of CHR with a modern Boolean Satisfiability (SAT) solver for quantifier-free formulae with an arbitrary propositional structure. SMCHR is essentially a Satisfiability Modulo Theories (SMT) solver where the theory T is implemented in CHR. The execution algorithm of SMCHR is based on lazy clause generation, where a new clause for the SAT solver is generated whenever a rule is applied. We shall also explore the practical aspects of building an SMCHR system, including extending a "built-in" constraint solver supporting equality with unification and justifications.
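As a small illustration of lazy clause generation (using the classic leq solver, not an example taken from the paper): applying the antisymmetry rule

    \mathit{leq}(X,Y),\ \mathit{leq}(Y,X) \;\Longleftrightarrow\; X = Y

to the goal atoms \mathit{leq}(a,b) and \mathit{leq}(b,a) generates the clause

    \neg \mathit{leq}(a,b) \;\lor\; \neg \mathit{leq}(b,a) \;\lor\; (a = b)

which is added to the SAT solver, letting it explore the propositional structure while the CHR rules supply the theory reasoning.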
Other: