Program Validation by Symbolic and Reverse Execution

Bounded Lazier Initialization for Reasonably Terminating Symbolic Execution and

Dynamic Reverse-Code Generation for Memory-Efficient Reverse Execution

by Jooyong Lee


Program validation is one of the most crucial tasks during program development since programs should conform to programmers' requirements. (We approach validation from programmers' perspective rather than from customers' perspective.) To this end, one is often required to formulate requirements into formal specifications and analyze a given program against these specifications until no error is detected; if an error is detected, its cause must be located and fixed.

In this dissertation, we present applications of symbolic execution and reverse execution that can help program validation; symbolic execution is used to detect errors and reverse execution is used to locate errors.

We review symbolic execution used for program testing and proving to recall two main difficulties in performing symbolic execution, namely (1) handling heap-allocated pointer-data structures and (2) dealing with a potentially infinite number of symbolic execution paths. We extend symbolic execution to address these problems. To handle heap-allocated pointer-data structures, ``lazier initialization'', which is an improvement over lazy initialization, is introduced and compared with the existing methods. To deal with a potentially infinite number of symbolic execution paths without requiring loop invariants, the size of input is bounded during symbolic execution. In particular, an algorithm to bound the size of arbitrary pointer-data structures is combined with lazier initialization. This bounded lazier initialization helps symbolic execution to terminate, providing a user with relatively high confidence in a given program.

Meanwhile, we also review reverse execution used for program debugging and the backtracking of explicit-state model checking to observe that reverse execution has been evolved to reduce its memory usage. We point out that the most memory-efficient way to perform reverse execution, the reverse-code generation method based on static analysis, may not be applicable to non-deterministic programs such as multi-threaded programs. We introduce our own reverse-code generation method based on dynamic analysis to address this problem.

We provide the experimental result to show that our bounded lazier initialization is fast enough to detect all possible errors existing within a bounded domain. We also provide a case study of our dynamic reverse-code generation method to show its memory-efficiency.