This project is a simple Garbage Collector (GC) for C programs running on 64-bit systems. Just like the famous Boehm collector, our GC is conservative, where the collector assumes that all bit patterns resembling pointers are indeed pointers. C programmers can replace calls to malloc() with the corresponding call to gc_malloc() and remove all calls to free() (and any deallocation logic) altogether. Unlike the Boehm collector, our GC design and implementation is very simple. The entire collector fits into a single file gc.c and is around ~1300 lines of code. The simplicity of the design leads to a richer memory API, which enables some interesting new applications...
Our GC design is based on overcommit memory. The basic idea is as follows: when a program allocates a large chunk of memory from the OS, the OS need not ensure there is enough physical memory to satisfy the request. Instead, the OS reserves a suitably sized chunk of the virtual address space and returns a pointer to the start of the chunk. Only when the process starts using the memory (i.e. reading/writing, etc.) does the OS associate allocate physical pages to the virtual addresses. As a consequence of this design, it is actually possible to "allocate" much more memory that is physically available (including swap).
The design of our GC takes advantage of this. When the program starts, the GC "allocates" (via mmap) a HUGE chunk of memory from the OS. The current implementation currently requests 4096GB (!) of memory up front. This is possible on 64-bit systems where the virtual address space is big enough to accommodate such requests. (This trick does not work on 32-bit systems -- where the maximum virtual address space size of 4GB is too small.)
Once allocated, the memory chuck is subdivided into smaller chunks (a.k.a. "buckets"), where each bucket handles allocation of a specific size range. The allocation algorithm for each bucket is very simple: either allocate from a "free-list" (if available from a previous collection), or simply allocate fresh space from the appropriate bucket by incrementing a pointer. Usually only a tiny fraction of the reserved 4096GB is ever touched, corresponding to the actual memory usage of the program.
The GC API comes with (the GC versions of) standard allocation functions:
There are some caveats that should be considered before using this GC.
Currently the GC (and sample programs) are released under the terms of the GNU General Public License v3:
The memory allocation algorithm (described above) forms the basis of our work on program hardening with low fat pointers. The basic idea is to use the efficient size and base operations (GC_size/GC_base) to lookup object bounds meta information at runtime. This bounds meta information can then be used to check for object bounds errors such as buffer overflows. This work has resulted in two publications so far:
Note that the implementation evaluated by these papers is different from the current GC code.