Overview

ForkBase is our attempt to build a storage system that supports the high-level properties demanded by many modern applications. In particular, ForkBase provides immutability, collaboration and security. The system enables rapid development of many classes of scalable, distributed applications, thanks to its versatile programming interface, rich semantics and high performance.

Why another storage system?

Existing data storage systems offer a wide range of functionalities to accommodate an equally diverse range of applications. However, new classes of applications have emerged, e.g., blockchains and collaborative analytics, presenting new opportunities for storage systems to support them efficiently. ForkBase stems from our observation of several trends in distributed data management applications. It aims to unify and add value to many classes of today's applications, while opening the door for future ones.

What does ForkBase offer?

The following figure illustrates how ForkBase unifies the properties common to different categories of applications, and shows the core features offered at each layer.

[Figure: ForkBase architecture]

Immutability and Versioning

Data versioning is an important concept in applications that require existing data to remain immutable and that keep track of how data evolves: any update to the data results in a new copy (or version). Typical applications include document hosting (e.g. Dropbox), software development (e.g. GitHub) and collaborative analytics (e.g. DataHub). Blockchain is another example, in which each block represents a version of the global state. ForkBase has a rich set of built-in data types (for both structured and unstructured data) that provide immutability and versioning for stored data. It helps reduce storage consumption and improve access efficiency when managing a large number of data versions.
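The idea that every update produces a new version, while older versions stay readable, can be illustrated with a minimal sketch. This is not ForkBase's API: the names `Version`, `VersionedStore`, `put`, `get` and `history` are hypothetical, and deriving a version id from a SHA-256 hash of the content and parent id is an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One immutable version of a value (hypothetical type, not ForkBase's)."""
    value: bytes
    parent: Optional[str]  # id of the previous version, None for the first one


def version_id(v: Version) -> str:
    # Assumption: the id is a hash over the parent id and the content.
    h = hashlib.sha256()
    h.update((v.parent or "").encode())
    h.update(b"\x00")
    h.update(v.value)
    return h.hexdigest()


class VersionedStore:
    """Toy in-memory store: updates never overwrite, they append a version."""

    def __init__(self) -> None:
        self._versions: Dict[str, Version] = {}  # version id -> version
        self._head: Dict[str, str] = {}          # key -> latest version id

    def put(self, key: str, value: bytes) -> str:
        v = Version(value=value, parent=self._head.get(key))
        vid = version_id(v)
        self._versions[vid] = v   # existing versions are left untouched
        self._head[key] = vid
        return vid

    def get(self, key: str, vid: Optional[str] = None) -> bytes:
        # Read the latest version, or any earlier one by its id.
        return self._versions[vid or self._head[key]].value

    def history(self, key: str) -> List[str]:
        # Walk parent links from the latest version back to the first.
        out: List[str] = []
        cur = self._head.get(key)
        while cur is not None:
            out.append(cur)
            cur = self._versions[cur].parent
        return out
```

Calling `put` twice on the same key yields two version ids. `get(key)` returns the latest value, `get(key, first_id)` still returns the original one, and `history(key)` lists the ids from newest to oldest.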

Collaboration and Fork Semantics

Many applications that involve collaboration among different users demand fork semantics, which let users work on independent copies of the data. Fork semantics elegantly capture the non-linear evolution of data, and can be divided into two categories. On-demand forks are found in applications that explicitly require isolated (or private) branches, such as GitHub and DataHub. On-conflict forks are used in applications that implicitly fork a state upon concurrent modifications of the same data, such as blockchains. In Bitcoin and Ethereum, forks arise when multiple blocks are mined simultaneously on top of the same block, and they are resolved by taking the longest chain. ForkBase supports both kinds of fork semantics to facilitate a rich variety of collaboration workflows. It natively provides many built-in conflict resolution strategies for merging branches in various scenarios.
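The two kinds of fork can be contrasted in a small sketch. It is an illustration under stated assumptions, not ForkBase's interface: `ForkingStore`, `put`, `fork`, `put_on`, `merge` and the `resolve` callback are hypothetical names, and versions are identified by simple counters.

```python
from typing import Callable, Dict, List, Set, Tuple


class ForkingStore:
    """Toy version graph with named branches (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # version id -> (value, parent version ids)
        self.versions: Dict[int, Tuple[str, List[int]]] = {}
        self.branches: Dict[str, int] = {}  # branch name -> head version id
        self.leaves: Set[int] = set()       # versions that have no children
        self._next = 0

    def _commit(self, value: str, parents: List[int]) -> int:
        vid = self._next
        self._next += 1
        self.versions[vid] = (value, list(parents))
        self.leaves.difference_update(parents)  # parents are no longer leaves
        self.leaves.add(vid)
        return vid

    def put(self, branch: str, value: str) -> int:
        # Write to a named branch; only that branch's head moves.
        head = self.branches.get(branch)
        vid = self._commit(value, [] if head is None else [head])
        self.branches[branch] = vid
        return vid

    def fork(self, src: str, dst: str) -> None:
        # On-demand fork: a new branch name pointing at the same version.
        self.branches[dst] = self.branches[src]

    def put_on(self, base: int, value: str) -> int:
        # On-conflict fork: writers name the version they read. Two writes on
        # the same base both succeed and leave two competing leaf versions.
        return self._commit(value, [base])

    def merge(self, dst: str, src: str,
              resolve: Callable[[str, str], str]) -> int:
        # Merge src into dst; `resolve` is the conflict resolution strategy.
        a, b = self.branches[dst], self.branches[src]
        merged = resolve(self.versions[a][0], self.versions[b][0])
        vid = self._commit(merged, [a, b])
        self.branches[dst] = vid
        return vid
```

With this sketch, `fork("master", "dev")` followed by `put("dev", ...)` leaves `master` unchanged until `merge` is called, whereas two `put_on` calls against the same base version produce two leaves that an application would later reconcile, for example by choosing one of them.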

Security and Tamper-Evidence

Security-conscious applications demand protection against malicious modifications, not only from external attackers but also from malicious insiders. One example is outsourced services such as databases or file systems, which are expected to provide mechanisms for detecting data tampering. Another example is blockchain platforms, which require tamper evidence for the ledger. All data objects in ForkBase are tamper-evident, and hence can be leveraged to build better data models for blockchains. In particular, the blockchain's data structures become easy to maintain without incurring any performance overhead. Furthermore, we note that there is an increasing demand for performing analytics on blockchain data, which existing blockchain storage engines were not designed for. The richer structured information captured in ForkBase makes the blockchain analytics-ready.
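Tamper evidence is commonly obtained by chaining cryptographic hashes, so that changing any stored object changes every identifier derived from it. The sketch below shows that general technique. It assumes SHA-256 and a simple linear chain, and the helper names `digest`, `append` and `verify` are hypothetical; it does not describe how ForkBase implements tamper evidence internally.

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, bytes]  # (stored digest, payload)


def digest(prev: str, payload: bytes) -> str:
    # Each digest covers the payload and the digest of the previous entry.
    return hashlib.sha256(prev.encode() + b"\x00" + payload).hexdigest()


def append(chain: List[Entry], payload: bytes) -> str:
    # Add an entry and return the new head digest.
    prev = chain[-1][0] if chain else ""
    head = digest(prev, payload)
    chain.append((head, payload))
    return head


def verify(chain: List[Entry], trusted_head: str) -> bool:
    # Recompute every digest and compare the result with a head digest that
    # the verifier obtained from a trusted source.
    prev = ""
    for stored, payload in chain:
        if digest(prev, payload) != stored:
            return False
        prev = stored
    return prev == trusted_head
```

A client that remembers only the head digest can detect a modification to any earlier entry. Even if an insider rewrites a payload and recomputes all later digests, the recomputed head no longer matches the trusted one.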