UStore is our attempt to build a storage system that supports three high-level properties commonly found in today's distributed systems. In particular, UStore provides immutability, data sharing and security to upper-layer applications. The system enables rapid development of many classes of scalable, distributed applications, thanks to its versatile programming interface, its rich semantics and its high performance.
Why another storage system?
UStore stems from our observation of several trends in distributed data management applications, and of new advances in hardware. We therefore envision a storage system that unifies and adds value to many classes of today's applications, while opening the door for future ones. By carefully leveraging new hardware primitives, we are able to make the storage and the applications built on top of it highly efficient and scalable.
In many of today's applications, data is immutable, i.e. changes made to a piece of data result in a new version. One example is Git, the popular collaborative tool for source code version control, which keeps track of all commits ever made. The second example is HDFS and RDDs, the storage layers underlying Hadoop and Spark, respectively. Distributed computations on Hadoop and Spark assume that data does not change, i.e. that it is immutable. The third example is data versioning, a special incarnation of immutability, which is the basis for many concurrency control techniques in database systems.
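To make the notion of immutability concrete, here is a minimal sketch in Python of a store in which a write never overwrites data but creates a new version linked to its predecessor. It is only an illustration: `VersionedStore`, `put`, `get` and `history` are hypothetical names chosen for this example, not the UStore API.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Toy append-only store (hypothetical, not the UStore API)."""

    def __init__(self) -> None:
        # version id -> (value, parent version id or None)
        self._versions: Dict[str, Tuple[bytes, Optional[str]]] = {}
        # key -> id of the latest version of that key
        self._heads: Dict[str, str] = {}

    def put(self, key: str, value: bytes) -> str:
        """Record a new version of `key`; older versions are kept."""
        parent = self._heads.get(key)
        # The id covers key, parent and value, so it names one point in history.
        digest = hashlib.sha256()
        for part in (key.encode(), (parent or "").encode(), value):
            digest.update(part)
            digest.update(b"\x00")
        version = digest.hexdigest()
        self._versions[version] = (value, parent)
        self._heads[key] = version
        return version

    def get(self, key: str, version: Optional[str] = None) -> bytes:
        """Read the latest version of `key`, or a specific older one."""
        if version is None:
            version = self._heads[key]
        return self._versions[version][0]

    def history(self, key: str) -> List[str]:
        """Return the version ids of `key`, newest first."""
        chain: List[str] = []
        current = self._heads.get(key)
        while current is not None:
            chain.append(current)
            current = self._versions[current][1]
        return chain


store = VersionedStore()
v1 = store.put("readme", b"draft")
v2 = store.put("readme", b"final")
latest = store.get("readme")      # reads the newest version
original = store.get("readme", v1)  # the first version is still readable
```

As in Git, each version points at its parent, so the full history of a key can be walked from the latest version backwards.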
The exponential growth of data brings the need for effective data sharing to the foreground. While social networks epitomize the data sharing economy, there exist many other systems optimized for sharing different kinds of data with different performance goals. For example, early P2P systems allow for file sharing, optimizing for bandwidth and availability. Dropbox, Google Docs and Git are systems with collaborative views for files, documents and source code. Recent proposals, such as DataHub, allow data scientists to collaborate on analytics tasks.
Protecting data confidentiality remains a challenge in today's systems. Recent data breaches attributed to insider threats (the NSA leak, the Sony hack, etc.) have heightened the demand for confidentiality. Different solutions make different trade-offs between performance and security, for different application settings. For example, CryptDB protects databases, M2R protects MapReduce, and SUNDR protects file system operations. Recently, systems based on blockchains, such as Bitcoin and Ethereum, add integrity protection to distributed applications.
Advances in hardware are rapidly changing the commodity hardware scene. First, the increase in memory capacity has given rise to many in-memory systems, such as Spark, H-Store, SAP HANA and RAMCloud. Second, technologies such as NUMA, SIMD and HTM are being exploited to address data starvation, to increase parallelism, and to raise the throughput of transactional systems. Finally, advances in networking technologies such as RDMA and in data center topology have led to thorough redesigns of many storage systems. Examples are HERD, FaRM and FDS.
Inspired by these software and hardware trends, we design UStore with the following high-level goals:
- Rich semantics:
UStore provides data immutability, data sharing, and data security.
UStore APIs give its applications freedom to configure and combine the three properties.
- Efficient and scalable:
UStore leverages new hardware primitives to improve efficiency and scalability.
- Ease of development:
UStore APIs enable rapid development of existing and new applications with little effort.
The design of UStore adheres to three principles:
- Layered design with narrow waist:
UStore shares the same design philosophy as the TCP/IP networking stack. It consists of multiple layers with little dependency between them, and new layers can be built to operate seamlessly on top of lower layers.
- End-to-end (applications know best):
UStore lets the applications specify and push down their semantics to the storage. It does not restrict semantics (except for the ones related to immutability, sharing and security).
- Scalability first:
When faced with a choice between scalability and another system property (except for the three embedded properties), UStore chooses scalability.
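The layered, narrow-waist principle can be illustrated with a small Python sketch. It assumes a core interface of just two operations, with an upper layer written purely against that interface. All names here (`ChunkStore`, `MemoryChunkStore`, `BlobLayer`) are hypothetical and chosen for the example; they do not describe UStore's actual layers.

```python
import hashlib
import json
from typing import Dict, List, Protocol


class ChunkStore(Protocol):
    """The assumed narrow waist: two operations on immutable chunks."""

    def put(self, data: bytes) -> str: ...

    def get(self, chunk_id: str) -> bytes: ...


class MemoryChunkStore:
    """One possible lower layer: content-addressed chunks kept in memory."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        chunk_id = hashlib.sha256(data).hexdigest()
        self._chunks[chunk_id] = data
        return chunk_id

    def get(self, chunk_id: str) -> bytes:
        return self._chunks[chunk_id]


class BlobLayer:
    """An upper layer that depends only on the ChunkStore interface."""

    def __init__(self, store: ChunkStore, chunk_size: int = 4) -> None:
        self._store = store
        self._chunk_size = chunk_size

    def write(self, data: bytes) -> str:
        # Split the value into chunks, then store the list of chunk ids
        # as one more chunk that serves as the blob's index.
        ids: List[str] = [
            self._store.put(data[i:i + self._chunk_size])
            for i in range(0, len(data), self._chunk_size)
        ]
        return self._store.put(json.dumps(ids).encode())

    def read(self, blob_id: str) -> bytes:
        ids = json.loads(self._store.get(blob_id).decode())
        return b"".join(self._store.get(chunk_id) for chunk_id in ids)


blobs = BlobLayer(MemoryChunkStore())
blob_id = blobs.write(b"hello world")
restored = blobs.read(blob_id)
```

Because `BlobLayer` only calls `put` and `get`, the in-memory store could be replaced by a distributed one without changing the upper layer, which is the kind of independence the narrow waist is meant to provide.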