Database Group, School of Computing, National University of Singapore




Project Title:

BestPeer++: A Peer-to-Peer based Large-scale Data Processing Platform

Project Description:

The BestPeer++ project is the successor to the BestPeer project.

The original BestPeer system set out to exploit peer-to-peer (P2P) technologies for distributed applications. It was designed as a scalable, sharable, and secure P2P-based data management system with the full functionality needed to build corporate networks, in which a group of organizations under different administrative domains collaborate to reduce operating costs and improve productivity. Examples of corporate network applications include supply chain management and national healthcare networks. BestPeer provides an effective and efficient way to share data belonging to different organizations and offers an enterprise-quality query facility, without the need to set up a huge centralized server.

As a timely response to ever-changing business demands and the emergence of cloud computing techniques, BestPeer has evolved into a new stage of development: the cloud-enabled BestPeer++ system.

By integrating cloud computing, database, and P2P technologies, BestPeer++ delivers efficient query processing in a pay-as-you-go manner and is a promising approach for corporate network applications.

Specifically, BestPeer++ is deployed as a service in the cloud. To form a corporate network, companies simply register their sites with the BestPeer++ service provider, launch BestPeer++ instances in the cloud, and finally export data to those instances for sharing. BestPeer++ adopts the pay-as-you-go business model popularized by cloud computing. The total cost of ownership is therefore substantially reduced, since companies do not have to buy any hardware or software in advance. Instead, they pay for what they use in terms of BestPeer++ instance hours and storage capacity. The BestPeer++ service provider elastically scales the running instances and keeps them always available.

Notably, BestPeer++ employs a hybrid design to achieve high-performance query processing. The major workload of a corporate network consists of simple, low-overhead queries. Such queries typically involve only a very small number of business partners and can be processed in a short time. BestPeer++ is mainly optimized for these queries. For infrequent, time-consuming analytical tasks, we provide an interface for exporting data from BestPeer++ to Hadoop, allowing users to analyze that data using MapReduce.
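To make the pay-as-you-go charge described above concrete, the following is a minimal sketch of how a usage-based bill could be computed from instance hours and storage capacity. It is an illustration only, not the BestPeer++ billing implementation: the names Rates and monthly_charge, the monthly billing period, and the prices are all assumptions, since this description quotes no actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; the project description gives no real figures."""
    per_instance_hour: float
    per_gb_month: float


def monthly_charge(instance_hours: float, storage_gb: float, rates: Rates) -> float:
    """Usage-based charge: instance hours plus stored capacity, with no upfront fee."""
    if instance_hours < 0 or storage_gb < 0:
        raise ValueError("usage figures must be non-negative")
    return instance_hours * rates.per_instance_hour + storage_gb * rates.per_gb_month


if __name__ == "__main__":
    rates = Rates(per_instance_hour=0.10, per_gb_month=0.05)  # made-up prices
    # Two instances running for a 720-hour month, holding 500 GB of exported data.
    print(f"{monthly_charge(2 * 720, 500, rates):.2f}")
```

Because the charge has no fixed component, a company that launches no instances and stores no data pays nothing, which is the sense in which no hardware or software has to be bought in advance.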

BestPeer++ also inherits its predecessor's features, such as support for semi-automatic schema mapping and data mapping, efficient distributed query processing, effective system load balancing, and other functionality that a corporate network requires.