Database Group, School of Computing, National University of Singapore

 

 

Figure 1. Architecture of BestPeer

In the last two decades we have witnessed a growing need for information sharing. With more affordable commodity hardware and higher communication bandwidth, data sharing has become an exciting killer application of the Internet.

The peer-to-peer (P2P) architecture aims to extend current distributed computing designs to accommodate dynamic resources such as information and computing power. In such environments, peers are autonomous, behave in highly dynamic ways, and act as servers and consumers at the same time. These characteristics open up many opportunities, but they also raise technical challenges that must be solved.

About BestPeer

The BestPeer project has two stages. In the first stage, we built an unstructured P2P network based on an agent model. A number of applications were tested on that platform, such as PeerDB (for database applications), PeerIS (for information retrieval), and BuddyWeb (for collaborative web caching). An open source version of BestPeer can be downloaded from here.

In the second stage, BestPeer was enhanced into a scalable, sharable, and secure P2P-based data management system with the full functionality needed to build corporate network applications such as supply chain management and national healthcare networks. As shown in Figure 1, BestPeer is designed to support enterprise applications. It builds a corporate network by linking companies via a structured overlay (BATON). Each company acts as a node in BestPeer and exports a portion of its local data for sharing with other companies. From a user's point of view, BestPeer can be considered a new data sharing platform for enterprise applications.

Specifically, BestPeer V2.0 supports:

       Semi-Automatic Schema Mapping: To share data with others, a company needs to map its local schema to the global schema. A machine learning algorithm is employed to help the manager establish the mapping relations.

       Incremental Data Integration: Once the mapping relations are set up, BestPeer automatically and periodically exports data from the local databases of participating companies to the BestPeer data sharing platform.

        Efficient Query Processing: A distributed query plan is generated and forwarded to multiple processing nodes, where the query is processed in parallel. In addition, to support analytic queries that aim to provide timely summarized statistics for decision making, a distributed online aggregation scheme is developed to iteratively and progressively produce approximate aggregate results for users.

       Data Security and Privacy: The messages sent between nodes in BestPeer are encrypted to increase the security level of the system. Furthermore, access to the data shared in the BestPeer corporate network is controlled by a distributed role-based access control scheme that protects the local data of each node from malicious users.

        Intelligent Replication: BestPeer provides an always-on service, in that node failures do not affect the availability of data. To achieve this goal, an intelligent replication strategy driven by the system runtime workload is applied to replicate data across the nodes for data availability and load balancing.

       Analytic Tool: The BestPeer software runs as a backend service on each node. Users access the service through web interfaces, which makes it easier to use. BestPeer generates graphs (e.g., bar charts and pie charts) from query results to facilitate decision making.

       Cloud Support: BestPeer is now cloud enabled. By integrating cloud computing, database, and P2P technologies, BestPeer achieves its query processing efficiency in a pay-as-you-go manner and is a promising approach for corporate network applications. More details of our cloud solution can be found on BestPeer Ltd.'s website.
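
To make the idea of distributed online aggregation in the Efficient Query Processing item more concrete, here is a minimal Python sketch. It is not BestPeer's actual algorithm, which this page does not describe. It assumes a hypothetical coordinator that pulls a further random slice of rows from every node in each round and reports a running average with an approximate 95% error bound. The function name online_average, its parameters, and the in-memory lists standing in for remote nodes are all illustrative assumptions.

```python
import math
import random


def online_average(node_rows, rounds=10, z=1.96, seed=0):
    """Yield (estimate, half_width, rows_seen) after each sampling round.

    node_rows is a list of lists; each inner list stands in for the numeric
    column held by one node (an assumption made for this sketch).
    """
    rng = random.Random(seed)

    # Each node shuffles its own rows once, so every slice is a random sample.
    shuffled = []
    for rows in node_rows:
        local = list(rows)
        rng.shuffle(local)
        shuffled.append(local)

    total = sum(len(rows) for rows in shuffled)
    if total == 0:
        return

    count, value_sum, square_sum = 0, 0.0, 0.0
    for r in range(1, rounds + 1):
        # Every node contributes the same fraction of its rows per round,
        # which keeps the combined sample proportional to node sizes.
        for rows in shuffled:
            start = len(rows) * (r - 1) // rounds
            stop = len(rows) * r // rounds
            for value in rows[start:stop]:
                count += 1
                value_sum += value
                square_sum += value * value

        if count == 0:
            continue

        mean = value_sum / count
        if count > 1:
            variance = max(0.0, (square_sum - count * mean * mean) / (count - 1))
        else:
            variance = 0.0

        # Finite population correction: the bound shrinks to zero once
        # every row has been read.
        fpc = (total - count) / total
        half_width = z * math.sqrt(variance / count * fpc)
        yield mean, half_width, count


if __name__ == "__main__":
    nodes = [[10, 20, 30, 40], [50, 60], [70, 80, 90, 100, 110, 120]]
    for estimate, half_width, seen in online_average(nodes, rounds=4):
        print(f"rows={seen:2d}  avg ~ {estimate:.2f} +/- {half_width:.2f}")
```

A user can stop reading results as soon as the error bound is small enough for the decision at hand. The error bound uses the simple random sampling formula, so it is only an approximation for this per-node sampling scheme.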