MapReduceDB: A Scalable and Efficient Data Processing System


Introduction

MapReduce was recently introduced as an alternative approach to large-scale data processing. Its attractions include good fault tolerance, the flexibility and extensibility to express arbitrarily complex logic, and a free open-source implementation, Hadoop. The main issue with MapReduce is performance: compared with a conventional parallel DBMS, its query performance is significantly worse, because its original design did not target database workloads.

The goal of MapReduceDB is to build a high-performance in-database data processing system whose query performance approaches that of a parallel DBMS while, at the same time, retaining all the merits of MapReduce, namely excellent scalability, good fault tolerance, and the ability to express arbitrary data analysis logic. We are benchmarking MapReduce and investigating various techniques to reduce the runtime overhead that the MapReduce framework introduces when handling database workloads. We are also studying novel data storage, indexing, and query optimization strategies for MapReduce-like frameworks.

This project is funded by an Amazon Academic Research Grant.




Papers and Technical Reports




People

PIs:

Members:




Contact

E-mail Ooi Beng Chin, David Jiang, Wu Sai, or Lin Yuting ({ooibc, jiangdw, wusai, lin36} AT comp.nus.edu.sg) for questions or comments.


MapReduceDB Team - National University of Singapore 2009     Last update: 07-30-2009