MapReduce has recently been introduced as an alternative approach to large-scale data processing. Its attractions include good fault tolerance, the flexibility and extensibility to express arbitrarily complex logic, and a free, open-source implementation, Hadoop. The main issue with MapReduce is performance. Compared with a conventional parallel DBMS, MapReduce delivers significantly slower query performance because its original design did not target database workloads.
The goal of MapReduceDB is to build a high-performance in-database data processing system whose query performance approaches that of a parallel DBMS while, at the same time, retaining all the merits of MapReduce, namely excellent scalability, good fault tolerance, and the ability to express arbitrary data analysis logic. We are benchmarking MapReduce and investigating various techniques to reduce the runtime overhead that the MapReduce framework introduces when handling database workloads. We are also studying novel data storage, indexing, and query optimization strategies for MapReduce-like frameworks.
This project is funded by an Amazon Academic Research Grant.