epiC: elastic power-aware data intensive Cloud
Due to its natural parallelism and fault tolerance, the MapReduce model is widely used in large-scale data processing systems. However, compared to database systems, MapReduce-based systems lack schema support, high-level languages and state-of-the-art query optimization. Extending database systems to the cloud platform is an essential requirement for many applications, such as community systems and multi-tenant services. The designs and principles of conventional database systems cannot be directly applied to the new platform. Therefore, in this project, we propose and implement a new framework, epiC, to provide scalable database service on the Cloud.
The goal of the epiC project is to develop an elastic, power-aware, data-intensive cloud computing platform for large-scale services. Two typical database workload types are supported in epiC: data-intensive analytical jobs and online transactions. As its name indicates, epiC is specially designed as a new platform on the Cloud. The property of elasticity guarantees that epiC can scale up or down in a pay-as-you-go model. With this feature, epiC provides resizable computation capacity according to different users' computational requirements. Previous work on the Cloud focuses on the scalability of computation but ignores the scalability of economics and the environment. Energy consumption in data centers plays a crucial part in maintenance cost. Cloud service providers should wisely plan their budget before deploying a system. To facilitate energy-aware computation and ensure effectiveness and reliability, the epiC system is designed to monitor system status and reschedule jobs where necessary.
In our approach, we break down conventional database operations such as join into primitive ones and run them as MapReduce-like phases (filter-and-refine). Meanwhile, cost models are defined for the cloud environment, by which power consumption and other important cost statistics can be accurately estimated, thus helping the system to reduce energy consumption.
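As an illustration of breaking a join into primitive phases, the following minimal sketch runs an equi-join as map, shuffle and reduce steps. It is not epiC's actual operator set: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `equi_join`) and the sample tables are hypothetical, and everything runs in a single process rather than across cluster nodes.

```python
from collections import defaultdict
from itertools import product


def map_phase(rows, tag, key_index):
    """Emit (join_key, (tag, row)) pairs so rows can be routed by key."""
    for row in rows:
        yield row[key_index], (tag, row)


def shuffle_phase(pairs):
    """Group emitted pairs by join key, as a MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, combine every left row with every right row."""
    for values in groups.values():
        left = [row for tag, row in values if tag == "L"]
        right = [row for tag, row in values if tag == "R"]
        for l_row, r_row in product(left, right):
            yield l_row + r_row


def equi_join(left_rows, left_key, right_rows, right_key):
    """Hypothetical driver: chains the three primitive phases."""
    pairs = list(map_phase(left_rows, "L", left_key))
    pairs += list(map_phase(right_rows, "R", right_key))
    return list(reduce_phase(shuffle_phase(pairs)))


# Illustrative data only: users(id, name) and orders(id, user_id, item).
users = [(1, "ann"), (2, "bob")]
orders = [(10, 1, "book"), (11, 1, "pen"), (12, 3, "ink")]
joined = equi_join(users, 0, orders, 1)
```

Keys that appear on only one side (user 2, order 12) produce no output, which is the inner-join behaviour. In a distributed setting the shuffle step would correspond to partitioning rows across nodes by join key.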
The epiC system provides a number of powerful features for building reliable, enterprise-class applications, including:
- Elastic Scaling: The epiC system provides resizable computation capacity. It allows applications to quickly scale their computing capacity, upward or downward, depending on needs. Under this pay-as-you-go model, the number of epiC nodes automatically scales up during demand peaks to maintain performance, and scales down during slack hours to minimize cost.
- Power-aware Scheduling: As the allocation of jobs and scheduling of tasks in the Cloud could significantly affect overall performance, scheduling should not only happen at the beginning of tasks, but also be invoked dynamically at runtime. For example, by knowing the current power consumption, the system can automatically re-allocate jobs to idle nodes or to a cooler rack, thus reducing the electricity used for cooling machines.
- Flexible Storage: The epiC system allows users to create storage volumes that can be mounted as devices. A newly created volume can be attached to any epiC node. Once attached, it will appear as a mounted device similar to any hard drive or other block device. All the volumes attached to an epiC node can function like local drives. Interactions between nodes will be efficient.
- Automatic Load Balancing: The epiC system will automatically distribute incoming traffic across a user's available nodes. The epiC Job Watcher will keep track of the status of all the nodes, detect unhealthy nodes and route incoming traffic to healthy ones.
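The scheduling and load-balancing features above can be sketched as a single node-selection step: skip unhealthy nodes, then prefer lightly loaded nodes on cooler racks. This is only a sketch under stated assumptions. The `Node` fields, the `pick_node` function, the weights and the scoring formula are all hypothetical and do not describe epiC's actual Job Watcher or cost model.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool        # assumed to come from a status monitor
    running_tasks: int   # current load on the node
    rack_temp_c: float   # assumed rack temperature reading


def pick_node(nodes, temp_weight=1.0, load_weight=10.0):
    """Return the healthy node with the lowest combined heat/load score.

    The linear score is an illustrative assumption, not a measured model.
    """
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy nodes available")
    return min(
        candidates,
        key=lambda n: temp_weight * n.rack_temp_c + load_weight * n.running_tasks,
    )


# Illustrative cluster state only.
nodes = [
    Node("n1", True, 3, 24.0),   # busy node on a cool rack
    Node("n2", True, 0, 31.0),   # idle node on a warm rack
    Node("n3", False, 0, 20.0),  # unhealthy, never selected
    Node("n4", True, 1, 22.0),   # lightly loaded node on a cool rack
]
chosen = pick_node(nodes)
```

With these weights the idle node `n2` wins narrowly over `n4`. Raising `temp_weight` would shift work toward cooler racks at the cost of piling tasks onto already busy nodes.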
Contact: epiC Web Team
