Aims and Scope

CIIDAA is a large-scale Comprehensive IT Infrastructure for Data-intensive Applications and Analysis. In this project, we aim to harness the power of cloud computing to solve real-world Big Data problems. The goal is to provide a generic platform into which different cartridges can be plugged to support different applications. To this end, we are investigating various areas pertaining to big data analysis and the cloud: computing frameworks, system architecture, performance, security, programming languages, software engineering, databases and analytics.

CIIDAA faces three main research challenges, in the design and implementation of:

  • A scalable and elastic infrastructure for data-intensive computing.
  • A programming environment that shields programmers from low-level details while supporting security and privacy.
  • Scalable analytics that leverage the infrastructure for extremely large data sets.

We will demonstrate the value of our system through multiple real-world applications. Specifically, we are applying CIIDAA to support predictive analytics in healthcare (in collaboration with NUHS) and web consumer analytics (in collaboration with Starhub).

Architectural Design

CIIDAA

CIIDAA comes with a comprehensive software stack designed to support a wide range of data-intensive applications and analyses. It sits between the operating system and user applications, providing multiple layers of services: storage, computation, security and monitoring.

  • Storage: The Virtual Block Store provides a block abstraction on top of the underlying storage devices, and higher-level storage abstractions are built on these blocks. For example, HDFS organizes blocks into files, while ES2 structures them into database records.
  • Computation: Numerous computing frameworks exist for large-scale, data-intensive applications; MapReduce (Hadoop) and graph processing (Pregel), for example, target different classes of applications. We are building E3, an elastic, extensible execution engine capable of supporting and outperforming these existing frameworks. In particular, users can write Hadoop jobs, Pregel jobs and SQL queries on E3.
  • Security: When data is sensitive, security becomes paramount. We enhance the storage and computation layers with security guarantees, in particular data confidentiality. The Trusted Data Service (TDS) allows users to encrypt their data and allows applications to run on the encrypted data, so that confidentiality is protected across the entire CIIDAA software stack.
  • Monitoring: It is essential to track system performance along multiple metrics. To this end, PerfMon is provided as a cross-layer service that intercepts and measures system calls and RPCs. It correlates these low-level measurements into high-level views such as application I/O operations. The collected performance data is stored in a log, which can then be queried and analyzed through the PerfMon interface.
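The storage layering described above, where higher-level abstractions are built on fixed-size blocks, can be illustrated with a minimal in-memory sketch. The class names `VirtualBlockStore` and `BlockFile`, their methods, and the 4-byte block size are hypothetical and chosen only for illustration; this is not the actual Virtual Block Store or HDFS interface. It shows only the file-style case: a file is an ordered list of block IDs, and a record-oriented layer such as ES2 would sit on the same block interface.

```python
from typing import Dict, List


class VirtualBlockStore:
    """Hypothetical in-memory block store: fixed-size blocks addressed by integer IDs."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self._blocks: Dict[int, bytes] = {}
        self._next_id = 0

    def allocate(self, data: bytes) -> int:
        # Each block holds at most block_size bytes; larger payloads are rejected.
        if len(data) > self.block_size:
            raise ValueError("data exceeds block size")
        block_id = self._next_id
        self._blocks[block_id] = data
        self._next_id += 1
        return block_id

    def read(self, block_id: int) -> bytes:
        return self._blocks[block_id]


class BlockFile:
    """Hypothetical file abstraction: a file is an ordered list of block IDs."""

    def __init__(self, store: VirtualBlockStore) -> None:
        self.store = store
        self.block_ids: List[int] = []

    def append(self, data: bytes) -> None:
        # Split the payload into block-sized chunks and record their IDs in order.
        size = self.store.block_size
        for offset in range(0, len(data), size):
            self.block_ids.append(self.store.allocate(data[offset:offset + size]))

    def read(self) -> bytes:
        # Reassemble the file by concatenating its blocks in order.
        return b"".join(self.store.read(b) for b in self.block_ids)


if __name__ == "__main__":
    store = VirtualBlockStore(block_size=4)
    f = BlockFile(store)
    f.append(b"hello, CIIDAA")  # 13 bytes -> 4 blocks
    print(f.block_ids)  # [0, 1, 2, 3]
    print(f.read())     # b'hello, CIIDAA'
```

Because the file layer only depends on `allocate` and `read`, the block store could be swapped for a persistent or distributed one without changing the file logic, which is the kind of separation the layered design aims for.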

Industry Collaborators

National University Health System (NUHS). We work closely with doctors and healthcare informatics experts at NUHS to develop a wide range of clinical predictive analytics applications, including patient flow optimization, risk prediction and treatment decision-making.

Starhub is a major telecommunications company in Singapore, with millions of customers generating billions of transactions per day. We are working with Starhub to build a comprehensive data analytics tool that helps the company better understand customer needs and thereby improve its customer services and marketing campaigns.

Support

We are grateful to the NRF for supporting our research.