UNOdb: Towards UNcheatable Outsourced databases

Introduction

Projects

Members
Publications
 

 
Projects

The goal of our research is to devise authentication mechanisms that enable a user to verify the query results generated by an untrusted server. Our proposed mechanisms are designed to satisfy all the following objectives, of which the first four are security requirements while the last is intended to ensure that the overheads incurred by the mechanisms are acceptable in practice:

  • Authenticity – The user can check that all the values in a query result originated from the owner; they have not been tampered with, nor have spurious entries been introduced.
  • Completeness – The user can verify that all the data points that satisfy the conditions of a query are included in the result.
  • Precision – The ratio of the query result size over the total size of the data points and their attribute values returned; should be high.
  • Security – It is computationally infeasible for the publisher to cheat by generating a proof for an incorrect query result.
  • Efficiency – The procedure for the publisher to generate the proof for a correct (i.e., complete and authentic) query result has polynomial complexity. Likewise the procedure for the user to verify a query result has polynomial complexity.

In this research project, we will develop authentication schemes for various domains, ranging from traditional relational context (SPJ queries) to data warehousing (multi-dimensional queries), OLAP and data analysis applications (aggregate queries). We will also develop techniques for query execution assurances for complex and compute-intensive queries.

 

Relational Databases (SPJ Queries)

As a start, we will focus on designing a scheme for users to verify that their query results are complete (i.e., no qualifying tuples are omitted) and authentic (i.e., all the result values originated from the owner) in the context of relational databases. We will design a scheme that supports range selection, project, join as well as

multi-point queries on relational databases.

 

Data Warehousing/OLAP (Multidimensional Queries)

The value of a data warehouse lies in its ability to support business intelligence. These applications perform many multidimensional calculations to analyze data across dimensions. An example query given in the Oracle OLAP datasheet (http://www.oracle.com/technology/products/oracle9i/datasheets/olap.pdf) is “the top ten products for each of the top ten customers during a rolling six month time period based on growth in dollar sales”. In this query, a product ranking is nested within a customer ranking, and data is analyzed across a number of time periods and a virtual measure.

 

Figure 1: Data Warehousing Set-Up

In a deployment set-up, the data warehouse may be situated near the analytics users, away from the underlying production (OLTP) database(s) as illustrated in Figure 1. To offer faster response for large, distributed user communities, the data warehouse may even be replicated on platforms such as content delivery network, edge computing, computing grid, and peer-to-peer databases. Such a deployment scenario corresponds to the data publishing model, and consequently there is a need for the users to be able to authenticate the results of the multidimensional queries on the data warehouse(s).

 

 OLAP/Data Analysis (Aggregate Queries)

Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. These applications categorize data values and trends, extract statistical information, and then contrast one category with another. To perform such data analysis efficiently over large datasets, many of the analysis tools are built on top of aggregation operators supported by database management systems. The SQL aggregate operations and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Specifically, the SQL standard (IS9075 International Standard for Database Language SQL, 1992) specifies the following aggregate operators: COUNT( ), SUM( ), MIN( ), MAX( ), AVG( ). Among those aggregate operations, COUNT( ), MIN( ) and MAX( ) can be checked easily from the meta-data, while AVG( ) is derived from SUM( ) and COUNT( ). So we will focus on authentication for SUM( ) in our research. In addition, we will address the authentication of the data cube, a more powerful aggregate operator. As explained in [1], the cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.

[1]     Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh, “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals”, Journal of Data Mining and Knowledge Discovery, 1(1), 1997, 29-53.

 

Query Execution Assurance

Ensuring completeness and authenticity for arbitrary query types remain an open hard problem. In such cases, we explore mechanism of runtime query “proofs” in a challenge-response protocol. For each batch of queries, the server is “challenged” to provide a proof of query execution (rather than on the answers returned) which is then checked by the client as a prerequisite to accepting the actual query results as accurate. This scenario is especially useful for compute-intensive applications where servers may cheat for economical reasons (claiming to have performed a compute-intensive operation without actually doing it). We plan to study this in the context of the expensive data mining process.