School of Computing

Department of Computer Science

CS3223:   Database Systems Implementation

Semester 2, 2022/2023  

 


[Announcements]  [Instructor] [Course Objectives] [Lecture Schedule] [Reference Texts and Materials] [Assignments (Tutorials & Gradiance] [Project] [Assessment]


The SimpleDB Database System

This project is based on SimpleDB, a multi-user transactional database server written in Java. SimpleDB is developed by Edward Sciore (Boston College), and the code is an integral part of a textbook Database Design and Implementation (Second Edition) he published with Springer. You should be able to access an e-copy from the NUS Library.

Unlike a full-fledged DBMS like MySQL or PostgreSQL, SimpleDB is intended as a teaching tool to facilitate the learning experience of a database system internal course. As such, the system implements only the basic functionalities of a “complete” database system. For example, it supports a very limited subset of SQL (and JDBC) and algorithms, offers little or no error checking, and is not designed for optimal performance/efficiency. However, the code is very well structured so that it is relatively easy to learn, use and extend SimpleDB.

Goals

This project will focus primarily on query processing (and related topics, e.g., parser, indexing, etc) for a single user. [We will not consider disk/memory management, transaction management (concurrency control and logging) and failures (recovery management), and we will also not consider multi-user setting.] You will gain a feel for how query processing works in a “real” system. You will also hopefully see how different query execution trees have different performance results, which will provide some motivation for query optimization.  (Note that the differences in your simple query processing system will probably be minor, because you will be using “toy” data sets and an execution system that ignores many of the complex aspects of a real system.)

 

Specifications

This project will be a three-person group project, so you should form your team ASAP.  Each group will do the followings structured in the form of weekly labs/assignments. Note that in the following assignments, detail guidelines have been provided for Lab 0 to Lab 4. From Lab 5 onwards, it is as stated (no guidelines given).

 

1.     Week 1 – Lab 0: Set up SimpleDB & create a student database

2.     Week 2 – Lab 1: Support for non-equality predicates (Due: 28 Jan 2022)

3.     Week 3 – Lab 2: Support for hash index and B+-tree index (Due: 7 Feb 2022)

4.     Week 4 – Lab 3: Support for order by clause and sorting (14 Feb 2022)

5.     Week 5 – Lab 4: Support for nested-loops join, sort-merge join and index-based join (21 Feb 2022)

6.     Week 6 – Lab 5: Support for partition-based (or hash) join, and aggregates (SUM, COUNT, AGV, MIN, MAX) with/without group by clause (using sort-based implementation of group by operator)

7.     Recess week

8.     Week 7 – Lab 6: Support for DISTINCT and displaying the query plan  

9.     Week 8 – Lab 7: Integrate all features into SimpleDB+

10.  Week 9 – Submit report/code and arrange for demo (Due: 18 Mar 2022)

11.  Week 10 – Demo week

 

 

Extensions (Bonus)

 

For those who want something more challenging, you can try the following extensions to SimpleDB (for those who do not do the following, you will not be penalized, but bonus marks will be given to those who do it, i.e., you can get above 100% of the project marks! though eventually the maximum mark for the project is 100% of the project marks). The list is non-exhaustive; if you have other thoughts, please check with Prof Tan first.

1.     Implement another data type besides integer and string, e.g. float, double, time

2.     Implement another query optimizer

3.     Implement non-unique attribute names, i.e., support for (relation, attribute)-pair in queries (e.g., “t.a” and “t.a1=s.a1”)

4.     Implement multi-way sort (and solve the k>2 bugs in the sort code for join)!

5.     Implement set operations (e.g., union, intersection)

6.     Implement error-checking for queries, i.e., validate that queries are correct, tables exist, attributes defined, etc.

 

Note that these operations may require you to make changes to the parser. Demonstrate with sample queries that your extensions work.

 

Experiments

For your final report for SimpleDB+, present results (running times) of the following experiments on your student database.

Experiment 1

Run a 2-table join query. Record the time to perform the join under each join algorithm.

 

Experiment 2

Run a 4-table join query.  Your team should try out three different execution plans for this query (by restricting the join methods supported by the optimizer) and execute them with your different join algorithms.

Now do a combined write-up of the results your group obtained.

In your experimental write-up, include the timings, the plans used, and a discussion of the differences (if any) between timings on different algorithms and different plans.  Keep your report brief (no more than 4 pages).

 

Submitting Your Project

Create a folder containing 3 files:

1.     The source code of your integrated SimpleDB+

2.     The experimental write-up (in PDF or Word)

3.     The list of changes that you have made to the original code base. Although you have submitted some of these in earlier labs, you should include everything here. No need to integrate them into a nice table. Just consolidate them under Lab 1, Lab 2, etc.

 

Zip the folder and submit to the project folder in LumiNUS.   

 

 

Demo

 

You are required to demonstrate your project. Please book a timeslot with Professor Tan for your demo. You will be given 20 minutes to demo your project.

 

Project Grading

Marks will be awarded for the final SimpleDB+ based on the following criteria:

  1. correctness of program
  2. efficiency of algorithms
  3. documentation within program
  4. report - writeup on the experimental results (keep your writeup brief and straight to the point)
  5. demonstration
  6. code

For each lab, 1 mark (out of the CA marks) will be deducted for late submission (within 48 hours), and 2 marks for no submission (beyond 48 hours). For the final submission, 3x marks will be deducted for late submission (within 24x hours).

NO CHEATING/PLAGIARISM! Following NUS policy, any one or team found cheating will receive an F grade for the entire module, and will be reported to the faculty/university for disciplinary action.