UIT2201: Tutorial Set 8 (Fall 2016)
(Solution Sketch to Selected Problems)
** NOT TO BE GIVEN TO FUTURE UIT2201 STUDENTS **

Comments on designing algorithmic query processing:

0. First get the equivalent SQL query. (So we know which tables are used).
1. Use e-select first wherever possible to reduce table sizes.
2. Use e-join as late as possible.
3. Use e-project at the end.



T8-D2: (Algorithmic Query Processing) --- SOLUTION SKETCH

The declarative SQL-query: (declares what, but not how.)
   SELECT  ID, PlanType
   FROM    Employees, InsurancePolicies  
   WHERE  (Birthdate > #1/01/60#
          AND (ID = EmployeeID); 

The algorithmic version (using e-project, e-select, e-join): (the how)
  S1 <== e-select from Employee where (Birthdate > #1/01/60#);
  S2 <== e-join S1 and InsurancePolicies where (ID = EmployeeID);
 Ans <== e-project ID, PlanType from S2;


T8-Q1: (Algorithmic Query Processing) --- SOLUTION SKETCH

The declarative SQL-query: (declares what, but not how.)
   SELECT  ID, LastName, FirstName, PayRate
   FROM    EMPLOYEES
   WHERE  (PayRate < 15.00); 

The algorithmic version (using e-project, e-select, e-join): (the how)
  T1 <== e-select from Employee where (PayRate < 15.00);
 Ans <== e-project ID, LastName, FirstName, PayRate from T1;



Consider a database with 3 tables, STUDENT-INFO, COURSE-INFO, and ENROLMENT. Assume
    • the STUDENT-INFO table has 30,000 (3x104) rows,
    • the COURSE-INFO table has 1,000 (103) rows,
    • the ENROLMENT table has 100,000 (105) rows.

STUDENT-INFO
Student-ID Name NRIC-ID Address Tel-No Faculty Major

...

...

...

...

...

...

...

COURSE-INFO
Course-ID Name Day Hour Venue Instructor

...

...

...

...

...

...

ENROLMENT
Student-ID Course-ID

...

...


T8-Q2: (5 points) (Algorithmic Query Processing)

The query is
List the Student-ID, SI.Name, Tel-No of all History majors;

The declarative SQL-query: (declares what, but not how.)
   SELECT  SI.Student-ID, SI.Name, SI.Tel-No
   FROM    SI
   WHERE  (Major="History"); 

The algorithmic version (using e-project, e-select, e-join): (the how)
  U1 <== e-select FROM SI WHERE (Major="History");
 Ans <== e-project SI.Student-ID, SI.Name, SI.Tel-No FROM U1;
Analysis: (Assume m = number of history majors = 200)


T8-Q3: (10 points) (Algorithmic Query Processing)
The query is
List the Student-ID, Major, Course-ID of all courses taken by History majors;

The declarative SQL-query: (declares what, but not how.)
   SELECT  SI.Student-ID, SI.Major, EN.Course-ID
   FROM    SI, EN
   WHERE  (Major="History")
      AND (SI.Student-ID = EN.Student-ID); 

(a) [The Bad Algorithm -- e-join, e-select, e-project]
  B1 <== e-join SI and EN WHERE (SI.Student-ID = EN.Student-ID);
  B2 <== e-select from B1 WHERE (Major="History");
 Ans <== e-project SI.Student-ID, SI.Major, EN.Course-ID FROM B2;
Analysis: (Assume m = number of history majors = 200; each take 5 courses)

(b) [The Good Algorithm -- e-select, e-join, e-project]
   G1  <-- e-select FROM SI WHERE (Major="History");
   G2  <-- e-join G1 and EN WHERE (G1.Student-ID = EN.Student-ID);
 Ans <== e-project SI.Student-ID, SI.Major, EN.Course-ID FROM G2;
Analysis: (Assume m = number of history majors = 200; each take 5 courses)

Observation: There is a factor of more than 150 difference between the bad and the good algorithm. (3x109/2x107 = 150). That's a difference of (1min versus 2.5hr)!!


T8-Q4: (10 points) (Multiple Joins Query)
The query is
List the Student-ID, SI.Name, Tel-No of all History majors who have lectures in "LT13".

The declarative SQL-query: (declares what, but not how.)
   SELECT  SI.Student-ID, SI.Name, SI.Tel-No
   FROM    SI, CI, EN
   WHERE  (Major="History")
      AND (Venue="LT13")
      AND (SI.Student-ID = EN.Student-ID);
      AND (CI.Course-ID  = EN.Course-ID); 

The algorithmic version; (Remember: e-selects first, e-joins later, whenever possible.)
   H1  <-- e-select FROM SI WHERE (Major="History");
   H2  <-- e-select FROM CI WHERE (Venue="LT13");
   H3  <-- e-join H2 and EN WHERE (H2.Course-ID = EN.Course-ID);
   H4  <-- e-join H3 and H1 WHERE (H3.Student-ID = H1.Student-ID);
 Ans <== e-project SI.Student-ID, SI.Name, SI.Tel-No FROM H4;
Analysis: (Assume m = number of history majors = 200; each take 5 courses)
(Assume p = courses using LT13 per week = 30, average 100 each class;)


UIT2201: CS & IT Revolution; (LeongHW, 2016) [Prints well with A4-page-offsets l=r=b=0.5", t=0.7"]