Comments on designing algorithmic query processing:
0. First get the equivalent SQL query. (So we know which tables are used).
|
The declarative SQL-query: (declares what, but not how.)
SELECT ID, PlanType FROM Employees, InsurancePolicies WHERE (Birthdate > #1/01/60# AND (ID = EmployeeID); |
The algorithmic version (using e-project, e-select, e-join): (the how)
S1 <== e-select from Employee where (Birthdate > #1/01/60#); S2 <== e-join S1 and InsurancePolicies where (ID = EmployeeID); Ans <== e-project ID, PlanType from S2; |
The declarative SQL-query: (declares what, but not how.)
SELECT ID, LastName, FirstName, PayRate FROM EMPLOYEES WHERE (PayRate < 15.00); |
The algorithmic version (using e-project, e-select, e-join): (the how)
T1 <== e-select from Employee where (PayRate < 15.00); Ans <== e-project ID, LastName, FirstName, PayRate from T1; |
Consider a database with 3 tables,
STUDENT-INFO,
COURSE-INFO, and
ENROLMENT.
Assume
•
the STUDENT-INFO table has 30,000 (3x104) rows,
•
the COURSE-INFO table has 1,000 (103) rows,
•
the ENROLMENT table has 100,000 (105) rows.
Student-ID | Name | NRIC-ID | Address | Tel-No | Faculty | Major |
... |
... |
... |
... |
... |
... |
... |
|
|
T8-Q2: (5 points) (Algorithmic Query Processing)
The query is
List the Student-ID, SI.Name, Tel-No of all History majors; |
The declarative SQL-query: (declares what, but not how.)
SELECT SI.Student-ID, SI.Name, SI.Tel-No FROM SI WHERE (Major="History"); |
The algorithmic version (using e-project, e-select, e-join): (the how)
U1 <== e-select FROM SI WHERE (Major="History"); Ans <== e-project SI.Student-ID, SI.Name, SI.Tel-No FROM U1; |
T8-Q3: (10 points) (Algorithmic Query Processing)
The query is
List the Student-ID, Major, Course-ID of all courses taken by History majors; |
The declarative SQL-query: (declares what, but not how.)
SELECT SI.Student-ID, SI.Major, EN.Course-ID FROM SI, EN WHERE (Major="History") AND (SI.Student-ID = EN.Student-ID); |
(a) [The Bad Algorithm -- e-join, e-select, e-project]
B1 <== e-join SI and EN WHERE (SI.Student-ID = EN.Student-ID); B2 <== e-select from B1 WHERE (Major="History"); Ans <== e-project SI.Student-ID, SI.Major, EN.Course-ID FROM B2; |
(b) [The Good Algorithm -- e-select, e-join, e-project]
G1 <-- e-select FROM SI WHERE (Major="History"); G2 <-- e-join G1 and EN WHERE (G1.Student-ID = EN.Student-ID); Ans <== e-project SI.Student-ID, SI.Major, EN.Course-ID FROM G2; |
Observation: There is a factor of more than 150 difference between the bad and the good algorithm. (3x109/2x107 = 150). That's a difference of (1min versus 2.5hr)!! |
T8-Q4: (10 points) (Multiple Joins Query)
The query is
List the Student-ID, SI.Name, Tel-No of all History majors who have lectures in "LT13". |
The declarative SQL-query: (declares what, but not how.)
SELECT SI.Student-ID, SI.Name, SI.Tel-No FROM SI, CI, EN WHERE (Major="History") AND (Venue="LT13") AND (SI.Student-ID = EN.Student-ID); AND (CI.Course-ID = EN.Course-ID); |
The algorithmic version; (Remember: e-selects first, e-joins later, whenever possible.)
H1 <-- e-select FROM SI WHERE (Major="History"); H2 <-- e-select FROM CI WHERE (Venue="LT13"); H3 <-- e-join H2 and EN WHERE (H2.Course-ID = EN.Course-ID); H4 <-- e-join H3 and H1 WHERE (H3.Student-ID = H1.Student-ID); Ans <== e-project SI.Student-ID, SI.Name, SI.Tel-No FROM H4; |