~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ UT2201: Notes on Primitive Database Operations by H. W. Leong (Mar 2005) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SQL the Query Language: ======================= The textbook [SG] covers SQL, the query language used for doing query in most modern relational database systems (includinng Oracle, MySQL, MS-Access, etc). The syntax of a typical SQL query is SELECT FROM WHERE (Note: the conditions also specifies how to join multiple tabless.) This SQL query language is simple to use since the user only have to specify what is required (some fields), where they can be found (some tables), and what conditions they must satisfy, include how tables should be joined together to produce the results. SQL is a Declarative Language: ============================== SQL is an example of a declarative language, where we only specify (or declare) *what* we want to have as output. We do not need to specify the exact procedure necessary to find the answer (in other words, the algorithm). (Declarative languages are also used in many other areas -- we will cover one such example in the Chapter on AI.) In contrast, when we design and specify an algorithm in pseudo-code or in programming languages like Java, we specify *how* to compute the answer. Thus, these types of languages are called procedural languages (or imperative languages). Of course, given the SQL query in declarative form, we still need a step-by-step instruction on how to execute the query to get the answer -- namely, we will need a procedure/algorithm for performing the query. To do this, we first need to define the primitive database operations. Primitive Database Operations: ============================== To execute database queries, we make use of three primitive database operations, which we will call e-project, e-select, and e-join. (Aside: Earlier, in my tutorial questions and notes, I had previously used the names project, select and join. Although I stress multiple times that this primitive select is different from the SELECT in SQL query language, I kept getting questions on it. Especially from those who missed the lectures/tutorials that discussed the issues. Therefore, I have decided to call them new and different names so that we all clearly know that they are different! I will also be modifying the tutorial questions to use e-project, e-select, and e-join.) e-project: ---------- The first operation, e-project allows you to choose (project) a set of columns (fields) from a table. For example P1 <-- e-project F1, F2 from Table3; will produce P1, a new table obtained from Table3 consisting only of the fields F1 and F2. It has all the rows in Table3, but only two columns. That's it. e-project does *not* do choose or "drop" rows. e-select: --------- The operation e-select (different from SELECT in SQL) allows you to choose a set of rows (records) from a table that satisfy some specified conditions. For example S1 <-- e-select from Table4 where (Name='UIT2201'); will produce S1, a new table obtained from Table4 and consisting only of the rows (records) that satisfies the condition (Name='UIT2201'). It has all the columns in Table4, but only the chosen rows (records). That's it. e-select does *not* choose or "drop" columns. e-join: --------- The operation e-join (different from JOIN in SQL) allows you to perform a join operation on *two* table based on the specified condition for joining the tables. For example R1 <-- e-join T1 and T2 where (T1.Id = T2.EmployeeID); will produce R1, a new table with all the columns of T2 and T3. The rows in R1 are obtained by combining records in T2 and T3 that matches the specified conditions. (See lecture notes and the animated lecture notes for examples of the e-join operation.) Again, e-join does *not* choose or "drop" columns. Measuring the Complexity of Primitive Database Operations: ========================================================== It is customary to measure the complexity of primitive database operations in terms of the number of records accessed/compared. While some records are small (with only a few fields) and some are large (with many fields), we measure each operation on a record as one elementary operation. On a table with n records (rows), an e-project will take O(n) operations (since all the records have to be accessed). An e-select operation will also take O(n) operations since all records will be accessed and have the condition checked. However, an e-join operation of a table T1 with m rows and a table T2 with n rows will take O(mn) operations. Recall from the animated example in the lecture notes that each row of T1 will have to be checked with each row of T2 and so there are mn operations altogether. e-join is an expensive operation! Converting SQL Query into Executable Procedures: ================================================ Since SQL queries are declarative, they are not executable. We first need to convert them into executable procedures that uses the three primitive database operations, e-project, e-select, and e-join. However, a given SQL query can be converted into different sequences of database operations that achieve the same result. Of course, we want those that are more efficient (uses fewer operations). The tutorial discussion ran through some examples of this kind of analyses and comparisons. On lesson learned from the examples covered in the tutorials is that since e-join is an expensive operation, we should do it with care. Usually, before we use e-join, it is good to trim the tables down, wherever possible. This is usually the key idea in making SQL queries more efficient. In most modern database management systems, the SQL compilers will take the given SQL query and automatically attempt to convert it into efficent sequences of database operations. The user do not have to worry about it. This scheme uses the principle of division-of-labour -- the user specify *what* they want (in SQL query language), and leave it to the "intelligent" SQL compiler to figure out *how* to compute it in the most efficient manner. Optional Materials (only for those who are interested): ======================================================= I have added in some optional materials in case some of you are interested and want to do some further readings on this topic. The Theoretical Foundations of Relational Databases: ---------------------------------------------------- The theoretical foundations Why Measure Efficiency based on Row Operations: =============================================== ------------------------------------------------------------------------ LeongHW, (for UIT2201), Mar 2005 (modified Feb 2008)