CS1102 2006/2007 Semester 2

Back to previous level


Helpsheet

My helpsheet which I did for CS1102. Remember that I was just a blur student when I took the course. Download it here and here (2 sides) if you want a sample.

Questions regarding previous topics

You can contribute to this section by asking me questions. I will then post the solution up here within a day (usually within an hour, unless its in the wee hours of the night). Check this page often, it WILL be helpful (assuming everyone contributes questions :))

Ask me a question by using the form above! You do not have to leave your name, however if you do leave your email address I would be able to email you the answer. If you do not, you can simply search for your question and answer in this page. This page will be updated hourly.

By the way, you can just address me as max.

Generic

Q) Regarding ADT. On page 21 of Lecture2, there's a section entitled Advantages in using interface as data type of parameters. Just what does this actually mean? Does it mean that interface is used as declaration of a data type? E.g (interface) Comparable integ;
A) Comparable is an interface declared in Java.

When a class (e.g. Person) implements the Comparable interface, then when you declare the class, you can declare it as either

Person a = new Person();
Or
Comparable a = new Person(); //declare a as a comparable type instead of a person type.

One of the advantages is that methods which can take in the comparable parameter, can now take in the person object ( for example sorting). E.g. Arrays.Sort(Comparable c)

Sorting

Q) In the calculation of time complexity of mergesort, we consider the level of depth to which the recursion goes and not the actual number of recursive calls.thats how we get the log n factor. But aren't we suppose to consider the actual number of recursive calls?
A) There are several reasons why we dont count the number of recursive calls. Firstly, because in each call, the number of operations required differ, for example the first call in mergesort to split and merge requires an O(n) complexity, while the next call in mergesort to split and merge requries an O(n/2) complexity and so on. Since the number of calls each time differs, we wont be able to do multiplication to obtain the total complexity. Secondly, the number of recursive calls would be 2n - 1, you can sum it up easily, (last level has n nodes, 2nd last has n/2 nodes...convergent series), how would the number of recursive calls help you? A different strategy is required, and based on the observation that regardless of the split, each level requires n comparisons and hence we can obtain nlogn complexity.

Q) Why is mergesort not in-place?
A) In place sorting algorithm means you do not need additional storage space (in the order of n), an anology would be sorting some books on your desk. If you can sort your books on your desk without moving any books to another desk, then it is inplace, you can think of it as sorting in its original place.

In the merge step of merge sort, the 2 sorted arrays (they are actually 2 seperate sub-portions of a single array) to be merged will be merged onto a new array. To look at it as a book-table example, to perform merge sort, you have 2 sorted sub-portions on desk A. You then merge this 2 subportions on desk B, it is not possible to perform the merging step on desk A because there is no more space.

On the contrary for quick sort, you pick a book (pivot) on the desk, then from left to right, you just swap books from left to right such that all books on the left of the pivot are smaller then it. Hence quick sort is a in-place algorithm.

Q) One question about quick sort. I know that in the worst case, it will take o(n^2). But do we have a solution to avoid that happen?
A) Conventionally the worst case occurs when the array is sorted and the pivot chosen is always the first element. The short answer to this question is to choose a random pivot instead of the first element. However, you may argue that in the situation that the program is "unlucky" and randomly chose the pivot such that the tree is perfectly imbalance. In theory this is highly unlikely, and in fact detailed analysis of quicksort have determined that in any random array of n numbers, the expected height of the tree is 2logn.

Optimizations of quicksort including choose the Best of Three pivots for small n (or for small base cases), and Best of Best of Three (choose 3 sets of best of 3, then choose the best of 3 from there) for large n. Other optimizations including using insertion sort for small base cases ( n < 8) to reduce the number of small recursive calls (There are n recusive calls in the last level!). The overhead of a method call lies in the fact that the OS has to push a stack frame into the stack for each recursive call.

Stacks and Queues

Q) Can I implement a stack with Push(), Pop() and findmin() in O(1) time complexity each?
A) Yes, the idea is relatively simple. In a stack you can only remove the top element. So if you want to push an item on the stack, instead you push a tuple consisting of the following (element, minElement). Each "layer" of the stack simply keeps track of the minimum item it has seen so far (up to itself), and when you push a tuple into the stack, you simply peek the top element, check if current element is smaller then "minElement". If current element is smaller, then set its minElement to element. Otherwise set its minElement to the minElement specified on the topmost element. Then push it on the stack. To findmin() you simply peek the stack and look at the minElement field of the topmost tuple.

Q) How do I recursively find the minimum element in the stack?
A)
int findMin(Stack s)
{
   if(s.isEmpty())
      return infinity; //return some very large number
   int i = s.pop();
   int j = findMin(Stack s);
   if( i < j) return i;
   else return j;
}

Q) Lab 10 ex 2 uses Queue to sort... Is it possible to use Stack?
A) The idea behind this sorting in a queue is pretty simple. It just  takes the maximum value and recursive call in the sort step to sort the remaining items.

There is nothing interesting about the sorting method, there are n recursive calls and each call takes O(n) time.

You can do the exact same thing using a stack as well however you will  need two recursive calls. A recursive removeMax method and a recursive Sort method.

Q) How do you implement the removeMax method stated above?
A) If you try to do everything together (find max and remove it), then it will be difficult. However if you break it into subproblems it becomes easy.

int removeMax(Stack s)
{
   int i = getMax(Stack s);
   removeElement(s, i);
}

void getMax(Stack s)
{
   if(s.isEmpty()) return 0;
   int i = s.pop();
   int j = getMax(s);
   if(i > j) return i;
   else return j;
}

void removeElement(Stack s, int i)
{
   if(s.peek() == i)
   {
      s.pop(); return;
   }
   int j = s.pop();
   removeElement(s, i);
   s.push(j);
   return;
}

Priority Queues and Heap

Q) page 37 why the no of edges visited is n-1-(h-1)?
A) Suppose in the worst case of heapify, each node will bubble down all the way. Pictorially, we can visualize each bubble down to go down all the way in the worst case (it doesnt matter which way it chose, we just want to see how many edges is travelled), then let each bubble down operation be in a LEFT, RIGHT.. RIGHT direction. As you can see from the colored edges, all edge has been travelled EXCEPT the rightmost edge. Total number of edges is N -1, right edge is H - 1. Hence complexity is O(N - 1 - (H - 1))

Q) How do I implement a queue and a stack using a heap/priority queue?
A) Simple, for a queue, you want to implement FIFO. So just insert the elements into the priority queue with the same priority. The P-Queue will have the FIFO behaviour. To implement a stack using a priority queue, you want to implement LIFO. So for each element you insert into the p-queue, you just insert with an increasing priority (for max heap - greatest value is root).

Q) Which of the following is true of a priority queue?
a) If elements are inserted in increasing order of priority (i.e, lower priority element inserted first), and all elements are inserted before any are removed, it works like a queue.
b) If elements are inserted in increasing order of priority (i.e, lowest priority element inserted first), and all elements are inserted before any are removed, it works like a stack.
c) If all elements are inserted before any are removed, it works like a queue.
d) If elements are inserted in decreasing order of priority (i.e, highest priority element inserted first), and all elements are inserted before any are removed, it works like a stack.
e) If all elements are inserted in increasing order of priority, then it works like a queue whether or not all insertions precede any removals.
A) Just remember how a priority queue works. The priority queue is a blackbox (it doesnt have to be a heap) such that regardless of what you put into the priority queue, it will always dequeue elements with the highest priority first.

With that in mind, lets look at the question one by one.

For a and b, if we insert in increasing item of priority, e.g. elements 1,2 ,3 ,4 ,5, and then dequeue, the items that will be dequeued will be 5, since the highest priority item is 5, followed by 4,3,2,1. So in fact it behaves like a stack.

For c, This is clearly false, for example, if i insert elements with 1,2,3 into the priority queue, and if it behaves like a queue, then it should output 1,2,3. But since its a priority queue, it will output 3,2,1.

For d, If elements are inserted in decreasing order or priority, it would behave like a queue. Since the first item you insert has the highest priority, it will be output first.

For e, regardless of how many insertions precede removals, if i insert in increasing order or priority, it will in no way behave like a queue, but behave like a stack instead.

Q) Refering to the Heapsort method provided in lecture slide 40 and 41. does it mean that we can randomly assign value to the array and then draw the binary tree out. after that swap if it doesnt fulfill the heap property?
A) Essentially how heapsort works is that firstly, the heap is constructed from the array of values using the heap construction algorithm.

After the heap is constructed, the maximum value is removed from the heap and heapify is called. This reduces the size of the array by 1. Because the size is reduced, we can simply put the maximum value into that slot (which was removed) from the heap.

Just keep repeating this step until the all values have been removed.

Q) Given this expr -(a+b)^n -c/d*e, if we have to put this in an expression tree, how do we implement the unary minus?
A) Treat the expression as 0 - (expression).

Hashing

Q) Hashing slide 76 --> Why the ordered traversal of all the items for hashing is O( n log n)? Didn't get wat is it talking abt at all.
A) Ordered traversal means sorted order In a BST, you can use an in-order traversal to achieve O(n) complexity. In hashing, you must first retrieve all elements and copy them into an array, then sort it. Hence it is an O(n log n) complexity (for sorting).

Q) In slide 42, does the complexity applies only to separate chaining or to all in general?
A) The complexity applies to all conflict resolution schemes you have chosen.

Q) When counting the number of probes in collision resolution via linear/quadratic, do we count the 0th probe?
A) Yes.

Q) |39|X|  |  |30|5 |  |  |34|  |62|  |  | (Array of 13 slots, with 5 elements and 1 deleted)
How many more elements can we put into the table before we might not be able to find an empty slot for the next element?

My answer is that we can insert 2 more elements. Because after inserting 1 more element the load factor is still less than 0.5 ( load factor = 6/13 if it is 5/13 before inserting elements). So we can still find an empty slot for the next element.  The factor load will be larger than 0.5 after inserting the second element. In other words, we might not find an empty slot for the third one.

Actually, I tried to insert the third one and there is always a place for it!
A) Take note that in the lectures notes it states that if the load factor is alpha < 0.5 and m is prime then you can always find a empty slot.

This condition (load factor and m is prime) is sufficient but not necessary for you to find an empty slot. Meaning, you can always find an empty slot IF and ONLY IF alpha < 0.5 and m is prime, BUT you MAY still find an empty slot if alpha is > 0.5 -- no gurantees.

Q) In this hash function h(key)=(key*random())%100 where 100 is size of hash table, what value does the random function generate? If we jump 5 slots on collision each time, how do we determine if this function is good for inserting 95 keys in the range of 0-9999 into the table
A) Consider a random function. The random will generate a real number BETWEEN 0 - 1 everytime. This means that when you try to insert a key "5", the hash function may return index 6. And when u try to retrieve it, it may return index 75. The random function returns a random number evertime. Therefore this method will not work as a hash function!

Trees

Q) If the tree only contains one node, is it consider a tree ? If yes, then , is it called empty? it's height is zero or one? If no, then, how to define a empty tree?
A) A tree with only one node is a tree. It is not empty, it has a height of 1. An empty tree is a tree without any nodes and it would have a height of 0. A tree without any nodes is also a tree.

Q) For a tree that contains one node only (whatever its name is), is it considered a full binary tree? and is it considered completed?
A) A tree with 1 or no nodes is a full tree. A full tree is therefore complete.

Q) Is an empty tree balanced/full/complete? How about a tree with 1 node.
A) An empty tree is balanced full and complete.  (left subtree = 0, right subtree = 0). So is a tree with 1 node.

Q) Which traversal method we should use to store the binary tree, binary search tree, AVL tree in order to get its original shape later?
A) For a binary tree, you need both inorder and preorder/postorder traversals. For an BST you only need the preorder/postorder traversals. Inorder traversal in a BST gives you a sorted sequence and will not be able to restore the structure of the tree. An AVL tree is an BST, so there is no difference. Level order traversal will work for all cases except a binary tree.

Typically, for n-ary trees (Trees with n children), I find it easier to store it as an adjacency list.

Q) How do we insert duplicate values in a binary search tree? Should we put a duplicate value in the right or left subtree of its duplicate?
A) To keep it simple, in your simplified view of a tree where the key of the node and the values are the same, you can simply maintain a counter for each node to count the number of times an item is repeated. Alternatively you can also store it in the left or right subtree as you wish.

This question has also been answered by Melvin Zhang in the forums, his reply was

"Note that usually we only consider the keys when talking about a BST, but it should actually be a (key, value) pair where "value" is the data that is stored and key is the index with which to access this data, i.e. say for a BST of students the key is the matric no and the value is the Student object.

So if there are multiple pairs of (key, value) with the same key, a more generic solution for approach 1 is to store a list of values.
"

That is correct. When you eventually will make use of a BST, then your node would probobly consist of a key (which is used to construct the tree) and the value (infomation stored in the node). An anology would be creating a BST based on the matric number of students, and storing their other info in the same node.

Q) From notes lecture 7: Trees, slide 23.
Binary tree = “each node has at most 2 ordered children”
What is the meaning of “order” ?
Is it about traversal?
Why must it be ordered (coz if it really is about traversal, the traversal methods will take care of the order rite) ?
A) The order simply means that the children nodes being the left and right child, cannot be interchanged with each other. In short, their order ties to their being the left or the right child.

Just imagine a tree.. if the structure of the tree is
 1
2 3

then
 1
3 2
is not the same tree.

Q)  1. Given a node that is stored in index position i, how to derive a formula to get its parent's index?
2. Also after reading lecture notes, my understand of full/complete trees got distorted! Went to wiki it, just checking with you if these definitions are accurate:
A) 1. Parent index is given by the formula... parent = (i - 1) / 2 (round down);
2. Wiki and your lecture notes have a different definition of complete, so dont read the wiki. A full tree must be full, that means if it has a height h, you cannot add any more nodes without increasing its height. A complete tree is filled from left to right, a heap is complete. A full tree is also complete.

AVL Trees

Q) Compared with an BST, is insertion, deletion, retrieval operation slower then that in an AVL tree? I think AVL tree is slower because it has to do rotations.
A) An AVL tree is faster even though it has to perform rotations. A rotation operation is O(1). When inserting, you need at most 2 rotations. When deleting you need at most log n rotations. In a BST, insertion, deletion and retrieval is O(h) complexity, but note that the tree may be perfectly imbalanced and h = n. So, an AVL tree is faster.

Q) Comparing hash table and AVL trees. Insertion and deletion in AVL trees is O(log n), while in a hash table insertion and deletion is O(1). Can I say that the hash table is better?
A) No, comparing hashtables and AVL trees are like comparing apples and oranges. AVL tree is comparison based, meaning the location of its vertices are dependant on the location of other vertices (left subtree < root < right subtree). Hash table is not comparison based, the location of its elements are not dependant on the location of other elements. For example, in an AVL tree, you can easily get the sorted sequence by using the in-order traversal whereas in the hashtable, you cannot do so -- for a hashtable you got to copy the elements into an array then sort to produce the sorted sequence.

Q) You mentioned that AVL tree is faster than a BST even though it has to perform rotations. A rotation operation is O(1) while in a BST, insertion, deletion and retrieval is O(h) complexity. But since a AVL tree is a BST with AVL property, shouldnt its complexity be the complexity of BST insertion + rotation? which makes the AVL insertion complexity = O(h + 2), where 2 is the max number of rotation, h is the insertion process.
A) Yes, an AVL tree is a BST with special properties. The property of an AVL tree ensures that the height of the tree will be log n. Whereas in a BST the height is not ensured. For example, if i insert into a BST in its sorted order, the height of the BST will be n! Whereas in an AVL tree, the height will be log n. Hence AVL tree performs better,

Q) Can you explain Lecture 8 AVL Trees slide 45 on find kth in unsorted arrays?
A) The idea behind this is similar to finding the kth element in a BST EXCEPT that we build the conceptual BST on the fly as we conduct the search. You can also think of it as a partial quicksort. First we choose a random element as the pivot. The elements on the left of this pivot are smaller then the pivot, and the elements on the right are larger. The pivot is in its final sorted position, therefore if there are i elements on the left, then the pivot must be the i+1th ranked item. Lets call the left subarray as LARRAY and the right subarray as RARRAY.

Now lets consider the kth element. If k is  < i+1 then the item we want must be in the LARRAY. If we treat the LARRAY as a single standalone array, then the kth element in this LARRAY (of index 0..i) must also be the kth element in the unsorted array.

If k is > i + 1 then the item we want must be in the RARRAY. If we treat the RARRAY as a single standalone array, then the kth element in this RARRAY (of index 0..n-i) must be the k+i element of the unsorted array (since the first element RARRAY must be at least rank i!) Therefore when we recursively search the RARRAY, we must search for k-i element.

Graphs

Q)Graphs slide 106 & 110 --> Why is the complexity O( V^2 + E) and O( (V+E) log V)? Where does the another V come from?
A)

color all vertices yellow
foreach vertex w
    distance(w) = INFINITY

distance(s) = 0
while there are yellow vertices
     v = yellow vertex with min distance(v)
     color v red
    foreach yellow neighbourw of v
         relax(v,w)

1. The outermost while loop will iterate |V| times (each iteration you color a yellow vertex red).
2. The next line, to select a yellow vertex with min distance to v, to find the "closest" vertex, you can linearlly scan through all the vertices to determine which is the vertex you want. The linear scan is |V| complexity.
3. The next foreach neightbour loop will iterate for every edge of the group, hence in the entire program, it will iterate |E| times.
Hence total complexity is (.V^2 + E). However to reduce the time in step 2, we can make use of a data structure that allows us to get the "closest" vertex quickly. A priority queue can achieve this in log v time. Slide 110 explains in detail. Hence time complexity is reduced to P(V + E log V)

Q) What is a discounted graph?
A) Depending on the discount, it means how cheap you can purchase the graph for. For example, a 50% discount means you can buy the graph at half the price - a good deal! Actually its just a typo, it should be disconnected graph instead :).

Q) Is there a difference between 0 and the infinity sign while labelling an adjacency matrix?
A) Yes. Zero means the edge has no edge weight, infinity means there is no edge.

Sometimes, sources may indicate the absence of an edge with a zero. That will only apply when the edges are stated to be explicitly positive integers.

Otherwise, you should always clarify.