|
Last updated on:
10 November 2008 06:01:34 PM
1.
Heap - the best Priority Queue
2. Binary Search Tree
3. Dictionary/Hash Table
4. Union-Find Disjoint Set / Set
STL
5. Augmenting Data Structures
6.
More sophisticated data structure
I define "advanced data structure" as
follows: even if you do not use any of these, you
can still solve your problem, however, in some cases, using this advanced data
structure will make your solution more
efficient.
1.
Heap - the best Priority Queue
A heap is a complete binary tree that fulfill the Heap property.
Heap property:
Every node's value is less than both of its children's values (Min Heap
tree). Mathematically represented as: For all i, i != root, A[Parent(i)]
<= A[i]. For Max Heap, the heap property is just the reverse from the Min
Heap: A[Parent(i)] >= A[i].
Maintaining the heap property is called Heapify.
For this section, assume that we use Min Heap.
Figure 1. A Min Heap
Heap Representation
If the tree is filled in from left to right level by level (recall the definition
of Complete Tree), then the heap can be stored in an array, which is just
the top level from left to right, the second level, and so on.
The heap given in Figure 1 would be: 3 5 9 6 12 13 10 8 11
In this representation, the node at position i has a left children
at 2i and right children at 2i+1 (assuming 1-based indexing),
and the parent of i is truncate (i/2).
We can calculate this index very quickly using binary representation.
To calculate 2i, we can simply shift the binary representation of
i one bit left
To calculate 2i+1, we can simply shift the binary representation
of i one bit left and add 1 at the rightmost bit
To calculate truncate(i/2), we can simply shift the binary representation
of i one bit right
How to add and remove things to a Heap?
To add an element, put it at
the end of the array (leaf) and try to restore the Heap property (see above).
Now, as long as it's below its current parent, swap it with its parent.
For example, to add the number 4, the heap array would go through the following
states:
index | 1 2 3 4 5 6 7 8
9 10, key=4
step1
| 3 5 9 6
12
13 10 8 11 4
=> i=10, Parent(i)=5 => key(4)<12
step2 | 3 5 9 6
12
13 10 8 11 12
=> A[10]=A[5]
step3 | 3 5
9 6 12
13 10 8 11 12 => i=5 , Parent(i)=2 => key(4)<5
step4 | 3 5
9 6 5
13 10 8 11 12 => A[5] =A[2]
step5 | 3
5
9 6 5 13 10 8 11 12 => i=2 , Parent(i)=1 => key(
4)>3
step6 | 3 4
9 6 5 13 10 8 11 12 => A[2] =key
Pseudo code for Heap-Insert
Heap-Insert(A,key)
heap-size(A)++
i <- heap-size[A]
while (i>1 and A[Parent(i)] > key)
A[i] <- A[Parent(i)]
i <- Parent(i)
A[i] <- key
Deleting an element is also
relatively easy. Take the last element in the array and replace the element
you wish to delete with it and then restore the Heap property (see above).
While one of its children is less than it, pick the children with smaller
index and swap with it. For example, to delete 11:
index | 1 2 3 4 5 6 7 8,
to i=1 (11)
step1
|
11
5 9 6 12 13 10
8
=> exchange A[i] with A[heap-size(a)]
step2 | 8
5 9
6 12 13 10 11 =>
compare i=1 with children (8>5 || 8<9)
step3 | 5
8
9 6 12 13 10 8 => exchange A[1] with A[2]
step4 | 5 8
9 6 12
13 10 8 => compare i=2 with children (8>6 || 8<12)
step5 | 5
6
9 8
12 13 10 11 => exchange A[2] with A[4]
step6 | 5 6 9
8
12 13 10 11
=> compare i=4 with its children (8<11), stop
Pseudo code for Heap-Delete
Heap-Delete(A,i)
exchange (A[i],A[heap-size(A)])
while (i<heap-size(A) and (A[i]>A[leftChild(i)] or A[i]>A[rightChild(i)]))
if (A[i]>A[leftChild(i)])
exchange(A[i],A[leftChild(i)])
else
exchange(A[i],A[rightChild(i)])
How to alter a value?
To alter a value upwards, change the value, and swap with its parents
as long as necessary. To alter a value downwards, change the value,
and swap with the smaller of its two children as long as necessary.
Why Heap?
Heap is a very efficient Priority Queue.
Heap can answer these problems very fast (Assume for Min Heap):
-
What is the minimum key?
the root. O(1)
-
Delete the minimum key!
remove root, exchange a leaf with root, restore Heap property. O(log
N)
-
I want to insert a value?
insert at the leaf, restore Heap property. O(log N)
-
Is the heap empty?
root empty? O(1)
Heap Sort
Because Heap is quite efficient, we can use Heap to do a quick sorting in
O(N*log N) time. Simply put all elements to a heap, then repeatedly restore
the heap property. The final configuration will be a sorted heap.
Heap Sort can sort in place, this beats Merge Sort algorithm which needs
additional array. The O(N*log N) worst case running time also beats standard
sorting algorithms such as Insertion, Selection or Bubble sort.
However, even though Heap Sort worst case running time is O(N*log N), Quick
Sort is generally outperforms Heap Sort for average case, therefore I always
prefer Quick Sort than any other sorting algorithms.
You should only use Heap + Heap Sort if and only if you want to create an
efficient Priority Queues. Do not force yourself using Heap Sort only to
achieve O(N*Log N) worst case running time since Heap implementation is
much more complicated than other simple data structure.
Heap Sort Implementation
First we build a Heap using Build-Heap algorithm that runs in O(N) time,
then restore Heap Property (Heapify - O(log N)) for each nodes of Heap A[1..(N-1)].
Note: Some call Heapify as Percolate Up/Down, and Build-Heap as fixHeap,
anyway this is the implementation of Heap Sort according to Introduction
to Algorithms book and I prefer this way :-)
Heapify
Heapify(A,i)
i <- LeftChild(i)
r <- RightChild(i)
if l <= heap-size(A) and A[l] > A[i] then
largest <- l
else
largest <- i
if r <= heap-size(A) and A[r] > A[largest] then
largest <- r
if largest != i then
exchange(A[i],A[largest])
Heapify(A,largest)
Heapify sample run
Initial Heap, call Heapify(A,3)
27
17 *3
16 13 10 1
5 7 12 4 8 9 0
Call Heapify(A,6)
27
17 10
16 13 *3 1
5 7 12 4 8 9 0
Call Heapify(A,13), base case, Heapify stop, creating this Final Heap
27
17 10
16 13 9 1
5 7 12 4 8 3 0
Build-Heap
Build-Heap(A)
heap-size[A] <- length(A)
for i <- truncate(length(A)/2) downto 1
Heapify(A,i)
Build-Heap sample run
* is the current index
Initial Semi-Heap, i=4
5
3 17
*10 84 19 6
22 9
Semi-Heap 2, i=3
5
3 *17
22 84 19 6
10 9
Semi-Heap 3, i=2
5
*3 19
22 84 17 6
10 9
Semi-Heap 4, i=1
*5
84 19
22 3 17 6
10 9
Final Heap
84
22 19
10 3 17 6
5 9
HeapSort
This interesting property of Heap can be used
to build another O(n log n) sorting algorithm, Heap Sort. Details of Heap
Sort can be found in sorting section.
|
TIPS: UTILIZING C++ PRIORITY QUEUE STL
Seems it will
be hard for us to implement the heap data structure correctly
during contest time... fortunately, C++ has priority_queue
STL, which by default (if I'm not mistaken), use heaps as
the underlying data structure. So, why bother creating your
own heap data structure if what you need is a generic priority_queue.
Here is a
quick sample of using C++ priority_queue STL, refer to Internet
for a more complete reference.
#include
<stdio.h>
// this is where priority_queue implementation resides
#include <queue>
// use this to avoid specifying "std::" everywhere
using namespace std;
void main() {
// just do this, write priority_queue<the type you want,
// in this case, integer> and the PQ name
priority_queue<int> pq;
// try inserting 7 different integers, not ordered
pq.push(3); pq.push(1); pq.push(2);
pq.push(7); pq.push(6); pq.push(5);
pq.push(4);
// it will come out sorted... see that we must do
// top() and pop(), we can't combine them
while (!pq.empty()) {
printf("%d ",pq.top());
pq.pop();
}
printf("\n");
}
|
References:
-. CLRS chapter 6, Heapsort
2.
Binary Search Tree
Binary Search Tree (BST) enable you to search a collection of objects (each
with a real or integer value) quickly to determine if a given value exists
in the collection.
Basically, a binary search tree is a node-weighted, rooted binary ordered
tree. That collection of adjectives means that each node in the tree might
have no child, one left child, one right child, or both left and right child.
In addition, each node has an object associated with it, and the weight
of the node is the value of the object.
The binary search tree also has the property that each node's
left child and descendants of its left child have a value less than that
of the node, and each node's right child and its descendants have a value
greater or equal to it.

Binary Search Tree
The nodes are generally represented as a structure with four fields, a pointer
to the node's left child, a pointer to the node's right child, the weight
of the object stored at this node, and a pointer to the object itself. Sometimes,
for easier access, people add pointer to the parent too.
Why are Binary Search Tree useful?
Given a collection of n objects, a binary search tree takes only O(height)
time to find an objects, assuming that the tree is not really poor (unbalanced),
O(height) is O(log n). In addition, unlike just keeping a sorted array,
inserting and deleting objects only takes O(log n) time as well. You also
can get all the keys in a Binary Search Tree in a sorted order by traversing
it using O(n) inorder traversal.
Variations on Binary Trees
There are several variants that ensure that the trees are never poor.
Splay trees, Red-black trees, B-trees, and AVL trees are some of the more
common examples. They are all much more complicated to code, and random
trees are generally good, so it's generally not worth it.
Tips: If you're concerned that the
tree you created might be bad (it's being created by inserting elements
from an input file, for example), then randomly order the elements before
insertion.
|
TIPS: List of some useful C++ STL
Some STL that
may be useful for programming contest: vector, deque, list,
slist, bit_vector, set, map, multiset, multimap, stack,
queue, priority_queue, bitset.
You may want
to refer to
this
website for complete detail.
|
References:
-. CLRS chapter 12, Binary Search Tree
-. CLRS chapter 13, Red Black Tree
3. Dictionary
/ Hash Table
A dictionary, or hash table, stores data with a very quick way to do lookups.
Let's say there is a collection of objects and a data structure must quickly
answer the question: 'Is this object in the data structure?' (e.g., is this
word in the dictionary?). A hash table does this in less time than it takes
to do binary search.
The idea is this: find a function that maps the elements of the collection
to an integer between 1 and x (where x, in this explanation, is larger than
the number of elements in your collection). Keep an array indexed from 1
to x, and store each element at the position that the function evaluates
the element as. Then, to determine if something is in your collection, just
plug it into the function and see whether or not that position is empty.
If it is not check the element there to see if it is the same as the something
you're holding,
For example,
presume the function is defined over 3-character words, and is (first letter
+ (second letter * 3) + (third letter * 7)) mod 11 (A=1, B=2, etc.), and
the words are 'CAT', 'CAR', and 'COB'. When using ASCII, this function takes
'CAT' and maps it to 3, maps 'CAR' to 0, and maps 'COB' to 7, so the hash
table would look like this:
0: CAR
1
2
3: CAT
4
5
6
7: COB
8
9
10
Now, to see if 'BAT' is in there, plug it into the hash function to get
2. This position in the hash table is empty, so it is not in the collection.
'ACT', on the other hand, returns the value 7, so the program must check
to see if that entry, 'COB', is the same as 'ACT' (no, so 'ACT' is not in
the dictionary either). In the other hand, if the search input is 'CAR',
'CAT', 'COB', the dictionary
will return true.
Why are Hash Tables useful?
Hash tables enable, with a little bit of memory cost, programs to perform
lookups in almost constant work. Generally, the program must evaluate the
function and then possibly compare the looked up element to an entry in
the table.
Collision Handling
This glossed over a slight problem that arises. What can be done if two
entries map to the same value (e.g., I wanted to add 'ACT' and 'COB')? This
is called a collision. There are couple ways to correct collisions, but
this document will focus on one method, called chaining.
Instead of
having one entry at each position, maintain a linked list of entries with
the same hash value. Thus, whenever an element is added, find its position
and add it to the beginning (or tail) of that list. Thus, to have both 'ACT'
and 'COB' in the table, it would look something like this:
0: CAR
1
2
3: CAT
4
5
6
7: COB -> ACT
8
9
10
Now, to check an entry, all elements in the linked list must be examined
to find out the element is not in the collection. This, of course, decreases
the efficiency of using the hash table, but it's often quite handy.
Hashers Training
There are two basic things you need to consider to avoid collisions. The
first is easy: have a large hash table. Assuming a reasonable hash function,
this reduces collisions just for probabilistic reasons.
The more subtle, and often forgotten, thing to do to avoid collisions is
pick a good hash function. For example, taking the three letter prefix as
the hash value for a dictionary would be very bad. Under this hash function,
the prefix 'CON' would have a huge number of entries. Pick a function where
two elements are unlikely to map to the same value:
-
Create a relatively huge value and mod it with the size of your table
(this works especially well if your hash table is a prime size).
-
Primes are your friends. Multiply by them.
-
Try to have small changes map to completely different locations.
-
You don't want to have two small changes cancel each other out in your
mapping function (a transposition, for example).
-
This is a whole field of study, and you could create a 'Perfect Hash
Function' that would give you no collisions, but, for your purposes,
that's entirely too much work. Pick something that seems fairly random,
and see if it works; it probably will.
Hash Table variations
It is often quite useful to store more information that just the value.
One example is when searching a small subset of a large subset, and using
the hash table to store locations visited, you may want the value for searching
a location in the hash table with it.
Even a small hash table can improve runtime by drastically reducing your
search space. For example, keeping a dictionary hashed by the first letter
means that if you wanted to search for a word, you would only be looking
at words that have the same first letter.
Tips: Use C++ STL map for
dictionary :D
References:
-. CLRS chapter 11, Hashing
4.
Union-Find Disjoint Set
Sometimes we want to represent
a set. That is... something like this: item a,b,c belong to set 1, item
d,e belong to set 2... You are asked the following questions:
is a and b belong to the same
set? Yes
is a and d belong to the same set? No
I want to union set 1 and set 2, can you do it quickly?...
Whenever you need to deal
with sets... remember the term: Union-Find. This is a very sophisticated
data structure for dealing with set. I won't elaborate much in this page,
but I suggest you to read CLR chapter 22 for a very complete detail. Here,
I will just give you a working C/C++ implementation for Union-Find data
structure. This version use integer as set identifier.
Use make_set(x) to create
the initial set with id x, use find_set(x) to find which set item x located,
and use union_set(x,y) to merge two sets containing item x and y.
#define MAX 1000 // adjust
this properly (the max size of your set)
/* UNION-FIND library */
int p[MAX],rank[MAX];
// make a new set ID x
void make_set(int x) {
p[x] = x;
rank[x] = 0;
}
// don't call link directly, rather, use union_set
void link(int x,int y) {
if (rank[x] > rank[y])
p[y] = x;
else {
p[x] = y;
if (rank[x] == rank[y])
rank[y] = rank[y] + 1;
}
}
// find the set ID of item x
int find_set(int x) {
if (x != p[x])
p[x] = find_set(p[x]);
return p[x];
}
// union two set containing item x and item y
// see, this one calls find_set first!
void union_set(int x,int y) {
link(find_set(x),find_set(y));
}
Tips: Use C++ STL multimap
for set :D
References:
-. CLRS chapter 21, Data Structures for Disjoint Set
5.
Augmenting Data Structures
Sometimes, the operations
need to be performed more effectively. Some of them can be supported by
tweaking the textbook data structure to support the required operations.
I don't have much knowledge yet, please see the reference for more details.
References:
-. CLRS chapter 14, Augmenting Data Structures
6.
More sophisticated data structure
They are, but not limited
to: Red Black Tree, Splay Tree, B-Tree, Binomial Heap, Fibonacci Heap, etc
These will be written
later when I have time :$
References:
-. CLRS chapter 13, Red Black Tree
-. CLRS chapter 18, B-Trees
-. CLRS chapter 19, Binomial Heaps
-. CLRS chapter 20, Fibonacci Heaps
What's Next?
With these advanced data structures,
we can solve several difficult problems efficiently. :).
You almost have all the required knowledge to be a competitive programmers.
Next topic is on several miscellaneous
stuffs that you may be interested. Stay tuned :D
This document, 11_advanced_data_structures.html, has been accessed 14268 times since 26-Jul-04 20:31:59 SGT.
This is the 7th time it has been accessed today.
A total of 9418 different hosts have accessed this document in the
last 1591 days; your host, 38.103.63.56, has accessed it 1 times.
If you're interested, complete statistics for
this document are also available, including breakdowns by top-level
domain, host name, and date.
|