|
Last updated on:
10 November 2008 06:01:32 PM
1.
Introduction_to_Data_Structures
2. Array
3. Linked List
4. Stack
5. Queue
6.
Overview to advanced data structures: tree, graph, heap/priority queue,
hash table, trie.
1.
Introduction to Data
Structures
N/A
2.
Array
Array is the most useful data structures, in fact, this data structure will
almost always used in all contest problems...
Lets look at the good
side first: if index is known, searching an element in an array is very
fast, O(1), this good for looping/iteration. Array can also be used to implement
other sophisticated data structures such as stacks, queues, hash tables.
However, being the easiest data structure doesn't mean that array is efficient.
In some cases array can be very inefficient. Moreover, in standard array,
the size is fixed. If you don't know the input size beforehand, it may be
wiser to use vector (a resizable array). Array also suffer a very slow insertion
in ordered array, another slow searching in unordered array, and unavoidable
slow deletion because you have to shift all elements.
Example:
int scores[10000]; // integer array of 10000 elements...
// delete item at index 20
for (int i=20; i<10000; i++) // shift
one to the left
scores[i] = scores[i+1];
scores[9999] = 0; // delete the last
one
// insert an item X at index 20
for (int i=21; i<10000; i++) // shift
one to the right
scores[i] = scores[i-1];
scores[20] = X; // insert the new item
here
|
|
No duplicate allowed
|
Duplicate items
allowed
|
|
Search
|
O(n/2) comparisons
|
O(n) comparisons
|
|
Insertion
|
No comparison,
O(1)
|
No comparisons,
O(1) move
|
|
Deletion
|
O(n/2) comparisons
O(n/2) moves
|
O(n) comparisons
more than O(n/2) moves
|
We commonly use one-dimensional array for standard use and two-dimensional
array to represent matrices, board, or anything that two-dimensional. Three-dimensional
is rarely used to model 3D situations, but higher dimension is very hard
to visualize and beyond the capabilities of human brain (unless you are
very very genius).
For 2D array, there are two types of accessing two-dimensional array, please
keep this in mind, this will speed up your code considerably.
Row major, cache-friendly, use this method to access all items in
array sequentially.
for (i=0; i<numRow; i++) // ROW FIRST = EFFECTIVE
for (j=0; j<numCol; j++)
array[i][j] = 0;
Column major, NOT cache-friendly, DO NOT use this method or
you'll get very poor performance. Why? you will learn this in Computer Architecture
course. For now, just take note that computer access array data in row major.
for (j=0; j<numCol; j++) // COLUMN FIRST = INEFFECTIVE
for (i=0; i<numRow; i++)
array[i][j] = 0;
Vector Trick - A resizable array
Static array has a drawback when the size of input is not known beforehand.
If that is the case, you may want to consider vector, a resizable array.
This is how you can create a virtually resizable array - an array that grows
and shrinks as the program executes, by using an allocate and copy strategy
with fixed-size array. If the array that you are currently using in the
program reaches its capacity, you allocate a larger array and copy the references
stored in the original array to the larger array.
Example (JAVA):
// we resize the capacity, in this example, we doubles it.
capacity *= 2;
// reallocate new array
int [] newArray = new int[capacity];
// copy everything to the new array
for (int i=0; i<myArray.length; i++) newArray[i] = myArray[i];
// Change the reference from the original array to the new array
myArray = newArray;
This resizing thing is slow, I mean very slow... Therefore I don't recommend
you to use vector for a time critical programs. There are other data structures
which offer much more efficient resizing capability such as Linked List,
etc.
|
TIPS: UTILIZING C++ VECTOR STL
Implementing
your own vector (i.e. resize the array by yourself), is
troublesome. C++ STL has vector template for you.
#include
<stdio.h>
// this is where vector implementation resides
#include <vector>
// C++ STL built-in algorithms, I want to show sorting
// here, but please refer to sorting section for more
// details
#include <algorithm>
// use this to avoid specifying "std::" everywhere
using namespace std;
void main() {
// just do this, write vector<the type you want,
// in this case, integer> and the vector name
vector<int> v;
// try inserting 7 different integers, not ordered
v.push_back(3); v.push_back(1); v.push_back(2);
v.push_back(7); v.push_back(6); v.push_back(5);
v.push_back(4);
// to access the element, you need an iterator...
vector<int>::iterator i;
printf("Unsorted version\n");
// start with 'begin', end with 'end', advance with i++
for (i = v.begin(); i!= v.end(); i++)
printf("%d ",*i); // iterator's pointer hold the value
printf("\n");
sort(v.begin(),v.end()); // default sort, ascending
printf("Sorted version\n");
for (i = v.begin(); i!= v.end(); i++)
printf("%d ",*i); // iterator's pointer hold the value
printf("\n");
}
|
3.
Linked List
Motivation for using linked list: Array is static and even though it has
O(1) access time if index is known, Array must shift its elements if an
item is going to be inserted or deleted and this is absolutely inefficient.
Array cannot be resized if it is full (see resizable array - vector for
the trick but slow resizable array).
Linked list can have a very fast insertion and deletion. The physical location
of data in linked list can be anywhere in the memory but each node must
know which part in memory is the next item after them.
Linked list can be any big as you wish as long there is sufficient memory.
The side effect of Linked list is there will be wasted memory when a node
is "deleted" (only flagged as deleted). This wasted memory will only be
freed up when garbage collector doing its action (depends on compiler /
Operating System used).
Linked list can be visualized like this:
head
tail
|
|
v
v
node1 -> node2 -> .... -> node N -> null
Each node consists of
two parts:
node: [item/value/content | pointer(s) to the neighbor]
Linked list is a data structure that is commonly used because of it's dynamic
feature (or because your programming assignment force you to use this :-p...
). Linked list is not used for fun, it's very complicated and have a tendency
to create run time memory access error). Some programming languages such
as Java and C++ actually support Linked list implementation through API
(Application Programming Interface) and STL (Standard Template Library).
When your programming language support ADT List, USE IT!! Don't
re-invent another linked list on your own to avoid unnecessary bugs...
Linked list is composed of a data (and sometimes pointer to the data) and
a pointer to next item. In Linked list, you can only find an item through
complete search from head until it found the item or until tail (not found).
This is the bad side for Linked list, especially for a very long list. And
for insertion in Ordered Linked List, we have to search for appropriate
place using Complete Search method, and this is slow too. (There are some
tricks to improve searching in Linked List, such as remembering references
to specific nodes, or using self-organizing heuristics: moving the
recently searched items to the front of the list for quick re-searching
later, etc).
Variations of Linked List:
With tail
pointer
Instead of standard head pointer, we use another pointer to keep track the
last item. This is useful for queue-like structures since in Queue, we enter
Queue from rear (tail) and delete item from the front (head).
head
|
v
node1 -> node2 -> .... -> node N -> null
With dummy
node (sentinels)
This variation is to simplify our code (I prefer this way), It can simplify
empty list code and inserting to the front of the list code.
Doubly Linked
List
Efficient if we need to traverse the list in both direction (forward and
backward). Each node now have 2 pointers, one point to the next item, the
other one point to the previous item. We need dummy head & dummy tail for
this type of linked list.
head
|
v
node1 <-> node2 <-> .... <-> node N <-> null
Circular Linked
List
Useful if you need to cycle through a list repeatedly, e.g. round robin
system for a shared resource. Have your last node point to the first
node, then It can be visualized as a circular linked list.
For circular Linked List with tail pointer, usually we remove the head
pointer because now the head = tail->next.
tail
|
v
node1 -> node2 -> .... -> node N
^
|
--------------------------
|
TIPS: UTILIZING C++ LIST STL
A demo on
the usage of STL list. The underlying data structure is
a doubly link list, see (link_list_stl.cpp).
#include
<stdio.h>
// this is where doubly linked list implementation resides
#include <list>
// use this to avoid specifying "std::" everywhere
using namespace std;
// just do this, write list<the type you want,
// in this case, integer> and the list name
list<int> l;
list<int>::iterator i;
void print() {
for (i = l.begin(); i != l.end(); i++)
printf("%d ",*i); // remember... use pointer!!!
printf("\n");
}
void main() {
// try inserting 8 different integers, has duplicates
l.push_back(3); l.push_back(1); l.push_back(2);
l.push_back(7); l.push_back(6); l.push_back(5);
l.push_back(4); l.push_back(7);
print();
l.sort(); // sort the list, wow sorting linked list...
print();
l.remove(3); // remove element '3' from the list
print();
l.unique(); // remove duplicates in SORTED list!!!
print();
i = l.begin(); // set iterator to head of the list
i++; // 2nd node of the list
l.insert(i,1,10); // insert 1 copy of '10' here
print();
}
|
4.
Stack
A data structures which only allow insertion (push) and deletion
(pop) from the top only. This behavior is called Last In First
Out (LIFO), similar to normal stack in the real world.
Important stack operations:
1. Push (C++ STL: push())
Adds new item at the top of the stack.
2. Pop (C++ STL: pop())
Retrieves and removes the top of a stack.
3. Peek (C++ STL: top())
Retrieves the top of a stack without deleting it.
4. IsEmpty (C++ STL: empty())
Determines whether a stack is empty.
Some stack applications:
1. To model "real stack" in computer world: Recursion, Procedure Calling,
etc.
2. Checking palindrome (although checking palindrome using Queue & Stack
is 'stupid').
3. To read an input from keyboard in text editing with backspace key.
4. To reverse input data, (another stupid idea to use stack for reversing
data).
5. Checking balanced parentheses.
6. Postfix calculation.
7. Converting mathematical expressions. Prefix, Infix, or Postfix.
Some stack implementations:
1. Linked List with head pointer only (Best)
2. Array
3. Resizeable Array
|
TIPS: UTILIZING C++ STACK STL
Stack is not
difficult to implement, but why should we reinvent the
wheel?
Stack STL's implementation is very efficient, even though
it will be slightly slower than your custom made stack.
But who bothers with small efficiency if you save a lot
of time during contest by NOT implementing your own
stack (stack_stl.cpp)
#include
<stdio.h>
// this is where stack implementation resides
#include <stack>
// use this to avoid specifying "std::" everywhere
using namespace std;
void main() {
// just do this, write stack<the type you want,
// in this case, integer> and the stack name
stack<int> s;
// try inserting 7 different integers, not ordered
s.push(3); s.push(1); s.push(2);
s.push(7); s.push(6); s.push(5);
s.push(4);
// the item that is inserted first will come out last
// Last In First Out (LIFO) order...
while (!s.empty()) {
printf("%d ",s.top());
s.pop();
}
printf("\n");
}
|
5.
Queue
A data structures which only allow insertion from the back (rear),
and only allow deletion from the head (front). This behavior is called
First In First Out (FIFO), similar to normal queue in the real world.
Important queue operations:
1.
Enqueue (C++ STL: push())
Adds new item at the back (rear) of a queue.
2. Dequeue (C++ STL: pop())
Retrieves and removes the front of a queue at the back (rear) of a
queue.
3. Peek (C++ STL: top())
Retrieves the front of a queue without deleting it.
4. IsEmpty (C++ STL: empty())
Determines whether a queue is empty.
Some queue applications:
1. To model 'real queue' in computer world: Printer Queue, Task Queue, Process
Queue.
2. Checking palindrome (although checking palindrome using Queue & Stack
is 'stupid').
3. Simulation.
Some queue implementations:
1. Linked List
2. Array
3. Resizeable Array
4. Circular Array
|
TIPS: UTILIZING C++ QUEUE STL
Standard queue
is also not difficult to implement. Again, why trouble yourself,
just use C++ queue STL (queue_stl.cpp)
#include
<stdio.h>
// this is where queue implementation resides
#include <queue>
// use this to avoid specifying "std::" everywhere
using namespace std;
void main() {
// just do this, write queue<the type you want,
// in this case, integer> and the queue name
queue<int> q;
// try inserting 7 different integers, not ordered
q.push(3); q.push(1); q.push(2);
q.push(7); q.push(6); q.push(5);
q.push(4);
// the item that is inserted first will come out first
// First In First Out (FIFO) order...
while (!q.empty()) {
// notice that this is not "top()" !!!
printf("%d ",q.front());
q.pop();
}
printf("\n");
}
|
There is another type of Queue, the Priority Queue. The difference with
original queue is that the higher priority item will go out from queue sooner.
Refer to advanced data structure
(heap) for more details.
What's next?
With these few basic data
structures, you will be able to solve majority of contest problems. Even
though advanced data structures (will be covered later) will surely helpful
for more complex problems, you don't need it now. Let's see some basic problem
solving techniques in the next chapter. Stay tuned. :D
This document, 2_basic_data_structures.html, has been accessed 26519 times since 02-Dec-01 16:45:55 SGT.
This is the 12th time it has been accessed today.
A total of 17333 different hosts have accessed this document in the
last 2558 days; your host, 38.103.63.56, has accessed it 1 times.
If you're interested, complete statistics for
this document are also available, including breakdowns by top-level
domain, host name, and date.
|