|
Last updated on:
10 November 2008 06:01:29 PM
1.
Introduction to Sorting
2. Comparison-based sorting
algorithms
a. Bubble Sort
b. Insertion Sort
c. Selection/Minimal/Maximal
Sort
d. Merge Sort
e. Quick Sort and tips & trick using C/C++
qsort library
f. Heap Sort
3. Linear-time Sorting
a. Lower
bound of comparison-based sort is
W(n
log n)
b. Counting Sort
c. Radix Sort
d. Bucket Sort
References:
-. CLRS chapter 1 [Insertion Sort,Merge
Sort],
-. CLRS Chapter 6 [Heapsort],
-. CLRS Chapter 7 [Quicksort],
-. CLRS Chapter 8 [Sorting in Linear Time]
1.
Introduction to Sorting
Definition
of a sorting problem:
Input:
A sequence of N numbers (a1,a2,...,aN)
Output:
A permutation (a1',a2',...,aN') of the
input sequence such that a1' <= a2' <=... <= aN'
Sorting is
very important in computer science, and of course, in the context of this
website, very important for programming contests. Why? because sorted items
are much more easier to be searched. That's why these two terms, "sorting"
& "searching" usually come together. After sort, then search..., or formally
said, sorting is commonly used as the initialization step for another algorithm.
Things to
be considered in Sorting
These are
the difficulties in sorting that can also happen in real life:
Size
of the list to be ordered
is the main concern. Sometimes, the computer memory is not sufficient to
store all data. You may only be able to hold part of the data inside the
computer at any time, the rest will probably have to stay on disc or tape.
This is known as the problem of external sorting.
However, rest assured, that almost all programming contests problem size
will never be extremely big such that you need to access disc or tape to
perform external sorting... (such hardware access is usually forbidden during
contests).
Another problem
is the stability of the sorting method. Example: suppose you
are an airline. You have a list of the passengers for the day's flights.
Associated to each passenger is the number of his/her flight. You will probably
want to sort the list into alphabetical order. No problem... Then, you want
to re-sort the list by flight number so as to get lists of passengers for
each flight. Again, "no problem"... - except that it would be very nice
if, for each flight list, the names were still in alphabetical order. This
is the problem of stable sorting.
To be a bit
more mathematical about it, suppose we have a list of items {xi}
with xa equal to xb as far as the sorting comparison
is concerned and with xa before xb in the list. The
sorting method is stable if xa is sure to come before xb
in the sorted list.
Finally, we
have the problem of key sorting. The individual items to be
sorted might be very large objects (e.g. complicated record cards). All
sorting methods naturally involve a lot of moving around of the things being
sorted. If the things are very large this might take up a lot of computing
time -- much more than that taken just to switch two integers in an array.
In this kind of case the best approach is the leave the actual data alone
and instead work with a list of 'markers' for the data. You shuffle the
markers, not the data itself. If you really want to, you can reorder the
actual data after you have determined its ordering.
2.
Comparison-based sorting algorithms
Comparison-based sorting algorithms
involves comparison between two object a and b to determine one of the three
possible relationship between them: less than, equal, or greater than. These
sorting algorithms are dealing with how to use this comparison effectively,
so that we minimize the amount of such comparison. Lets start from the most
naive version to the most sophisticated comparison-based sorting algorithms.
Note: I assume
that we will use '0' as the first index of an array, not '1' !!!
a.
Bubble Sort
Speed: O(n^2),
extremely slow
Space: The size of initial array
Coding Complexity: Simple
This is the
simplest and (unfortunately) the worst sorting algorithm. This sort will
do double pass on the array and swap 2 values when necessary.
Bubble sort
- basic idea
Repeatedly
sweep through array, exchanging adjacent pairs of elements that are out
of order.
Bubble sort
pseudo code (Note: there are some variant of the pseudo code given below,
however, the implementation detail is up to you).
BubbleSort(A)
for i <- length[A]-1 down to 1
for j <- 0 to i-1
if (A[j] > A[j+1]) // change ">" to "<" to do a descending
sort
temp <- A[j]
A[j] <- A[j+1]
A[j+1] <- temp
Slow motion
run of Bubble Sort (Bold == sorted region):
5 2 3 1 4
2 3 1 4 5
2 1 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5 >> done
b.
Insertion Sort
Speed: O(n^2),
extremely slow, but slightly faster than bubble sort in partially sorted
array
Space: The size of initial array
Coding Complexity: Simple
Insertion
sort - basic idea
Maintain "sorted
region" at beginning of array.
Repeatedly "grow" sorted region by inserting first element from unsorted
region into array.
This algorithm
will run better in a partially sorted array, because it doesn't do swapping
if the elements are already partially sorted.
Insertion
sort pseudo code:
InsertionSort(A)
for j <- 1 to length[A]-1
key <- A[j] // insert A[j] into the sorted sequence A[1..j-1]
i <- j - 1
while (i>=0 and A[i]>key)
A[i+1] <- A[i]
i <- i - 1
A[i+1] <- key
Insertion
sort in JAVA:
static void InsertionSort() {
for (int j=1;j<=totalElements-1;j++) {
int key = A[j]; // insert A[j] into the sorted sequence A[1..j-1]
int i = j - 1;
while (i>=0 && A[i]>key) {
A[i+1] = A[i];
i = i - 1;
}
A[i+1] = key;
}
Slow motion
run of Insertion Sort (Bold == position of index j):
5 2
4 6 1 3
2 5 4 6 1 3
2 4 5 6 1 3
2 4 5 6 1 3
1 2 4 5 6 3
1 2 3 4 5 6 >> done
c.
Selection/Minimal/Maximal Sort
Speed: O(n^2),
extremely slow
Space: The size of initial array
Coding Complexity: Simple
Selection
sort - basic idea
Move the smallest
element to a[0],
Move the second smallest to a[1],
…
Move the (n-1)st smallest to a[n-2].
Another approach in sorting, yet it is still
a slow O(n^2) algorithm.
Selection
sort pseudo code:
SelectionSort(A)
for i <- length[A]-1 downto 0
for j <- 0 to i
if (A[j]>MAX) // for descending, change this to MIN
MAX=A[j]
MAX_ID=j
temp=A[i]
A[i]=A[MAX_ID]
A[MAX_ID]=temp
Slow motion
run of Selection Sort (Bold == sorted region):
5 1 3 2 4
4 1 3 2 5
1 3 2 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5 >> done
d.
Merge Sort
Speed: O(n
Log n), similar speed with Quick Sort, pure n log n algorithm
Space: 2*the size of initial array
Coding Complexity: Complex, using Divide & Conquer approach
Merge Sort - basic idea
Sort first
half of the array.
Sort second half of the array.
Merge the result.
The bad side
of this sorting algorithm is this algorithm need 2*the size of initial array
to perform it's task. This can be very inefficient from memory-Point of
View.
Slow motion run of Merge Sort
(Bold
== merge step - where the actual sorting happen):
Original Array: 5 1 3 2 4
LHS: 5 1 3 -> LHS: 5 1 -> LHS: 5 ->
MERGE: 1 5
\> RHS: 1 / \
\> RHS: 3 --------------------->
MERGE: 1 3 5
RHS: 2 4 -> LHS: 2
\
\> RHS: 4 -> MERGE: 2 4 -------------------------->
MERGE: 1 2 3 4 5
Note: Merge Sort can be slightly modified to count inversion index (i.e.
bubble sort swaps) in O(n log n) time. When the right partition goes in
first in Merge step, that means all the remaining items in left partition
< first element in right partition (i.e. you have to swap with all of them).
e.
Quick Sort
Speed: O(n
log n), one of the best sorting algorithm.
Space: The size of initial array
Coding Complexity: Complex, using Divide & Conquer approach
One of the
best sorting algorithm known. Quick sort use Divide & Conquer approach and
partition each subset. Partitioning a set is to divide a set into a collection
of mutually disjoint sets. This sort is much quicker compared to "stupid
but simple" bubble sort. Quick sort was invented by C.A.R Hoare. A good
text regarding Quick Sort can be found in [CLR],
chapter 8.
Quick Sort - basic idea
Partition
the array in O(n)
Recursively sort left array in O(log2 n) best/average case
Recursively sort right array in O(log2
n) best/average case
Quick Sort facts:
Worst case
time: O(n^2).
Average case (if all orderings equally likely): O(n log n).
Average case (if random pivot, worst-case input): O(n log n).
Fast in practice due to in-site sorting unlike merge sort.
There are many pivot selection algorithm for Quick Sort (different execution
time).
Quick sort
pseudo code:
QuickSort(A,p,r)
if p < r
q <- Partition(A,p,r)
QuickSort(A,p,q)
QuickSort(A,q+1,r)
Quick sort
in JAVA, using Old Partition Algorithm:
static int Partition(int l,int r) {
int x = A[l]; // pivot
int i = l-1; // set outside boundary first
int j = r+1; // set outside boundary first
while (true) {
while (true) {
j--;
if (A[j] <= x) break;
}
while (true) {
i++;
if (A[i] >= x) break;
}
if (i<j) {
int temp=A[i];
A[i]=A[j];
A[j]=temp;
}
else
return j;
}
}
static void QuickSort(int l,int r) {
if (l < r) {
int q = Partition(l,r);
QuickSort(l,q);
QuickSort(q+1,r);
}
}
Slow motion run of Quick Sort, using standard pivot method (select the first item as pivot):
4
7 1 2 3 10 8 - Pivot=4
3 7 1 2 4 10 8
3 2 1 7 4 10 8 -> first pass of Partition, split at 1
3
2 1 - Pivot=3
7 4 10 8 - Pivot=7
1 2 3 - split at 2
4 7 10 8 - split at 7
1
2 - Pivot=1
3-base case 4-base case 7 10 8 - Pivot
7
1 2 - split at 1
7 10 8 - Pivot 10
1-2-base case
7-base case 10
8 - Pivot 10
8 10 - split at 8
8-10-base case
Final result: 1 2 3 4 7 8 10
|
Quick Sort for C/C++ User
C/C++ standard library
<stdlib.h> contains qsort function.
This is not the best
quick sort implementation in the world but it fast enough and VERY
EASY to be used... therefore if you are using C/C++ and need to
sort something, you can simply call this built in function:
qsort(<arrayname>,<size>,sizeof(<elementsize>),compare_function);
The only thing that you need to implement is the compare_function,
which takes in two arguments of type "const void", which can be
cast to appropriate data structure, and then return one of these
three values:
negative, if a should
be before b
0, if a equal to b
positive, if a should be after b
With these three possibilities,
you can virtually compare anything, starting from simply sort a
list of integers, sort a list of strings, to a more complex stuff
such as sorting floating point numbers, sorting records based on
key, and finally... multi field sorting :)... a very advanced technique
for sorting in my opinion :)
1. Comparing a
list of
integers
simply cast a
and b to integers
if x < y, x-y is negative, x == y, x-y = 0, x > y, x-y is positive
x-y is a shortcut way to do it :)
reverse *x - *y to *y - *x for sorting in decreasing order
int
compare_function(const void *a,const void *b) {
int *x = (int *) a;
int *y = (int *) b;
return *x - *y;
}
2. Comparing a
list of strings
for comparing
string, you need strcmp function inside string.h lib.
strcmp will by default return -ve,0,ve appropriately... to sort
in reverse order, just reverse the sign returned by strcmp
#include <string.h>
int compare_function(const void *a,const void *b) {
return (strcmp((char *)a,(char *)b));
}
3. Comparing floating
point numbers
In comparing floating
points, you must be careful. You can't simply use the technique
listed in number 1 above, why? because compare_function returns
an integer. Simply substracting two floating points and cast
them as integer will causes precision errors. For example: 0.5
and 0.6, if you simply use number 1 technique above, you'll
return 0.5 - 0.6 which is -0.1, which when converted to integer
yields 0... wrong answer rite...
int compare_function(const
void *a,const void *b) {
double *x = (double *) a;
double *y = (double *) b;
// return *x - *y; // this is WRONG...
if (*x < *y) return -1;
else if (*x > *y) return 1;
return 0;
}
4. Comparing records
based on a key
Sometimes you
need to sort a more complex stuffs, such as record. Here is
the simplest way to do it using qsort library
typedef struct
{
int key;
double value;
} the_record;
int compare_function(const
void *a,const void *b) {
the_record *x = (the_record *) a;
the_record *y = (the_record *) b;
return x->key - y->key;
}
5. Multi field
sorting, advanced sorting technique :)
Sometimes sorting
is not based on one key only... For example sorting birthday
list... First you sort by month, then if the month ties, sort
by date (obviously), then finally by year...
For example I
have an unsorted birthday list like this:
24 - 05 - 1982
- Steven
24 - 05 - 1980 - Cecilia Cheung
31 - 12 - 1999 - End of 20th century
01 - 01 - 0001 - Start of modern calendar
I will have a
sorted list like this:
01 - 01 - 0001
- Start of modern calendar
24 - 05 - 1980 - Cecilia Cheung
24 - 05 - 1982 - Steven
31 - 12 - 1999 - End of 20th century
To do multi field
sorting like this, traditionally one will choose multiple sort
using sorting algorithm which has "stable-sort" property. This
is outdated...
The better way
to do multi field sorting is to modify the compare_function
in such a way that you break ties accordingly... I'll give you
an example using birthday list again.
typedef struct
{
int day,month,year;
char *name;
} birthday;
int compare_function(const
void *a,const void *b) {
birthday *x = (birthday *) a;
birthday *y = (birthday *) b;
if (x->month != y->month) // months different
return x->month - y->month; // sort by month
else { // months equal..., try the 2nd field... day
if (x->day != y->day) // days different
return x->day - y->day; // sort by day
else // days equal, try the 3rd field... year
return x->year - y->year; // sort by year
}
}
TEST YOUR MULTI
FIELD SORTING KNOWLEDGE
Solve UVa
problems related with multi field sorting:
10194 - Football (aka Soccer)
|
Note about Partition:
Partition is the heart of Quick Sort algorithm.
In [CLR], we've learnt about the first Partition algorithm used by C.A.R.
Hoare when he first invented this algorithm. This partition algorithm is
still O(N) but not really cache friendly, since it start swapping from leftmost
& rightmost.
In [CLRS], the 2nd edition of the book, we
are introduced with a better way to do Partition. First, we select an element
as a pivot (black), then we grow blue and red region such that all elements
in blue <= pivot and all elements in red >= pivot

Source: CS3230 Lecture Note 2
Pseudo code:
1). Set i = p-1 and j = p
2). Increment j until A[j] <= pivot
3). Increment i; swap A[i] and A[j]
4). Repeat steps 2) to 3) until j = r
5). Swap A[i+1] and A[r]
6). Return i+1 // this is q - final position of pivot
This variant of Partition algorithm is considered
better because it is more cache friendly by maximizing locality, since when
an element is in the wrong partition, it will be swapped with the nearest
element that is also in the wrong partition. i.e., this Partition algorithm
grows blue and red region from left, where in previous algorithm, the red
partition is located on the rightmost. The new one also easier to analyze
since it only uses one comparison function in line 2.
f.
Heap Sort
Speed: O(N*log
N)
Space: The size of initial array
Coding Complexity: Complex, using advanced data structure, a Heap.
Refer to
Data Structures section regarding
Heap.
Heap Sort - basic idea
Build
the Heap from array A
We know the root of the Heap is the highest value, swap with the last element
Now we are sure that the last element has the highest value
Then we decrease the heap size by 1
And do Heapify in O(log N) without considering the last element
Repeat this process until Heap size become 1.
Quick sort
pseudo code:
HeapSort(A)
Build-Heap(A)
for i <- length(A) downto 2
exchange(A[1],A[i])
heap-size[A]--
Heapify(A,1)
Heap Sort sample run
Trace the following Heap step by step
Initial, unsorted array A={5,13,2,25,7,17,20,8,4}
5
13 2
25 7 17 20
8 4
After Build-Heap
25
13 20
8 7 17 2
5 4
* denotes the sorted region
Pass 1
4
13 20
8 7 17 2
5 25*
Pass 2
5
13 17
8 7 4 2
20* 25*
Pass 3
2
13 5
8 7 4 17*
20* 25*
Pass 4
4
8 5
2 7 13* 17*
20* 25*
Pass 5
4
7 5
2 8* 13* 17*
20* 25*
Pass 6
2
4 5
7* 8* 13* 17*
20* 25*
Pass 7
2
4 5*
7* 8* 13* 17*
20* 25*
Pass 8 - Finish
2
4* 5*
7* 8* 13* 17*
20* 25*
Sorted array = {2,4,5,7,8,13,17,20,25}
3.
Linear-time
Sorting
a.
Lower bound
of comparison-based sort is
W(n log
n)
The sorting algorithms that we see above are
comparison-based sort, they use comparison function such as <, <=, =, >,
>=, etc to compare 2 elements. We can model this comparison sort using decision
tree model, and we can proof that the shortest height of this tree is
W(n
log n). Details can be found in
CLRS chapter 8.
In the next section, we will see three
non-comparison based sorting algorithm which able to runs in
linear time for special cases.
b. Counting Sort
For Counting Sort, we assume that the numbers
are in the range [0..k], where k is at most O(n). We set up a counter array
which counts how many duplicates inside the input, and the reorder the output
accordingly, without any comparison at all. Complexity is O(n+k).
Refer to CLRS chapter 8 for more details.
c. Radix Sort
For Radix Sort, we assume that the input are
n d-digits number, where d is
reasonably limited.
Radix Sort will then sorts these number digit
by digit, starting with the least significant digit to the most significant
digit. It usually use a stable sort algorithm to sort the digits, such as
Counting Sort above.
Example:
input:
321
257
113
622
sort by third (last) digit:
321
622
113
257
after this phase, the third (last) digit is sorted.
sort by second digit:
113
321
622
257
after this phase, the second and third (last) digit are sorted.
sort by second digit:
113
257
321
622
after this phase, all digits are sorted.
For a set of n d-digits
numbers, we will do d pass of counting sort which have complexity
O(n+k), therefore, the complexity of Radix Sort is O(d(n+k)).
However, we need to balance the n and d factor wisely. Refer to CLRS chapter
8 for more details.
d. Bucket Sort
For Bucket Sort, we assume that the input
distribution is in the range [0,1). Set up n buckets that
contain range [i/n..i+1/n) for each i = 0 to n. Then do a linear scan through
the input and put each elements to appropriate buckets. If the input are
uniformly distributed, we can expect that each bucket will hold very small
number of elements, which can be sorted easily using any sorting algorithm,
say, Insertion Sort. The overall complexity will be just O(n)
in this case. Refer to CLRS chapter 8 for more details.
What's Next?
You now know how to do sorting, no matter
how complex the things that you need to sort is (using C qsort library).
Now you'll learn the complement of sorting, which is searching, at the next
chapter.
This document, 5_sorting.html, has been accessed 20468 times since 28-Feb-28 23:35:10 SGT.
This is the 22nd time it has been accessed today.
A total of 13202 different hosts have accessed this document in the
last -7026 days; your host, 38.103.63.56, has accessed it 1 times.
If you're interested, complete statistics for
this document are also available, including breakdowns by top-level
domain, host name, and date.
|