NUS Home


Last updated on: 10 November 2008 06:01:21 PM

Go to Programming Section

Contents:

1. Introduction to Graph
2. Searching a Graph
3. Finding Connected Components (Flood Fill)
4. Minimum Spanning Trees (MST)
 - Tree
 - MST Kruskal
 - MST Prim
5. Single-Source Shortest Paths
6. All-Pairs Shortest Paths
7. Advanced Algorithms

4.1. Tree

Tree is one of the most efficient data structure used in a computer program. There are many types of tree. Binary tree is a tree that always have two branches, Red-Black-Trees, 2-3-4 Trees, AVL Trees, etc. A well balanced tree can be used to design a good searching algorithm.

A Tree is an undirected graph that contains no cycles and is connected.

Many trees are what is called rooted, where there is a notion of the "top" node, which is called the root. Thus, each node has one parent, which is the adjacent node which is closer to the root, and may have any number of children, which are the rest of the nodes adjacent to it. The tree above was drawn as a rooted tree.

4.2 Minimum Spanning Trees

Taken from http://www.ics.uci.edu/~eppstein/161/960206.html, with some modification & simplification.

4.2.1. Spanning trees

A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree. A graph may have many spanning trees; for instance the complete graph on four vertices

    o---o
    |\ /|
    | X |
    |/ \|
    o---o

has sixteen spanning trees:

    o---o    o---o    o   o    o---o
    |   |    |        |   |        |
    |   |    |        |   |        |
    |   |    |        |   |        |
    o   o    o---o    o---o    o---o

    o---o    o   o    o   o    o   o
     \ /     |\ /      \ /      \ /|
      X      | X        X        X |
     / \     |/ \      / \      / \|
    o   o    o   o    o---o    o   o

    o   o    o---o    o   o    o---o
    |\  |       /     |  /|     \
    | \ |      /      | / |      \
    |  \|     /       |/  |       \
    o   o    o---o    o   o    o---o

    o---o    o   o    o   o    o---o
    |\       |  /      \  |       /|
    | \      | /        \ |      / |
    |  \     |/          \|     /  |
    o   o    o---o    o---o    o   o

4.2.2. Minimum spanning trees

Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of weights of its edges. Obviously, different trees have different lengths. The problem: how to find the minimum length spanning tree?

Why minimum spanning trees?

The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn't a tree you can always remove some edges and save money.

A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once.

Note that if you have a path visiting all points exactly once, it's a special kind of tree. For instance in the example above, twelve of sixteen spanning trees are actually paths. If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is less than the TSP weight, because it's a minimization over a strictly larger set.

On the other hand, if you draw a path tracing around the minimum spanning tree, you trace each edge twice and visit all points, so the TSP weight is less than twice the MST weight. Therefore this tour is within a factor of two of optimal.

How to find minimum spanning tree?

The stupid method is to list all spanning trees, and find minimum of list. We already know how to find minima... But there are far too many trees for this to be efficient. It's also not really an algorithm, because you'd still need to know how to list all the trees.

A better idea is to find some key property of the MST that lets us be sure that some edge is part of it, and use this property to build up the MST one edge at a time.

For simplicity, we assume that there is a unique minimum spanning tree. You can get ideas like this to work without this assumption but it becomes harder to state your theorems or write your algorithms precisely.

Lemma: Let X be any subset of the vertices of G, and let edge e be the smallest edge connecting X to G-X. Then e is part of the minimum spanning tree.

Proof: Suppose you have a tree T not containing e; then I want to show that T is not the MST. Let e=(u,v), with u in X and v not in X. Then because T is a spanning tree it contains a unique path from u to v, which together with e forms a cycle in G. This path has to include another edge f connecting X to G-X. T+e-f is another spanning tree (it has the same number of edges, and remains connected since you can replace any path containing f by one going the other way around the cycle). It has smaller weight than t since e has smaller weight than f. So T was not minimum, which is what we wanted to prove.

4.2.3. Kruskal's algorithm

We'll start with Kruskal's algorithm, which is easiest to understand and probably the best one for solving problems by hand.

    Kruskal's algorithm:
    sort the edges of G in increasing order by length
    keep a subgraph S of G, initially empty
    for each edge e in sorted order
        if the endpoints of e are disconnected in S
        add e to S
    return S

Note that, whenever you add an edge (u,v), it's always the smallest connecting the part of S reachable from u with the rest of G, so by the lemma it must be part of the MST.

This algorithm is known as a greedy algorithm, because it chooses at each step the cheapest edge to add to S. You should be very careful when trying to use greedy algorithms to solve other problems, since it usually doesn't work. E.g. if you want to find a shortest path from a to b, it might be a bad idea to keep taking the shortest edges. The greedy idea only works in Kruskal's algorithm because of the key property we proved.

Analysis: The line testing whether two endpoints are disconnected looks like it should be slow (linear time per iteration, or O(mn) total). The slowest part turns out to be the sorting step, which takes O(m log n) time.

4.2.4. Prim's algorithm

Rather than build a subgraph one edge at a time, Prim's algorithm builds a tree one vertex at a time.

    Prim's algorithm:
    let T be a single vertex x
    while (T has fewer than n vertices) {
        find the smallest edge connecting T to G-T
        add it to T
    }

Example of Prim's algorithm in C language:

/* usedp=>how many points already used
   p->array of structures, consisting x,y,& used/not used
   this problem is to get the MST of graph with n vertices
   which weight of an edge is the distance between 2 points */

usedp=p[0].used=1; /* select arbitrary point as starting point */
while (usedp<n) {
  small=-1.0;

  for (i=0;i<n;i++) if (p[i].used)
    for (j=0;j<n;j++) if (!p[j].used) {
      length=sqrt(pow(p[i].x-p[j].x,2) + pow(p[i].y-p[j].y,2));

      if (small==-1.0 || length<small) {
        small=length;
        smallp=j;
      }
    }

  minLength+=small;
  p[smallp].used=1;
  usedp++ ;
}

Since each edge added is the smallest connecting T to G-T, the lemma we proved shows that we only add edges that should be part of the MST.

Again, it looks like the loop has a slow step in it. But again, some data structures can be used to speed this up. The idea is to use a heap to remember, for each vertex, the smallest edge connecting T with that vertex.

    Prim with heaps:
    make a heap of values (vertex,edge,weight(edge))
        initially (v,-,infinity) for each vertex
        let tree T be empty
    while (T has fewer than n vertices) {
        let (v,e,weight(e)) have the smallest weight in the heap
        remove (v,e,weight(e)) from the heap
        add v and e to T
        for each edge f=(u,v)
        if u is not already in T
            find value (u,g,weight(g)) in heap
            if weight(f) < weight(g)
            replace (u,g,weight(g)) with (u,f,weight(f))
    }

Analysis: We perform n steps in which we remove the smallest element in the heap, and at most 2m steps in which we examine an edge f=(u,v). For each of those steps, we might replace a value on the heap, reducing it's weight. (You also have to find the right value on the heap, but that can be done easily enough by keeping a pointer from the vertices to the corresponding values.) I haven't described how to reduce the weight of an element of a binary heap, but it's easy to do in O(log n) time. Alternately by using a more complicated data structure known as a Fibonacci heap, you can reduce the weight of an element in constant time. The result is a total time bound of O(m + n log n).

PREVIOUS - NEXT


This document, 9_graph4.html, has been accessed 11047 times since 03-Jan-03 17:06:31 SGT. This is the 5th time it has been accessed today.

A total of 6990 different hosts have accessed this document in the last 2198 days; your host, 38.103.63.56, has accessed it 1 times.

If you're interested, complete statistics for this document are also available, including breakdowns by top-level domain, host name, and date.