Combinatorial and Graph Algorithms
School of Computing, National University of Singapore
(Fall Semester 2002)

LEDA Assignment 1: Hello, LEDA!
(Due: 26 September 2002)

(Updated: 5-Sept, changed in RED)


Aims of this LEDA Assignment

In part (a), your task is to compile and run a given LEDA program that implements Dijkstra's algorithm for computing shortest path in a graph. The purpose of this excercise is to give you a feel of how your choice of data structures may affect the performances of Dijkstra's algorithm. No coding is involved, unless you choose to do so on your own accord.

In part (b), you are required to complete an "almost completed" LEDA program. This is to give you some hands-on LEDA programming, as well as to demonstrate how algorithms can be quickly prototyped by using LEDA.

All the files needed for this assignment can be found in this archive.


Part (a)

In this assignment, you are given a LEDA program LEDAdijkstra.cc (also an automated version, LEDAdijkstra2.cc) that contains different implementations of Dijkstra's algorithm for shortest path in a graph. In particular, this program will first create a random directed weighted graph G, and then call the LEDA default Dijkstra's algorithm to solve the one-to-all shortest path problem G. Following that, it allows the user to choose a different implementation of Dijkstra's algorithm among seven implementations -- 0:f_heap 1:k_heap 2:m_heap 3:list_pq 4:p_heap 5:r_heap 6:bin_heap.

Compile LEDAdijkstra (make LEDAdijkstra) and run it a few times to get familiar with the interface of the program. After you get a hang of it, compile the automated version of this program (make LEDAdijkstra2) for your performance evaluation. Besides being non-interactive, the automated version also generates 100 graphs so that an averaged value for each data structure can be obtained.

Your first job is to compare the performance of Dijkstra's algorithm on graph of different sizes -- n=2000, 4000, 8000, 16000, 32000. For each n, run it on different edge sparsity, i.e., for m=5n, 10n, 15n, 20n (for all your problems, you can use the same max-edge-cost=30000). Plot out the graphs of running time against the different graph sizes.

Write a brief report on your comparisons and observations. Submit your comparison graphs together with your report (you don't have to include the data tables in the report).


Part (b)

In this part, you are given a file containing an RNA (Ribo-Nucleic Acid) sequence. Given any length n, your job is to find the most repeated patterns of length n in the file. The patterns are allowed to overlap (even if they are different occurences of the same pattern) -- to illustrate, the pattern ATGCAT is repeated twice within the sequence ATGCATGCAT (i.e. at positions 1 and 5) while the pattern TGCATG occured only once in the sequence.

You are provided with an "almost completed" program, dna.cc. You should:

  1. Complete the program. If you are careful, you will find that you only need to add a few lines.
  2. Find (all) the most repeated patterns of length 8 in this file.
  3. Submit the results together with your completed program.
You need to compile and run your programs on sf3. If I need to compile and verify your program with other DNA sequences, I will be using sf3.


Optional:

Analyse the order of runtime of your program. You can assume that each call to _get_next_string() takes O(seglen) time.


Where to get help...

First, check out this page on getting help. If that does give the answer to your problem, you can send email to my former student Mr Li Shuai Cheng for help -- and if you do so, please also cc the mail to me.
(IMPT: Note that Shuai Cheng is volunteering his services and so please try not to go to him with trivial things that you can find out yourself.)


Relative performance information about different heap implementation in LEDA

LEDA provides many implementations of priority queues. The implementations include Fibonacci heaps (f_heap), pairing heaps (p_heap), k-nary heaps (k_heap) and binary heaps (bin_heap), redistributive heaps (r_heap), lists (list_pq) and buckets (b_heap) [See Note], and monotone heaps (m_heap). Fibonacci heaps are the default implementation and other implementations acan be selected using the implementation parameter mechanism. Following is a list of related biography


Note: In the list implementation the items of the queue are stored as an unordered list. This makes delete_min and find_min linear time process (linear serach through the entire list) and trivalizes all other operations. In the bucket implementation we have an array of linear lists; the list with index i contains all items whose priority is equal to i. This scheme requires the priorities to be integers from a prespecified range.


LEDA Assignment 1