Foundations in Algorithms
School of Computing, National University of Singapore
(Fall 2008)

LEDA Assignment 1: Hello, LEDA!


Aims of this LEDA Assignment

In part (a), your task is to compile and run a given LEDA program that implements Dijkstra's algorithm for computing shortest path in a graph. The purpose of this exercise is to give you a feel of how your choice of data structures may affect the performances of Dijkstra's algorithm. No coding is involved, unless you choose to do so on your own accord.

In part (b), you are required to complete an "almost completed" LEDA program. This is to give you some hands-on LEDA programming, as well as to demonstrate how algorithms can be quickly prototyped by using LEDA. You will need to look up the LEDA manual on some the data types used on the program.

All the files needed for this assignment can be found in this leda1_fixed.tar.gz (updated) archive.


Part (a)

In this assignment, you are given a LEDA program LEDAdijkstra.cpp (updated) (also an automated version, LEDAdijkstra2.cpp (updated) ) that contains different implementations of Dijkstra's algorithm for shortest path in a graph. In particular, this program will first create a random directed weighted graph G, and then call the LEDA default Dijkstra's algorithm to solve the one-to-all shortest path problem G. Following that, it allows the user to choose a different implementation of Dijkstra's algorithm among seven implementations -- 0:f_heap 1:k_heap 2:m_heap 3:list_pq 4:p_heap 5:r_heap 6:bin_heap.

Compile LEDAdijkstra (make LEDAdijkstra) and run it a few times to get familiar with the interface of the program. After you get a hang of it, compile the automated version of this program (make LEDAdijkstra2) for your performance evaluation. Besides being non-interactive, the automated version also generates 100 graphs so that an averaged value for each data structure can be obtained.

Your first job is to compare the performance of Dijkstra's algorithm on graph of different sizes -- n=2000, 4000, 8000, 16000, 32000. For each n, run it on different edge sparsity, i.e., for m=10n, 15n, 20n, 25n, 30n (for all your problems, you can use the same max-edge-cost M=1,000,000). Plot out the graphs of running time against the different graph sizes.

Write a brief report on your comparisons and observations. Submit your comparison graphs together with your report (you don't have to include the data tables in the report).

[Note: Part of your grade will also be one how well organized your report is, how well you have done your comparisons, and your comments and conclusions drawn from your comparative study.]


Part (b)

In this part, you are given a file containing an DNA (Deoxy-ribo-Nucleic Acid) sequence. Given any length n, your job is to find the most repeated patterns of length n in the file. The patterns are allowed to overlap (even if they are different occurences of the same pattern) -- to illustrate, the pattern ATGCAT is repeated twice within the sequence ATGCATGCAT (i.e. at positions 1 and 5) while the pattern TGCATG occured only once in the sequence.

You are provided with an "almost completed" program, dna.cpp. You should:

  1. Complete the program. If you are careful, you will find that you only need to add a few lines.
  2. Find (all) the most repeated patterns of length 8 in the test input file m52dna.fas.
  3. Submit the results together with your completed program.
You need to compile and run your programs on sunfire. (For grading purposes, I will also be using sunfire to compile and run your program with other DNA sequences.)

Optional:
Analyse the order of runtime of your program. You can assume that each call to _get_next_string() takes O(seglen) time.


Your Submission for this Assignment

For part (a), the deliverable is the report, please call your report LEDA-dijsktra-rep-[your-name] followed by .doc/pdf/ps/etc.
For part (b), your deliverables should include source code, object code, results of your runs, and a short report to be named LEDA-dna-[your-name].

Please put all these deliverables for this assignment in one folder/directory called CS5206-LEDA-1-[your-name]. Then zip it up into a file called CS5206-LEDA-1-[your-name].zip.

Submit this zip file to CS5206 IVLE workbin called "CS5206-LEDA-1".


Where to get help...

First, check out this page on getting help. If that does give the answer to your problem, you can post your questions on the CS5206 IVLE forum.


Relative performance information about different heap implementation in LEDA

LEDA provides many implementations of priority queues. The implementations include Fibonacci heaps (f_heap), pairing heaps (p_heap), k-nary heaps (k_heap) and binary heaps (bin_heap), redistributive heaps (r_heap), lists (list_pq) and buckets (b_heap) [See Note], and monotone heaps (m_heap). Fibonacci heaps are the default implementation and other implementations acan be selected using the implementation parameter mechanism. Following is a list of related biography


Note: In the list implementation the items of the queue are stored as an unordered list. This makes delete_min and find_min linear time process (linear search through the entire list) and trivalizes all other operations. In the bucket implementation we have an array of linear lists; the list with index i contains all items whose priority is equal to i. This scheme requires the priorities to be integers from a prespecified range.


LEDA Assignment (Updated: Aug 2008)