CS6234 : Advanced Algorithms
Modular credits: 4
This module is aimed at graduate students who are doing or intend to do advanced research in algorithm design and analysis in all areas of computer science. The module covers advanced material on algorithms with emphasis on efficient algorithms, and explores their use in a variety of application areas. Topics covered include, but are not restricted to, linear programming, graph matching and network flows, approximation algorithms, randomized algorithms, online algorithms, local search algorithms, algorithms for large datasets. The module will be a seminar-based module that will expose students to current research in these areas.
Lectures: Every Tuesday 10-12 pm at COM1-02-13 (VC Room). First lecture on 14th January, 2014.
There will be no tutorials.
Office Hours: Every Tuesday 2-3 pm (or by appointment).
Name : Rahul Jain
Office : S15-04-01
Email: first name at comp.nus.edu.sg
Phone: 65168826 (off).
We may use parts of the following texts and also material from outside these, e.g. research papers on algorithms and related areas.
1. “Introduction to Algorithms” by Cormen, Leiserson, Rivest and Stein (Prentice Hall, Second Edition.)
2. “Algorithm Design” by Jon Kleinberg and Eva Tardos. (Pearson International Edition.)
3. “Approximation Algorithms” by Vijay. V. Vazirani. (Springer.)
4. “Randomized Algorithms” by Rajeev Motwani and Prabhakar Raghavan. (Cambridge University Press.)
5. “Computational Geometry”, by de Berg, Mark, O. Cheong, M. van Kreveld, and M. Overmars. (Third edition. New York, NY: Springer-Verlag, 2008. ISBN: 9783540779735).
6. “Data streams: Algorithms and applications”, by S. Muthukrishnan. Appears in Foundations and Trends in Theoretical Computer Science, Volume 1, issue 2, 2005.
Somebody famously said “A problem well understood is half solved.” Therefore a primary ability any researcher must develop is the ability to read and understand and absorb known body of knowledge before adding further on to it. Since this module is intended at students who wish to do research in algorithms and related areas, the primary focus of the module will not be on ‘teaching new topics in algorithms’ (this has been done in several previous courses that you have taken in your journey so far), but on ‘teaching how to learn new topics in algorithms’. Therefore the focus will be to train students in being able to read and understand advanced material. You will be expected to read, understand and present advanced topics in algorithms/ research papers in algorithms. Your grade will be determined on how well you understand and present these topics/papers. There will be no exams.
First presentation: 40 marks.
Second presentation: 40 marks.
Forum discussions: 20 marks.
‘Advanced data structures’ on 21-01-2014, presented by Muthu Kumar C., Xie Shudong, Agus Pratondo, Aleksanr Farseev, Li Furong, Song Chonggang and Hong Hande.
In this presentation the group talked about three data structures: Splay Trees, Fibonacci Heaps and Persistent Data Structures. The key focus was on amortized time analysis and the use of potential function.
Splay tree is self-optimizing, in that the frequently accessed nodes will move nearer to the root where they can be accessed more quickly. The worst-case height—though unlikely—is O(n), with the average being O(log n). The representation of splay trees can change even when they are accessed in a 'read-only' manner (i.e. by find operations). This complicates the use of such splay trees in a multi-threaded environment.
Fibonacci heap is a “lazy” version of Binomial heap and also more flexible in tree structure. It defers the consolidation job until next delete min which help to give an amortized up-bound smaller than that using Binomial heap. Similar to other amortized data structure, Fibonacci heap can take linear time to complete some operations in the worst case, which may not appropriate for real-time systems.
Persistent data structure sacrifices some extra space to gain the ability to retrieve history version of the tree or other data structure. Three methods: Fat nodes, Path copying and Node copying were introduced in the lecture. Node copying utilize a modification box to get a trade-off between space and running time. More applications on Persistent data structure were introduced in the end of the lecture.
‘PRIMES is in P’ on 28-01-2014, presented by Abha Belorkar, Akshay Narayan, Ratul Saha, Pratik Shah, Wang Shengyi, Shweta Shinde, Shruti Tople.
In this talk the group presented the AKS primality test. The AKS algorithm is the first unconditional and deterministic primality testing algorithm which runs in polynomial time.
Primality testing in this algorithm is based on the extended Fermat's theorem for primality testing. The idea is to reduce the Child's binomial theorem to degree "r" which is much less than the given number "n" and then use it for primality testing. In the presentation, the group explained the algorithm in detail and proved the existence of such a reduction factor within suitable bounds. This was followed by an overview and explanation of the correctness proof of the algorithm.
The group also evaluated the practicality of AKS algorithm against the commonly used primality tests like Miller-Rabin test and APR primality tests. The talk was concluded by stating possible improvements to AKS algorithm.
‘Compressed sensing’ on 29-01-2014, presented by Mobashir Mohammad, Aditya Kulkarni, Tobias Bertelsen, Malay Singh, Hirak Sarkar, Nirandika Wanigasekara, Yamilet Serrano Llerena, Parvathy Sudhir
In this talk the group presented Compressed Sensing, a new research area that sparked from a keen observation from Dr. D. L. Donoho who mentioned in his paper "Why go to so much of effort to acquire all the data when most of what we get will be thrown away".
Prior to Compressed Sensing, data needed to be sampled at the Nyquist Rate which lead to the growth of data from trickle to torrent, making us face the severe problem of data deluge because the Nyquist rate is so high that we end up with far too many samples. Also, it is too costly to build devices capable of acquiring samples at the necessary rate. Thus, despite of the sensors becoming faster, stronger and better, data acquisition and storage remained a challenge. Some help to solve the above mentioned problem was to compress the sampled data using transform coding. This paved way to a number of standards like JPEG, JPEG2000, MPEG, MP3, etc. However, the above process leads to first sampling millions of data points and then throwing away almost all of it.
Building on the concept of data compression, compressed sensing has emerged as a new framework for signal acquisition and sensor design. It exploits the fact that most of the signals naturally occurring are inherently sparse or compressible and thus enable a potentially large reduction in the sampling and the computation costs for sensing signals. The presentation started with the motivation behind Compressed Sensing followed by a brief introduction. It later went on to a detailed theoretical discussion and an algorithmic implementation of the recovery of the original signal from the under-sampled measured signal. To make things interesting, practical examples like Single Pixel Camera and Magnetic Resonance Imaging which exploit Compressed Sensing were discussed with sufficient details. For the audience to think over the topic, ongoing research and potential future extensions to compressed sensing were touched upon at the end with a summary of the entire talk.
‘Approximation algorithms’ on 18-02-2014, presented by Geng Xue, Cai Jingli, Xing Zhe, Zhu Xiaolu, Wang Zixiao, Jiao Qing, Zhang Hao
In this presentation, the group talked about approximation algorithms. Approximation algorithms are used to give an approximate solution to optimization problems, especially NP-hard optimization problems in polynomial time.
The group presented a specific optimization problem, the set cover problem, to analyze different kinds of approximation algorithms including combinatorial techniques and linear programming based (LP-based) techniques. Combinatorial techniques include greedy algorithms and layering algorithms, while LP-based techniques include dual fitting, rounding, and primal-dual schema. Different approximation algorithms are evaluated by factor , which gives a guarantee that the approximation solution is within times of the optimum solution . The closer the factor is to 1, the better the approximation algorithm is. The various approximation algorithms for the set cover problem achieve one of two factors: or . is the frequency of the most frequent element, and is the number of elements in the universal set. The summary of the five algorithms presented is as below:
Greedy algorithm: Factor is . It selects the most cost-efficient subset in each iteration until the universal set is covered.
Layering algorithm: For vertex cover problem, it tries to decompose arbitrary weight function into several degree-weighted functions so that factor of 2 can be achieved in each layer. It can be generalized to solve the set cover problem with an approximation factor of .
Dual fitting: An LP-based approach applied on the approximation problems to help analyze the approximation factor.
Rounding algorithm: Factor is for simple rounding and for random rounding. It is used to convert the fractional LP solution to integer solution.
Primal-dual schema: It tries to achieve a better solution iteratively by considering both primal and dual programs. It can achieve a factor of for set cover problem.
‘Geometric Algorithms’ on 19-02-1024, presented by Suman Sourav, Paramasiven Appavoo, Anuja Meetoo Appavoo, Li Jing, Lu Bingxin, Suhendry Effendy, Dumitrel Loghin
‘Parallel and Distributed Algorithms’ on 04-03-2014, presented by Shalinda Adikari, T.S. Lahiru, R. Sumanaruban, Puneet Dewan, Jessica G.A. Makucka, Y.S. Rawat, R. Ramanathan , Pham Nam Khanh
In this talk, the group presented standard parallel and distributed algorithms with relevant applications. Firstly the preliminaries, the ideas of Random Access Machine and Parallel Random Access Machine were presented with sufficient detail. Next, the talk shifted focus to Maximal Independent Sets and a parallel randomized algorithm due to Luby for computing MIS was presented. This was followed by a discussion on the analysis of Luby's algorithm and three important lemmas were explained to support it. The following talks covered - parallel quicksort including it's analysis, boxsort and how boxsort improved over the running time of quicksort to achieve O(log n). In the next session, the talk moved on to distributed algorithms. In this session, a randomized solution to the choice coordination problem due to Rabin was discussed. Both the synchronous and asynchronous versions of the algorithm were presented with correctness proof and complexity analysis. Synchronous and asynchronous examples for the 2 processor 2 register scenario were presented in detail. Finally, the talk concluded by summarizing the discussion and presenting a few applications of the algorithms - MIS in market graphs, wireless communication, parallel sorting in Yahoo, choice coordination algorithms in standard concurrency problems and multi-vehicle cooperative control.
‘A Random Polynomial-Time Algorithm for Approximating the Volume of Convex Bodies’, on 05-03-2014, presented by Nguyen Duy Anh Tuan, Hoo Chin Hau, Min Chen, Jingyuan Chen, Samir Kumar, Anurag Anshu , Chua Zheng Leong.
‘Graph Partitioning and Clustering for Community Detection’ on 11-03-2014, presented by Muthu Kumar C., Xie Shudong, Agus Pratondo, Aleksanr Farseev, Li Furong, Song Chonggang and Hong Hande.
Communities are sub-graphs where the vertices within the sub-graph are more densely connected than those across. In this presentation the group investigated some traditional methods on their suitability for community detection. This included graph partitioning and clustering.
One view of the problem is to partition the vertices of a graph with minimal number of edges running across the partitions or to find the min-cut of the graph. As an example, Kernighan/Lin algorithm finds two equal sized partitions with minimal cut. It uses the heuristic to swap vertices v across partitions to achieve minimal cut. It is unsuitable for community detection in large sparse graphs such as social networks.
Clustering by K-means is a faster approach linear in number of vertices but performs badly when trying to discover non-convex shaped clusters. Spectral clustering solves this problem by projecting the data into a different space and then clustering using K-means. The projection is achieved through Eigen decomposition of the Laplacian matrix of the graph.
We impress upon the need for special methods tuned to handle large sparse graphs such as social networks. Recent approaches use special metrics such as Modularity to achieve this.
‘Advanced sampling’ on 18-03-2014, presented by Mobashir Mohammad, Aditya Kulkarni, Tobias Bertelsen, Malay Singh, Hirak Sarkar, Nirandika Wanigasekara, Yamilet Serrano Llerena, Parvathy Sudhir
‘Swarm algorithms’ on 01-04-2014, presented by Abha Belorkar, Akshay Narayan, Ratul Saha, Pratik Shah, Wang Shengyi, Shweta Shinde, Shruti Tople.
In this presentation, the team introduced a class of nature inspired algorithms called Swarm Algorithms. Also known as Swarm Intelligence, these algorithms leverage the collective cognitive ability of a group of simple agents which locally communicate with each other to reach an approximately optimal solution for a given optimization problem.
The team discussed two important examples of such a framework - Ant Colony Optimization and Particle Swarm Optimization. They also presented how these two can be applied to the well known Travelling Salesman Problem and gave a demo of a simple implementation with one of them. Finally, they also presented some real-life applications of these two frameworks in Sequence Ordering Problem and Feed forward Neural Network.
'Online Algorithms' on 08-04-2014, presented by Xing Zhe, Cai Jingli, Wang Zixiao, Zhang Hao, Zhu Xiaolu, Jiao Qing, Geng Xue
In the presentation, the group talked about online algorithms. Online algorithms arise in any situation where decisions must be made online, without any knowledge of future requests. In contrast to offline algorithms which are given the whole problem data, the online algorithms must make a decision before seeing all the input data.
Different online algorithms are evaluated by competitive ratio , which gives a guarantee that for all legal input sequences, the cost of the online algorithm is within times of the cost of an offline optimal algorithm. The closer the competitive ratio is to 1, the better the online algorithm is.
Among various online problems, the group presented one representative problem: the k-Server problem to better introduce the essence of online algorithms. Firstly, they started with a simple special case, the paging problem, and showed how to generalize it to the general k-Server problem. Then, for the general k-Server problem, they presented four different online algorithms along with their detailed competitiveness analysis. The summary of these four algorithms presented is as below:
1) Greedy Algorithm: always move the nearest server to serve each request, it is simple and efficient to make a decision, but it has unbounded competitive ratio.
2) Double Coverage Algorithm: move all neighboring servers to serve the request; the competitive ratio is k in Line space and Tree space.
3) Follow-OPT Algorithm: treat the request history as the whole request sequence, and try to mimic the offline optimal algorithm, but it still has unbounded competitive ratio.
4) Work Function Algorithm: try to balance the greedy algorithm and the follow-OPT algorithm, when the total number of points in the metric space equals to k+1, it can achieve a competitive ratio of k; in more general discrete metric space, the competitive ratio is 2k-1.
‘Algorithms in Recommende
‘Combinatorial algorithms: heuristic search’ on 22-04-2014, presented by Shalinda Adikari, T.S. Lahiru, R. Sumanaruban, Puneet Dewan, Jessica G.A. Makucka, Y.S. Rawat, R. Ramanathan , Pham Nam Khanh
In this presentation the group discussed about heuristic search. They presented two search heuristics for optimization - hill climbing and genetic algorithms. They presented solutions for "uniform graph partitioning", "Steiner triple systems" and the "travelling salesman problem".
For hill climbing algorithm they discussed the basic approach and some limitations of the algorithm. Later they used hill climbing to solve the "uniform graph partitioning" problem. Then they discussed the "Steiner triple system" and proved some of its properties which are used in the heuristic solution. Using these properties they provided a hill climbing based solution for "Steiner triple systems".
Then the talk shifted focus to genetic algorithms. A generic genetic algorithm for optimization was presented with introduction to selection. mutation and crossover concepts. Later, they showed how GA based optimization is applied to the travelling salesman problem. Local search based on steepest ascent heuristics, recombination based on partially matched crossover and generation of permutations (tours) based on a combinatorial unranking method were discussed in detail.
They concluded the talk by providing some real life applications based on the two algorithms discussed. They explained how hill climbing can be useful for speech recognition systems, mobile robotics (AI). Then they moved on to explain applications of genetic algorithms in Singapore MRT (timetable synchronization), satellite truss design (NASA) & game theory (economics).
‘Streaming algoritms’, on 23-04-2014, presented by Nguyen Duy Anh Tuan, Hoo Chin Hau, Min Chen, Jingyuan Chen, Samir Kumar, Anurag Anshu , Chua Zheng Leong.