**CS6234 :
Advanced Algorithms**

**Basic
Information:**

Modular
credits: 4

This module is aimed at graduate students who are doing or intend to do advanced research in algorithm design and analysis in all areas of computer science. The module covers advanced material on algorithms with emphasis on efficient algorithms, and explores their use in a variety of application areas. Topics covered include, but are not restricted to, linear programming, graph matching and network flows, approximation algorithms, randomized algorithms, online algorithms, local search algorithms, algorithms for large datasets. The module will be a seminar-based module that will expose students to current research in these areas.

Lectures: Every Wednesday 9-11 pm at COM1-02-13 (VC Room).
First lecture on 13^{th} January, 2015.

There will be no tutorials.

Office Hours: Every Wednesday 3-4 pm (or by appointment).

Name : Rahul Jain

Office : S15-04-01

Email: first name at comp.nus.edu.sg

Phone: 65168826 (off).

We may use parts of the following texts and also material from outside these, e.g. research papers on algorithms and related areas.

1. “Introduction to Algorithms” by Cormen, Leiserson, Rivest and Stein (Prentice Hall, Second Edition.)

2. “Algorithm Design” by Jon Kleinberg and Eva Tardos. (Pearson International Edition.)

3. “Approximation Algorithms” by Vijay. V. Vazirani. (Springer.)

4. “Randomized Algorithms” by Rajeev Motwani and Prabhakar Raghavan. (Cambridge University Press.)

5. “Computational Geometry”, by de Berg, Mark, O. Cheong, M. van Kreveld, and M. Overmars. (Third edition. New York, NY: Springer-Verlag, 2008. ISBN: 9783540779735).

6. “Data streams: Algorithms and applications”, by S. Muthukrishnan. Appears in Foundations and Trends in Theoretical Computer Science, Volume 1, issue 2, 2005.

Somebody famously said “A problem well understood is half solved.” Therefore a primary ability any researcher must develop is the ability to read and understand and absorb known body of knowledge before adding further on to it. Since this module is intended at students who wish to do research in algorithms and related areas, the primary focus of the module will not be on ‘teaching new topics in algorithms’ (this has been done in several previous courses that you have taken in your journey so far), but on ‘teaching how to learn new topics in algorithms’. Therefore the focus will be to train students in being able to read and understand advanced material. You will be expected to read, understand and present advanced topics in algorithms/ research papers in algorithms. Your grade will be determined on how well you understand and present these topics/papers. There will be no exams.

First presentation: 40 marks.

Second presentation: 40 marks.

Forum discussions: 20 marks.

`Convex Optimization’ presented on 20-01-2016 by Philipp Keck,
Hu Sixing, Hakki Can Karaimer,
Pan An

An optimization problem is called convex if its objective function is convex and all constraints form a convex feasible region. Many important problems like linear regression and neural network training are already convex or can be transformed to be convex without sacrificing the utility of the results.

In this presentation, the group discussed algorithms to solve unconstrained convex optimization problems. Gradient Descent and Steepest Descent are two related, iterative algorithms to find the minimum of a convex function and both can be combined with either exact line search or backtracking line search. The group introduced the algorithms and explained them using mathematical examples. Then they evaluated and compared the algorithms. The convergence speed depends on the step size and on the condition number of the objective function. For steepest descent, it is sometimes possible to transform the coordinates of the problem to account for high condition numbers, which is the main advantage of steepest descent. The main advantage of gradient descent, on the other hand, is its simplicity.

In the final session of the talk, the group highlighted real-world examples from machine learning, computer vision and robotics.

`Parallel and Distributed Algorithms’ presented on 27-01-2016 by ABDELHAK BENTALEB, LEI YIFAN, JI XIN,
DILEEPA FERNANDO and ABDELRAHMAN KAMEL

The group presented parallel and distributed algorithms for finding maximal independent set (MIS), for finding perfect matching in graphs and for solving byzantine agreement problem in parallel and distributed systems. They started by giving a quick background on parallel and distributed processing and graph theory. Then they discussed an algorithm to find maximal independent sets in graphs, in both sequential and parallel forms. They provided a detailed proof of its correctness and complexity through a discussion on the module’s forum (please also consider the notes below). They proceeded to present a parallel algorithm for finding perfect matchings in graphs. They introduced the Tutte matrix which can be used to determine the existence of perfect matchings in graphs. Then they presented the algorithm which uses the value of the determinant of the Tutte matrix to determine if an edge belongs to the minimum weight perfect matching.

After the perfect matching algorithm, they introduced the byzantine agreement problem, in which a number of processors try to reach an agreement on the same decision despite the existence of some faulty processor in the distributed system; these faulty processors try to subvert the agreement. They discussed the algorithm using three examples for three different cases of the problem. Then they discussed the correctness of the algorithm through the examples which covered all the possible inputs, and showed that it has constant expected running time.

Slides , Notes on Parallel MIS

`Approximation Algorithms’ presented on 3-02-2016 by Guo QI, Chen Zhenghai, Wang Guanhua, Shen Shiqi and Himeshi De Silva

In this presentation, the group talked about approximation algorithms. An approximation algorithm is a way of dealing with NP-completeness for optimization problem. To introduce these algorithms, the group mainly focused on 2 problems: the Steiner Tree Problem (STP) and Traveling Salesman Problem (TSP).

For STP, there are 3 variants due to the different restriction of weight. But since there is an approximation factor preserving reduction from the general Steiner Tree Problem to the Metric Steiner Tree Problem, the presentation mainly focused on Metric STP. Then, they recalled the minimum spanning tree (MST) problem and Kruskal's algorithm to find the MST. Because STP is an NP-hard problem, MST-based approximation algorithm, which is proved to be 2- approximation algorithm, is used to find the approximate solution for STP.

For TSP, since there is no constant factor approximation algorithm for general TSP, they only paid attention to the Metric TSP. The group introduced 2 different approximation algorithms to solve it. The first one is a 2-approximation algorithm, and the second one is Christofide's algorithm which is proved to be 3/2-approximation algorithm. Both algorithms are shown clearly with examples, and detailed proofs are provided.

Finally, the group showed the applications of TSP, like the production of Printed Circuit Boards (PCBs) and design of tour route. There is also an interesting app called ConcordeTSP which can be found in ios app store.

`Streaming Algorithms’ presented on 11-02-2016 by Dmitrii Kharkovskii, Naheed Anjum Arafat, Erick Purwanto, TaeHoon Joseph, Kim Dissanayaka Mudiyanselage Emil Manupa Karunaratne and Sapumal Ahangama

The group discussed solving problems on streaming algorithms. For counting bits problem, DGIM algorithm was introduced. The DGIM algorithm can solve it in O(log^2 n) space. The next problem presented was the set membership problem using Bloom filter. Bloom filter is a probabilistic data structure for storing the sketch of the elements of a set in the form of bit vector. They discussed the traditional Bloom filter, answering set membership queries with it and its pros and cons in the presentation. The group then introduced Count-Min Sketch data structure which can solve the approximate frequency query problem in sublinear space. Then Heavy Hitters problem was introduced. The goal is to find elements that occurred more than certain number of times. They used the Count-Min Sketch and Min-Heap to solve the Approximate Heavy Hitters problem efficiently. After that AMS Sketch, which is an enhancement of the Count-Min Sketch for estimating the Second Moment using sublinear space, was presented. The presentation concluded with the demonstration of application of AMS Sketch for finding an approximate estimate for inner product.

`Randomized Algorithms’ presented on 19-02-2016 by Hung Dang, Zheyuan Gao, Irvan Jahja, Loi Luu, Divya Sivasankaran

In this presentation, the group addressed the
problem of approximating the permanent of a matrix, which is equivalent to
approximating the number of perfect matchings in a corresponding bipartite
graph. They first discussed a popular approach in approximate counting
which uses the Monte Carlo method. Then
they explained why this method does not work for any bipartite graph but for
only ones whose edges have minimum degree of n/2. The algorithm works by
iteratively approximating the number of perfect matchings with k edges, using
the Monte Carlo method, which relies on the possibility of randomly sampling a
matching, which cannot be achieved in polynomial time. Thus, they discuss a
near uniform matching generator which trades off between the efficiency and
precision, i.e., sample a matching at near uniformly random in polynomial time.
They build this near uniform matching generator by devising a Markov Chain
model with states representing the possible matchings in the graph. This Markov
Chain exhibits a property that after a sufficiently long (but polynomial of n)
random walk, it will end up at a uniformly random state. They then show the
polynomial upper bound on the sampling using the canonical path argument.

`Online Algorithms in Machine Learning’ on 24-02-2016 by WALEED ABDULWAHAB YAHYA AL-GOBI, MUHAMMAD BURHAN
HAFEZ, KIM HYEONGCHEOL, HE RUIDAN, SHANG XINDI

Machine learning (ML) algorithms can be classified according to the nature of data to learn from, into either offline or online learning. Offline learning is a two-phase process that consists of learning phase and testing phase. In the learning phase, the algorithm is trained on a pre-defined static dataset of examples and constructs a hypothesis that would be used to find accurate conclusion for a given new data in the testing phase. The point here is that the learning phase is completely separated from the testing phase. As opposite to offline learning that learns the hypothesis on the entire training set at once, online learning algorithm is a common technique that can be used in ML areas where it is computationally infeasible to train on the entire dataset all at once or when most training examples are not available ahead of time. So online learning is a method of ML in which data becomes available in a sequential order, and is used to update the hypothesis at each step of the learning.

Predicting from Expert Advice is a particularly interesting problem from the point of view of online learning. In this problem, the tasks are given sequentially to the learning algorithm to make prediction according to the advices from the experts about these tasks. Weighted majority algorithm (WMA) is a learning algorithm for predicting from expert advice, which tries to make predictions with the fewest number of mistakes that is close enough to that of the best expert. Two versions of WMA have been explained. The first one is the simple version, which makes a prediction for each received task with specific bound of mistakes, while the so-called randomized version is a better version that also makes predictions for each received task but with bound of mistakes less than that of the simple version. Moreover, the randomized version can also be used for prediction where the advices are naturally different from one another, not only restricted to binary advices.

With respect to online learning algorithms, a mistake bound model is used as basis or framework for specific online learning algorithms that aim to find the target concept with few mistakes as possible. Two specific learning problems were presented. One is the learning of monotone disjunctions; another is the learning of decision list. Simple and ‘Winnow’ algorithms are explained for learning monotone disjunction target concept. Winnow algorithm is a very elegant and practical approach which is particularly useful in conditions where the number of relevant variables in target concept is much smaller than the number of total variables. Decision list is more expressive compared with monotone disjunctions. A standard algorithm was given to learn decision list under mistake bound model.

`Min-Cut Algorithms’ presented on 2-03-2016 by Philipp Keck,
Hu Sixing, Hakki Can Karaimer,
Pan An, TaeHoon Kim

The group first presented a "Random Contraction Algorithm" by David Karger, finding a minimum cut in a graph, which is a huge improvement in complexity over naive algorithms. It contracts randomly selected edges until only two super-nodes are left. Examples were presented to visualize this procedure and the resulting cuts. Being a randomized algorithm, it may or may not output a minimum cut. To boost the success probability up to 1-O(1/n^c), the algorithm is repeated O(n^2 log n) times, resulting in a total runtime of O(n^4 log n). The group then presented an improvement of this algorithm devised by Karger and Stein. Observing that errors are more likely to be made during the later stages of the algorithm, the improvement is based on a recursive procedure that processes the former portion of all edges only once, then recurses twice on the latter portion and uses the better of the two results. The runtime is improved to O(n^2 log n), while maintaining an overall error probability of O(1/n) when repeated log^2 n times.

For parallelism of Karger’s algorithm, compact method was introduced by interpreting one run of Karger’s contraction algorithm as a permuted sequence of edge-insertions into an empty graph (only the nodes exist initially). The minimum-cut problem can be considered as finding a prefix of the sequence which is able to connect all nodes as exactly 2 connected components. Compact method itself takes O(m log m) time. By parallelizing contraction and connected component detection, time complexity can reach O(log^2 n). Therefore, the minimum cut problem is in RNC (Randomized Nick’s Class).

Lastly, the group presented several applications of Karger’s algorithm. Karger’s algorithm removes the minimum number of edges to split a graph; in other words, it minimizes the edge information loss. Based on this idea, Karger’s algorithm can be used to split large graphs and to find communities on social media. Furthermore, Karger’s algorithm can be used to detect which edges of the graph are holding the whole graph together. Based on this idea, the algorithm can be used to find weak ties on social media and detecting vulnerable part of a graph.

`L_{0} gradient minimization’ presented on 9-03-2016 by ABDELHAK BENTALEB, LEI YIFAN, JI XIN,
DILEEPA FERNANDO and ABDELRAHMAN KAMEL

In this presentation, the group addressed the problem of L_{0}
gradient minimization. L_{0} gradient minimization aims at minimizing
the number of non-zero gradients in some discrete input function to generate
another discrete output function that has less number of non-zero gradients
while maintaining the similarity between the input and the output functions as
possible. They started by giving some introduction to L_{0} norm, L_{0}
gradient and L_{0} gradient minimization. They discussed why L_{0}
minimization is NP-hard problem and provided a quick overview of alternating
optimization framework that can be applied to solve this problem. Then, they
presented an algorithm for L_{0} gradient minimization that depends on
coordinate descent optimization framework. This algorithm tries to minimize the
L_{0} gradient locally for each point (coordinate) and this local minimization
is controlled by some auxiliary parameters. After that, they presented another
algorithm for L_{0} gradient minimization that is based on alternating
optimization framework and uses a region fusion criterion for achieving faster
convergence. This algorithm is faster in convergence because it introduces this
fusion criterion where it tries to minimize the L_{0} gradient locally
for groups of points, instead of individual points, in each iteration. Then,
they presented a proof of the correctness of the algorithm, which they have
done by themselves, based on some claims. This proof was not introduced in the
corresponding literature. In this proof, they calculated an upper bound for the
objective function that was proven to be non-increasing.

Finally, they presented a demo about how the algorithm works. They also presented some experimental results comparing the two algorithms. They obtained these results by running the codes of the algorithms which are available online. They also showed the applicability of the algorithms to a set of applications.

`Convex Hulls’ presented on 16-03-2016 by Guo QI, Chen Zhenghai, Wang Guanhua, Shen Shiqi and Himeshi De Silva

In this presentation, the group presented the problem of
finding the convex hull of a finite set of points in the 2D plane. As the most
ubiquitous structure in computational geometry, convex hull is useful in its
own right and useful as a tool for constructing other structure in a wide
variety of circumstances.

The presentation started with basic definitions, such as
‘polar angle’, ‘left or non-left turn’, ‘convexity’. Moreover, a naïve but
high-time-complexity method was introduced. Then, three efficient algorithms
were covered: Graham’s Scan, Jarvis March and Chan’s Algorithm.

The three algorithms were introduced clearly with their
own examples. The group also discussed the time complexities of the algorithms.
In detail, Graham’s Scan has complexity O(n*log n),
Jarvis March has complexity O(n*h) and Chan’s Algorithm has complexity O(n*log
h), where n is the number of input points, and h is the number of points on the
convex hull. It is clear that both Jarvis March and Chan’s Algorithm are
output-sensitive algorithms, i.e. their complexities depend on the output.

At the end, the group showed the proofs of the correctness of the three algorithms and some applications of finding convex hulls.

`Random Algorithms for 3SAT’ presented on 23-03-2016 by Dmitrii Kharkovskii, Naheed Anjum Arafat, Erick Purwanto, Dissanayaka Mudiyanselage Emil Manupa Karunaratne and Sapumal Ahangama

An NP-complete problem can be solved by exponential time algorithms. However, this does not mean that there is no better algorithm than the brute force search. The group demonstrated this by presenting the random search algorithm for solving the 3SAT problem. After reviewing the 3SAT problem, its hardness and applications, and naively solving it with the brute force algorithm, the group introduced ‘Schoning’s algorithm’. They showed that the algorithm always gives the correct solution when the 3SAT instance is unsatisfiable. They also showed the cases that could occur when the 3SAT instance is satisfiable.

The group proceeded by giving three analyses for the case where the 3SAT instance is satisfiable, each giving a better bound than the previous one to get inverse polynomial error probability. The first analysis manages to give Theta(1.74^n) outer iterations of the Schoning’s algorithm by considering roughly half of the initial random assignments as the possible success trials, but allowing no mistakes for the algorithm to select the variable to flip. The second analysis gives better Theta(1.5^n) by altering the analysis to consider any possible initial random assignments. The final analysis gives Theta(1.33^n) outer loop iterations by slightly adding the inner loop iterations, allowing the algorithm to make some mistakes in choosing the variables to flip, and leveraging the ‘Stirling bound’. In conclusion, the group presented a random search algorithm with running time O(1.33^n 3n sqrt(n) log(n)) to solve the 3SAT problem.

`Practical Byzantine Fault Tolerance’ presented on 30-03-2016 by Hung Dang, Zheyuan Gao, Irvan Jahja, Loi Luu, Divya Sivasankaran

The group presented a practical ‘Byzantine Fault Tolerance (BFT)’ algorithm which solves the ‘fault tolerance’ problem in the ‘asynchronous’ and ‘byzantine’ setting. The group first introduced the problem, the model and all the properties that one has to achieve in order to solve BFT. While previous works in the literature either assume synchrony in their solution, or only tolerate non-byzantine setting (failed-stop/crash-only), the algorithm that the group presented is practical in many senses. First, it works when the network is asynchronous and can tolerate byzantine failures. Second it introduces many optimizations to improve the performance by orders of magnitude than previous works. The group then presented how the algorithm works, how it is implemented and evaluated. While many things are abstracted, the group focused on the underlying insights to provide a better understanding about the algorithm.

‘Analysis of Large Graphs Group for community detection’ on 6-04-2016 by WALEED ABDULWAHAB YAHYA AL-GOBI, MUHAMMAD BURHAN
HAFEZ, KIM HYEONGCHEOL, HE RUIDAN, SHANG XINDI

In this presentation, the group introduced the concept of analysis of undirected large graph for community detection. Currently it has become one of popular techniques according to the advancement of social network services. The technique is also useful for image segmentation, genetic analysis and so on. At the beginning, the group presented why the analysis of undirected large graph would be important and discussed the basic principles for graph partitioning used for the analysis. They explained two basic approaches for graph partition, minimum-cut and normalized-cut with examples. After explaining basic principles, they expanded their discussion to two categories: non-overlapping community detection and overlapping community detection.

First, they discussed how to determine the balanced optimal cutting point from the non-overlapping large graph, which is the ‘spectral clustering’ approach. The algorithm consists of three steps to find out the optimal cutting point and we can obtain the point by using the ‘Laplacian matrix’ and matrix eigenvalue decomposition. In addition, they suggested the deep graph encoder, which is able to apply to the analysis of undirected large graph in a different way. The graph encoder minimizes the reconstruction loss of the graph Laplacian matrix that is the auto-encoder training set and efficiently learns a low-dimensional graph embedding represented by k eigenvectors where k is the number of the auto-encoder hidden layer units. We can apply it to find graph partitions by running k-means on the leaned embedding.

Second, in the overlapping community detection, they
discussed BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping
community detection method. They described the algorithm as a model-based
community detection algorithm that can detect densely overlapping,
hierarchically nested as well as non-overlapping communities in massive
networks. In the end of the presentation, they explained how to optimize the value
of F, which is a parameter of the model and suggested additional reading
materials.