CS 5234 - Algorithms at Scale

(Fall 2019)

NUS School of Computing

Prof. Seth Gilbert

Thursday 6:30pm - 9:30pm

LT18

Course Overview

Description

The goal of this semester is to better understand how we deal with graphs and datasets that are very, very big. We will explore how to adapt classical algorithms, introducing a variety of tools for building efficient algorithms that can handle data at scale. As the semester progresses, we will encounter a variety of challenges:

Imagine, for example, that you want to find a minimum spanning tree for a very large graph. Classical solutions like Prim's and Kruskal's may be difficult to implement efficiently if your graph contains one billion nodes! Can you find a faster randomized algorithm? Can you leverage special structure (e.g., is the graph planar, or derived from a social network) to derive a faster algorithm? Can you find an approximate solution faster? What if the graph is available only as a stream? What if the graph is changing over time with edges being added and removed (perhaps as friendships are made and broken). How does cache performance impact your algorithm? Can you take advantage of multicore machines to solve the problem faster? Maybe you could use a distributed cluster (or Hadoop)?

The goal this semester to develop a set of tools you can use to adapt graph algorithms for these types of scenarios, when the graph is very large and you are no longer able to simply apply classical solutions.

Grading

Grades in the course will be based on problem sets (including a mini-project), a mid-term exam, and a final exam.

In the last month of the semester, you will have the chance to work on a small project that involves further exploring some of the ideas that we have covered during the class.

Schedule

Below is the tentative schedule for the course. I will modify the schedule as the course progresses. Problem sets will be posted below on this schedule as they become available.

Class NumberDateDescription
Sampling and Sketching.
Part 1: Graph properties in less than linear time
115/08Introduction and logistics.
Sampling.
Graph connectivity.
222/08Finding the weight of an MST using sampling
329/08Property Testing, Sorted, and Yao's Principle
  • Lecture notes (slides, pdf) [Download]
Part 2: Sketches and streams
405/09Sampling from a stream.
512/09Streaming algorithms for graphs.
  • Lecture slides (powerpoint, pdf) [Download]
  • Lecture notes (handwritten, pdf) [Download]
  • Other material:
    • Amit Chakrabarti notes on streaming algorithms for graphs: Chapter 13 (Graphs).
    • Amit Chakrabarti notes on streaming algorithms for graphs: Chapter 6 (Tug-of-War sketch).
    • Amit Chakrabarti notes on streaming algorithms for graphs: Chapter 14 (Matchings and Triangles).
    • Moses Charikar lectures on shortest paths and counting triangles. (Local copy)
    • A survey by Andrew McGregor on streaming algorithms for graphs.
    • A sequence of lectures on streaming algorithms for graphs, including the dynamic connectivity algorithm seen in class.
619/09Clustering and Streaming Algorithms for Clustering.
Break26/09No class: Mid-semester break
Efficient Algorithms for Modern Machines.
Part 3: Efficient Algorithms for Modern Machines
73/10Introduction to Caching
810/10Caches and cache-efficient algorithms (Part 2)
  • Lecture notes, cache-efficient graph algorithms (slides, pdf) [Download]
  • Sample MidTerm Exam (2018) [Download]
  • Sample MidTerm Exam (2018) solutions [Download]
  • MiniProject Instructions [Download]
  • MiniProject Proposal due. Submit here.
  • MiniProject partner finding spreadsheet. Beware: this link is currently public. Do not put any sensitive information. (I will remove the link once teams have been formed.)
917/10Mid-Term Exam
Part 4: Parallel Algorithms
1024/10Multicore Parallel Algorithms
  • Notes on parallel algorithms (pdf) [Download]
  • Lecture notes, parallel algorithms (handwritten, pdf) [Download]
  • Lecture notes, parallel algorithms (slides) (pdf) [Download]
1131/10MPC Parallel Algorithms (Part 1)
127/11MPC Parallel Algorithms (Part 2)
  • Lecture notes (slides, pdf) [Download]
1314/11MiniProject Presentations and Wrap Up
End of class
27/11 Final Exam (Afternoon)
  • Please check the official schedule to confirm the time and place.
  • Sample Final Exam (2018) [Download]
  • Sample Final Exam (2018) solutions [Download]

Miscellaneous Resources

Below you can find various resources that may help you with the material we are covering in this class.

Course Logistics

Office Hours

For now, office hours are by apointment. Once the semester begins, I'll schedule regular weekly office hours (and update the website here).

Contact

To ask me small logistical questions, I prefer that you grab my attention immediately before or after class. For substantive content questions, schedule a meeting with me.

Problem Sets

There will be weekly problem sets in the first 2/3rds of the class. (I expect this will end up being 6-7 problem sets in total.) These problems sets will focus on algorithm questions: solving problems, devising algorithms, proving theorems. There may be a few short programming assignments.

Problems will be graded on a simplistic 0/1/2/3 scale: a 0 indicates minimal understanding (or a problem not completed); a 1 indicates some understanding but many mistakes or poor presentation; 2 indicates a satisfactory answer; a 3 indicates a perfect answer that goes beyond expectations. You should expect to get a 2 on most problems.

The following rules will help keep the problem set submission and grading process running smoothly:

Some advice on problem sets:

Exams

There will be one two exams in the course: a midterm and a final. The mid-term exam will be held in class on October 17. The final exam will be held on November 27. (Please check the official schedule to confirm, as it may have changed since this webpage was created.)

Academic Integrity

I take academic integrity seriously. To repeat the problem set instructions from above: the solutions you submit must be your own unique work. Any cases of cheating will be dealt with harshly: a first offense will result in at least a one letter grade deduction for the module (or potentially a zero for the entire module, depending on the severity). When in doubt, ask me what is allowed.