Machine Learning

NUS SoC, 2019/2020, Semester I
Thursdays 12:00-14:00, Lecture Theatre 19


Note: This is not the current iteration of the course. Please visit http://www.comp.nus.edu.sg/~cs3244 for the most updated iteration.

This module introduces basic concepts and algorithms in machine learning and neural networks. The main reason for studying computational learning is to make better use of powerful computers to learn knowledge (or regularities) from raw data. The ultimate objective is to build self-learning systems that relieve humans of some of the already-too-many programming tasks. At the end of the course, students are expected to be familiar with the theories and paradigms of computational learning, and to be capable of implementing basic learning systems.

We will be using the Coursemology Learning Management System for the administration of this course.

N.B. We will be teaching and using the Python programming language throughout this class, with Jupyter Notebook via Google Colab. We will be using Python 3.x.

Class Structure

This class is a flipped class, a variant of a blended class. You’ll watch the first part of the video lecture before coming to tutorial, and then watch a subsequent video recorded lecture post-tutorial, to further reinforce the tutorial. There is no lecture in the lecture timing, except as noted in the syllabus: Weeks 1 (intro), 7 (midterm) and 13 (revision).

Tutorial Sessions

There will be tutorials for this class. As the class is flipped, these sessions will be the primary means by which we touch base with you and get to know you personally. Please do attend all of these sessions, as they will not be webcast. Although tutorial solutions will be distributed, you should come to the sessions to get the complete picture.

These tutorial session timings and venues are still subject to change. Please see NUSMods for the most up-to-date details. As an enrolled student, you are entitled to one tutorial placement, and need to attend that slot even if it is not optimal for you. Conveniently, all of the tutorial slots (Monday-Wednesday) come before the class lecture slot on Thursdays.

  1. Tutorial Session 1 (Mon 13:00-14:00; COM1-0208)
  2. Tutorial Session 2 (Wed 17:00-18:00; COM1-0207)
  3. Tutorial Session 3 (Mon 15:00-16:00; COM1-0207)
  4. Tutorial Session 4 (Mon 14:00-15:00; COM1-0207)
  5. Tutorial Session 5 (Wed 11:00-12:00; COM1-0207)
  6. Tutorial Session 6 (Mon 12:00-13:00; COM1-0208)
  7. Tutorial Session 7 (Wed 16:00-17:00; COM1-0207)
  8. Tutorial Session 8 (Wed 10:00-11:00; COM1-0207)
  9. Tutorial Session 9 (Tue 16:00-17:00; COM1-0207)
  10. Tutorial Session 10 (Tue 17:00-18:00; COM1-0207)

Course Characteristics

Modular Credits: 4.

Prerequisites: (CS2010 or its equivalent) and (ST1232 or ST2131 or ST2132 or ST2334) and (MA1101R or MA1311 or MA1506) and (MA1102R or MA1505 or MA1521)

Instructors:

Teaching Assistants

Office hours are held (before and after class), but more commonly by appointment. Emails to me as a default are assumed to be public, and my replies and your anonymized email will likely be posted to Coursemology. Please let me know if you do not want the contents of your email posted; I will be happy to honor your requests.

Graduate Teaching Assistants

Undergraduate Teaching Assistants

Workload

(2-1-0-3-4)

Translation:

  • 2 lecture hours per week (flipped)
  • 1 hour of tutorials per week
  • 0 laboratory hours per week
  • 3 hours for projects, assignments, fieldwork, etc. per week
  • 4 hours for preparatory work by a student per week

Schedule

We note that Machine Learning is a subject with a lot of very good expertise and tutorials out there. It is best to tap into these resources, as they have good production quality and are more condensed, possibly saving you time. However, we still think that in-class lectures help you build a better connection with the material for certain topics.

This class will be flipped; i.e., you will be asked to watch videos on YouTube explaining the concepts on your own first (the pre videos), and then attend the appropriate tutorial session, where staff will guide you through the pertinent exercises and reinforcement activities. Post-tutorial, you will be expected to watch the second half of the videos (the post videos) and complete a set of mastery exercises in Coursemology.

For those who find the pace of the videos too fast or who need a bit more time to digest the materials, we will offer an in-class help session during the lecture slot (i.e., Thursdays 12:00-14:00) in the remaining weeks (Weeks 2-6 and 8-12). This is completely optional (not counting against your workload), and we will not be introducing any new material in the help sessions. It is just voluntary help from all of us on the staff.

Date | Description | Deadlines
Week 1, 12 Aug | Administrivia and Paradigms of Learning | 15 Aug 12:00-14:00: In-class session
Week 2, 19 Aug | Concept Learning |
Week 3, 26 Aug | Naïve Bayes and k-Nearest Neighbors; T01: Paradigms of Learning | 26 Aug 23:59: Form subteams of Size 3
Week 4, 2 Sep | Linear Classifiers and Logistic Regression; T02: Concept Learning, Naïve Bayes and k-Nearest Neighbors |
Week 5, 9 Sep | Bias and Variance and Overfitting; T03: Linear Classifiers and Logistic Regression | 12 Sep 23:59: Project Proposals Due
Week 6, 16 Sep | Regularization and Validation; T04: Bias, Variance and Overfitting | 19 Sep 23:59: Peer Grading of Project Proposals Due
Recess Week, 23 Sep | |
Week 7, 30 Sep | Midterm and Guest Lectures | 3 Oct 12:00-14:00: In-class midterm and guest lectures
Week 8, 7 Oct | Neural Networks; T05: Regularization and Validation | Interim Project Consultations with Staff
Week 9, 14 Oct | Deep Learning; T06: Neural Nets | Interim Project Consultations with Staff
Week 10, 21 Oct | Decision Trees and Ensembles; T07: Deep Learning | STePS teams: 2nd Project Consultations with Staff
Week 11, 28 Oct | Unsupervised ML: k Means and Expectation Maximization; T08: Decision Trees and Ensembles | STePS teams: 2nd Project Consultations with Staff; Non-STePS teams: 31 Oct 23:59: Project Posters and Videos Due
Week 12, 4 Nov | Reinforcement ML; T09: Unsupervised Learning | Non-STePS teams: Poster Presentations to Staff; Non-STePS teams: 7 Nov 23:59: Peer grading of the Non-STePS Project Posters and Videos Due
Week 13, 11 Nov | Machine Learning Ethics; T10: Reinforcement ML | STePS teams: 13 Nov 18:00: Project Posters and Videos Due; STePS teams: 13 Nov 18:00-22:00: Participation on evening of 15th STePS; 14 Nov 12:00-14:00: In-class session Flipped classroom as per normal.; STePS teams: 14 Nov 23:59: Peer grading of peer STePS Project Posters and Videos Due; 15 Nov 23:59: Project Reports Due
Exam Week, 25 Nov | Final Assessment (13:00-15:00) |


You can import the course calendar via its URL: https://calendar.google.com/calendar?cid=MTFnY205bm5pNjJxcDIwcWhqOTVpOHFuNHNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

Grading

The grading for this class will comprise the following continuous assessment milestones and a final assessment. The final assessment will be open book, but the midterm will be closed book. Both the midterm and final assessment will largely be MCQ/MRQ based.

Description | Percentage
Midterm (3 Oct 2019, in-class) | 20%
Machine Learning Project | 30%
Weekly Python Notebook Assessments | 5%
Participation | 5%
Final Assessment (25 Nov 2019, 13:00-15:00) | 40%
Total | 100%
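
As a rough illustration of how the percentages above combine, here is a minimal Python sketch. It assumes that each component is scored out of 100 and that the components are combined as a plain weighted sum; the actual computation of final grades by the teaching staff may differ (for example, through moderation). The names `WEIGHTS` and `overall_score` are hypothetical and exist only for this example.

```python
# Component weights in percent, taken from the grading table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "midterm": 20,
    "project": 30,
    "notebooks": 5,
    "participation": 5,
    "final": 40,
}


def overall_score(scores):
    """Return the weighted overall score (0-100).

    `scores` maps each component name to a score out of 100.
    Assumption: a plain weighted sum, with no curving or moderation.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "midterm": 70,
        "project": 80,
        "notebooks": 100,
        "participation": 100,
        "final": 60,
    }
    print(overall_score(example))
```

Under these assumptions, the final assessment and the project together account for 70% of the overall score, so they dominate the result.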


This module is a flipped module, meaning that most lectures (with the exception of Weeks 1, 7 and 13) are online and done at your own pace through watching the appropriate pre-recorded videos. For Weeks 1, 7 and 13, you must be in class during the classroom hours. Staff contact will be mostly conducted through Tutorials and Project Consultations (to be arranged individually per project group).

Lecture and tutorial attendance is not mandatory, but will help with your participation grade. Participation is very helpful for your teaching staff too. Without it, we have very little idea whether you understand the material that we’ve presented or whether it’s too difficult or trivial. Feedback in the form of questions and discussion gives us a better idea of which topics you enjoy and which you are not too keen on.

Academic Honesty Policy

Please note that we enforce these policies vigorously. While we hate wasting time with these problems, we have to be fair to everyone in the class, and as such, you are advised to pay attention to these rules and follow them strictly.

Collaboration is a very good thing. Students are encouraged to work together and to teach each other. On the other hand, cheating is considered a very serious offense. Please don’t do it! Concern about cheating creates an unpleasant environment for everyone. You will be automatically reported to the vice-dean of academic affairs if you are caught. No exceptions will be made for any infraction, no matter how slight the offense.

So how do you draw the line between collaboration and cheating? Here’s a reasonable set of ground-rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per University guidelines. We will be enforcing the policy vigorously and strictly.

You should already be familiar with the University’s honor code. If you haven’t yet, read it now.

The Pokémon Go Rule: This rule says that you are free to meet with fellow student(s) and discuss assignments with them. Writing on a board or shared piece of paper is acceptable during the meeting; however, you may not take any written (electronic or otherwise) record away from the meeting. This applies when the assignment is supposed to be an individual effort. After the meeting, engage in a half hour of mind-numbing activity (like catching up with your friends and family’s activities on Facebook) before starting to work on the assignment. This will ensure that you are able to reconstruct what you learned from the meeting, by yourself, using your own brain.

The Freedom of Information Rule: To ensure that all collaboration is on the level, you must always write the name(s) of your collaborators on your assignment. Failure to adequately acknowledge your contributors is at best a lapse of professional etiquette, and at worst it is plagiarism. Plagiarism is a form of cheating.

The No-Sponge Rule: In intra-team collaboration where the group, as a whole, produces a single “product”, each member of the team must actively contribute. Members of the group have the responsibility (1) to not tolerate anyone who is putting forth no effort (being a sponge) and (2) to not let anyone who is making a good faith effort “fall through a crack” (to help weaker team members come up to speed so they can contribute). We want to know about dysfunctional group situations as early as possible. To encourage everyone to participate fully, we make sure that every student is given an opportunity to explain and justify their group’s approach.

This section on academic honesty is adapted from Surendar Chandra’s course at the University of Georgia, who in turn acknowledges Prof. Carla Ellis and Prof. Amin Vahdat at Duke University for his policy formulation. The origin of the original rule, called the Gilligan’s Island rule, is uncertain, but at least can be traced back to Prof. Dymond at York University’s use of it in 1984.

Late Submissions

All homework assignments are due to Coursemology by 23:59:59 (Singapore time) on the due date. No exceptions without a medical certificate will be made. The following penalties will apply for late submissions:

  • late within 1 hour: 10% reduction in grade;
  • late within 5 hours: 30% reduction in grade;
  • late within 1 day: 50% reduction in grade;
  • late within 5 days: 70% reduction in grade;
  • after 5 days: zero mark.

These penalties are intentionally severe to encourage students to turn in assignments on time. This, in turn, means that your teaching staff can start and finish grading within a certain time period, and can give you timely feedback on your work. Do not expect any type of preferential treatment if you turn in an assignment late.
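
The penalty tiers listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: lateness is measured from the due timestamp, each tier's upper bound is treated as inclusive, and the reduction is applied as a percentage of the raw grade. The names `PENALTY_TIERS`, `late_penalty_percent` and `penalised_grade` are hypothetical; Coursemology's actual behaviour may differ.

```python
from datetime import datetime, timedelta

# (upper bound on lateness, percentage reduction), in increasing order.
# Assumption: each bound is inclusive, e.g. exactly 1 hour late is 10%.
PENALTY_TIERS = [
    (timedelta(hours=1), 10),
    (timedelta(hours=5), 30),
    (timedelta(days=1), 50),
    (timedelta(days=5), 70),
]


def late_penalty_percent(due, submitted):
    """Return the percentage reduction for a submission time."""
    lateness = submitted - due
    if lateness <= timedelta(0):
        return 0  # on time: no reduction
    for limit, percent in PENALTY_TIERS:
        if lateness <= limit:
            return percent
    return 100  # more than 5 days late: zero mark


def penalised_grade(raw_grade, due, submitted):
    """Apply the late reduction to a raw grade."""
    return raw_grade * (100 - late_penalty_percent(due, submitted)) / 100


if __name__ == "__main__":
    due = datetime(2019, 9, 12, 23, 59, 59)
    # Three hours late falls in the "within 5 hours" tier.
    print(penalised_grade(80, due, due + timedelta(hours=3)))
```

Storing the tiers as an ordered list keeps the lookup a simple first-match scan, and any change to the thresholds touches only the table.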

Assignment return policy and regrades

All students have a right to question the grading of their work. If a regrade is sought for a particular milestone, this must be brought to our attention within 3 days of the return of the preliminary grades by email. Requests later than that will not be entertained without certified medical leave or school permission.

Projects

Student Projects

Credits: Much of the architecture for this course project comes from Bryan Low (NUS) and Thorsten Joachims (Cornell)

A key part of the mastery of machine learning is practicing it, outside of the formal mathematical and statistical basis for the algorithms. The student projects form an integral part of the assessment. Student teams should have 5-6 members and will be assembled by the teaching staff. There are two kinds of projects that can be done: Self-Defined Projects and Kaggle Competition Projects. Choose only one of the two.

Self-Defined Projects

The final project is intended to be a limited investigation in an area of machine learning of your choice. The purpose of the project is to enable you to study an area of your interest in greater detail in a practical way. The project can take on many forms, including but not limited to:

  1. Projects that explore the application of machine learning ideas to an interesting “real-world” problem.
  2. Projects that involve a theoretical or empirical study of aspects of a learning method or model.
  3. Projects that do an experimental, comparative study of various machine learning methods.
  4. Projects that extend or synergise with an existing project (could be from a member of your group), such as an honors year project. Caution: this type of project may lead to unequal contribution, due to members’ prior expertise.

Doing such a project gives you more flexibility and allows you to work on something of your liking. At the same time, however, it may require some additional effort (depending on your problem), such as collecting data, coming up with suitable baselines, and/or explicitly declaring what is being extended or is novel within the scope proposed for the class. The teaching staff will take these factors into account when grading.

Kaggle Competition Projects

On the Kaggle website, you can find and choose from a number of interesting machine learning competitions. Upon joining a competition, you will be provided with training and testing sets, and your performance will be measured with specified metrics and ranked against other competitors on the web.

Note that performance on the different metrics is not the critical factor in your grade on the project. While doing well on the competition helps, we primarily evaluate with respect to the (interesting) ideas your team employs to solve the task. While the data is easier to obtain for such a project, there is less flexibility and more emphasis on coming up with interesting methods.

Project Structure

Description | Percentage
Proposal | 3%
Interim Presentation | 5%
Project Poster, Video and Poster Presentation | 5 + 5 + 5 = 15%
Final Project Report | 7%
Peer Grading (of three other projects) | (counts towards Participation)
Total | 30%


You will need to form teams and propose a topic for your project in a formal project proposal. The staff and your peers will give you feedback. After the project proposal, you will be assigned a contact TA whom you can use as a resource for questions and advice.

In Weeks 8-9, the staff will meet with all teams for the only mandatory consultation. Your team will need to prepare a short presentation deck to present to the staff about the progress of your project. Based on your team’s interest and the staff’s opinion on your project, you may elect to present at 15th STePS. Teams that present at STePS will have a slightly different path and grading rubric for their project, which involves additional consultation with staff and culminates in a public presentation at STePS.

Detailed grading rubrics for all phases of the project are provided as part of the project reporting templates provided in class, as well as through the accompanying project videos that are released in Week 1. The general grading metrics are as follows:

  1. Originality
  2. Relevance to course
  3. Quality of arguments (are claims supported, how convincing are the arguments you bring forward)
  4. Clarity (how clearly are goals and achievements presented)
  5. Scope/Size (in proportion to size of group)
  6. Significance (are the questions you are asking interesting)

Like supervised machine learning, sometimes it’s easier to learn from data than from rubrics. You can get a look at past projects by looking at the previous iteration of CS3244 (i.e., Semester I, AY 18/19) and finding the past projects on the webpage (scroll down to the Projects part).