Machine Learning

NUS SoC, 2018/2019, Semester I
Thursdays 12:00-14:00, Lecture Theatre 19


This module introduces basic concepts and algorithms in machine learning and neural networks. The main reason for studying computational learning is to make better use of powerful computers to learn knowledge (or regularities) from raw data. The ultimate objective is to build self-learning systems that relieve humans of some of the already too many programming tasks. At the end of the course, students are expected to be familiar with the theories and paradigms of computational learning, and to be capable of implementing basic learning systems.

N.B. We will be teaching and using the Python programming language (Python 3.x) throughout this class, with Jupyter Notebooks run via Google Colab.

Course Characteristics

Modular Credits: 4.

Prerequisites: (CS2010 or its equivalent) and (ST1232 or ST2131 or ST2132 or ST2334) and (MA1101R or MA1311 or MA1506) and (MA1102R or MA1505 or MA1521)

Instructors:

Teaching Assistants

Office hours are held before and after class, but more commonly by appointment. By default, emails to me are assumed to be public, and my replies and your anonymized email will likely be posted to IVLE. Please let me know if you do not want the contents of your email posted; I will be happy to honor your request.

Workload

(2-1-0-3-3)

Translation:

  • 2 lecture hours per week (flipped)
  • 1 hour of tutorials per week (in class)
  • 0 hours of laboratory work per week
  • 3 hours for projects, assignments, fieldwork, etc. per week
  • 3 hours for preparatory work by a student per week

Class Structure

This class is a flipped class, a variant of a blended class. Typically, you’ll watch the first part of the lecture before coming to class, attend the physical in-class session (on Thursday afternoons) as a single-section “tutorial”, and then watch a subsequent video-recorded lecture after class to reinforce the lesson introduced in the introductory video and in the in-class session.

Tutorials: There will be no separate tutorials for this class. As the class is flipped, the in-class lecture session will be used to cover exercises and provide “tutorial-like” reinforcement of concepts introduced in the videos. We will be using the tutorial slots to conduct consultations about student projects and assignments.

Schedule

Machine Learning is a subject for which a great deal of expert material and many good tutorials are already available. It is best to tap into these resources, as they have good production quality and are more condensed, possibly saving you time. However, we still think that in-class lectures help you build a better connection with the material for certain topics.

This class will be flipped; i.e., you will be asked to watch videos on an unlisted YouTube channel explaining the concepts on your own first, and then come to class for a class-wide recitation, in which the teaching staff will guide you through pertinent exercises and reinforcement activities.

Date                 Description                               Deadlines
Week 1, 16 Aug       Administrivia and Paradigms of Learning
Week 2, 23 Aug       Naïve Bayes and k-Nearest Neighbors
Week 3, 30 Aug       Linear Classifiers
Week 4, 6 Sep        Logistic Regression
Week 5, 13 Sep       Bias and Variance and Overfitting         Project Proposals
Week 6, 20 Sep       Regularization and Validation             Peer Grading of Project Proposals
Recess Week, 27 Sep
Week 7, 4 Oct        Midterm
Week 8, 11 Oct       Neural Networks and Backpropagation       Interim Reports
Week 9, 18 Oct       Deep Learning                             Peer Grading of the Interim Reports
Week 10, 25 Oct      Support Vector Machines
Week 11, 1 Nov       Decision Trees and Ensembles              Project Posters
Week 12, 8 Nov       k-Means and Expectation Maximization      Project Reports; Peer Grading of the Project Posters
Week 13, 15 Nov      Machine Learning Ethics                   Participation on evening of 13th STePS; Peer Grading of the Project Reports
Exam Week, 26 Nov    Final Assessment (Afternoon)

Grading

The grading for this class will comprise the following continuous assessment milestones and a final exam. The final exam will be open book.

Description                            Percentage
Midterm (4 Oct 2018, in-class)         20%
Machine Learning Project               30%
Weekly Python Notebook Assessments     5%
Participation                          5%
Final Exam (26 Nov 2018, afternoon)    40%
Total                                  100%
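
As a rough illustration of how the weights in the table add up, here is a minimal Python sketch. It assumes that each component is scored on a 0-100 scale and that components are combined as a plain weighted average; the syllabus does not specify the exact formula, so treat this as an assumption. The names WEIGHTS and overall_score are hypothetical and are not part of any course tooling.

```python
# Weights (in percent) copied from the grading table.
WEIGHTS = {
    "midterm": 20,
    "project": 30,
    "notebooks": 5,
    "participation": 5,
    "final_exam": 40,
}


def overall_score(scores):
    """Combine per-component scores (each on a 0-100 scale) into one 0-100 score.

    Assumption: components are combined as a plain weighted average.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the components in WEIGHTS")
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())
```

Under these assumptions, scores of 80, 90, 100, 100 and 70 for the five components (in table order) would give an overall score of 81.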

Attendance is not mandatory, but will help with your participation grade. Participation is very helpful for your teaching staff too. Without it, we have very little idea whether you understand the material that we’ve presented, or whether it’s too difficult or too trivial. Feedback in the form of questions and discussion gives us a better idea of which topics you enjoy and which you are not too keen on.

Academic Honesty Policy

Please note that we enforce these policies vigorously. While we hate wasting time with these problems, we have to be fair to everyone in the class, and as such, you are advised to pay attention to these rules and follow them strictly.

Collaboration is a very good thing. Students are encouraged to work together and to teach each other. On the other hand, cheating is considered a very serious offense. Please don’t do it! Concern about cheating creates an unpleasant environment for everyone. If you are caught, you will be automatically reported to the vice-dean of academic affairs. No exceptions will be made for any infraction, no matter how slight the offense.

So how do you draw the line between collaboration and cheating? Here’s a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per University guidelines.

You should already be familiar with the University’s honor code. If you haven’t yet, read it now.

The Pokémon Go Rule: This rule says that you are free to meet with fellow student(s) and discuss assignments with them. Writing on a board or shared piece of paper is acceptable during the meeting; however, you may not take any written (electronic or otherwise) record away from the meeting. This applies when the assignment is supposed to be an individual effort. After the meeting, engage in a half hour of mind-numbing activity (like catching up with your friends’ and family’s activities on Facebook) before starting to work on the assignment. This will ensure that you are able to reconstruct what you learned from the meeting, by yourself, using your own brain.

The Freedom of Information Rule: To ensure that all collaboration is on the level, you must always write the name(s) of your collaborators on your assignment. Failure to adequately acknowledge your contributors is at best a lapse of professional etiquette, and at worst it is plagiarism. Plagiarism is a form of cheating.

The No-Sponge Rule: In intra-team collaboration where the group, as a whole, produces a single “product”, each member of the team must actively contribute. Members of the group have the responsibility (1) to not tolerate anyone who is putting forth no effort (being a sponge) and (2) to not let anyone who is making a good faith effort “fall through a crack” (to help weaker team members come up to speed so they can contribute). We want to know about dysfunctional group situations as early as possible. To encourage everyone to participate fully, we make sure that every student is given an opportunity to explain and justify their group’s approach.

This section on academic honesty is adapted from the course of Surendar Chandra at the University of Georgia, who in turn acknowledges Prof. Carla Ellis and Prof. Amin Vahdat at Duke University for his policy formulation. The origin of the original rule, called the Gilligan’s Island rule, is uncertain, but it can be traced back at least to Prof. Dymond’s use of it at York University in 1984.

Late Submissions

All homework assignments are due on IVLE by 11:59:59 pm (Singapore time) on the due date. No exceptions will be made without a medical certificate. The following penalties will apply for late submissions:

  • late within 1 hour: 10% reduction in grade;
  • late within 5 hours: 30% reduction in grade;
  • late within 1 day: 50% reduction in grade;
  • late within 5 days: 70% reduction in grade;
  • after 5 days: zero mark.
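
The penalty tiers above can be sketched as a small Python function. This is only an illustration, not the grading script used by the teaching staff: the names PENALTY_TIERS and late_penalty are hypothetical, and we assume here that each tier boundary is inclusive (a submission exactly 1 hour late still falls in the 10% tier) and that a submission at the deadline itself counts as on time.

```python
from datetime import datetime, timedelta

# Tiers from the list above: (maximum lateness, fraction of the grade deducted).
# Assumption: each boundary is inclusive.
PENALTY_TIERS = [
    (timedelta(hours=1), 0.10),
    (timedelta(hours=5), 0.30),
    (timedelta(days=1), 0.50),
    (timedelta(days=5), 0.70),
]


def late_penalty(deadline, submitted):
    """Return the fraction of the grade deducted (0.0 = on time, 1.0 = zero mark)."""
    lateness = submitted - deadline
    if lateness <= timedelta(0):
        return 0.0  # on time (assumption: the deadline instant itself is on time)
    for limit, penalty in PENALTY_TIERS:
        if lateness <= limit:
            return penalty
    return 1.0  # more than 5 days late: zero mark


# Hypothetical due date, used only to show how the function is called.
example_deadline = datetime(2018, 9, 13, 23, 59, 59)
example_penalty = late_penalty(example_deadline, example_deadline + timedelta(hours=3))
```

With these assumptions, a submission three hours after the deadline falls in the "late within 5 hours" tier, so example_penalty is 0.30.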

These penalties are intentionally severe to encourage students to turn in assignments on time. This, in turn, means that your teaching staff can start and finish grading within a fixed time period and give you timely feedback on your work. Do not expect any type of preferential treatment if you turn in an assignment late.

Assignment return policy and regrades

“Failure is success if we learn from it.” (Malcolm Forbes)

All students have a right to question the grading of their work. If a regrade is sought for a particular milestone, this must be brought to our attention by email within 3 days of the return of the preliminary grades. Requests later than that will not be entertained without certified medical leave or school permission.

Projects

Student Projects

Credits: Much of the architecture for this course project comes from Bryan Low (NUS) and Thorsten Joachims (Cornell)

A key part of the mastery of machine learning is practicing it, outside of the formal mathematical and statistical basis for the algorithms. The student projects form an integral part of the assessment. Student teams should have 5-6 members and will be assembled by the teaching staff. There are two kinds of projects that can be done: Self-Defined Projects and Kaggle Competition Projects. Choose only one of the two.

Self-Defined Projects

The final project is intended to be a limited investigation in an area of machine learning of your choice. The purpose of the project is to enable you to study an area of your interest in greater detail in a practical way. The project can take on many forms, including but not limited to:

  1. Projects that explore the application of machine learning ideas to an interesting “real-world” problem.
  2. Projects that involve a theoretical or empirical study of aspects of a learning method or model.
  3. Projects that do an experimental, comparative study of various machine learning methods.
  4. Projects that extend or synergise with an existing project (could be from a member of your group), such as an honors year project.

Doing such a project gives you more flexibility and allows you to work on something of your liking. At the same time, it may require some additional effort (depending on your problem), such as collecting data, coming up with suitable baselines, and/or explicitly declaring what is being extended or is novel in the scope proposed for the class. The teaching staff will take these factors into account when grading.

Kaggle Competition Projects

On the Kaggle website, you can find and choose from a number of interesting machine learning competitions. Upon joining a competition, you will be provided with training and testing sets, and your performance will be measured with specified metrics and ranked against other competitors on the web.

Note that performance on the different metrics is not the critical factor in your grade on the project. While doing well in the competition helps, we primarily evaluate the (interesting) ideas your team employs to solve the task. While the data is easier to obtain for such a project, there is less flexibility and more emphasis on coming up with interesting methods.

Project Structure

Description                             Percentage
Proposal                                2%
Interim Report                          3%
Final Poster Presentation               10%
Final Project Write-up                  15%
Peer Grading (of two other projects)    (counts towards Participation)
Total                                   30%

Please propose a topic to us in your project proposal, and we will give you feedback on the feasibility. After the project proposal, you will be assigned a contact TA that you can use as a resource for questions and advice. We recommend meeting with your contact on a regular basis, so that you identify potential problems before it is too late.

Your team will submit an interim report three weeks into the project, indicating the progress made. You will present the results of your project in a poster session held in conjunction with the 13th STePS, as well as in a final project report.

Peer Reviewing: We will be performing peer reviewing for the project phase of the course. There are several benefits to peer reviewing. Most importantly, it helps you understand and appreciate work from other students and groups, and it provides more feedback to everybody about the projects. Peer reviewing means that each of you will be given a few of your classmates’ submissions to read and grade. This essentially involves providing some brief comments to help each other out. Please be as fair and impartial as possible during this reviewing. TAs will also evaluate the peer reviews and provide feedback. You will be graded on how well you review other projects and how insightful your comments are. This will be an integral part of the participation grade in the class.

We will be following a double-blind peer review model. This means that the reviewer does not know whose project he or she is reviewing, nor do the authors of the project know who is reviewing them. Furthermore, a reviewer is not allowed to disclose which project he or she is reviewing. To be clear, the course staff will know the identity of everybody.

While detailed grading rubrics for the projects will be released in due course, projects will be graded roughly according to the following criteria:

  1. Originality
  2. Relevance to course
  3. Quality of arguments (are claims supported, how convincing are the arguments you bring forward)
  4. Clarity (how clearly are goals and achievements presented)
  5. Scope/Size (in proportion to size of group)
  6. Significance (are the questions you are asking interesting)