Last Updated:
September 16, 2017 - Some small clarifications based on student feedback.

September 6, 2017 - fix description of the filenames' extension.

August 28, 2017 - Creation of page.

September 6, 2017 - fix description of the filenames' extension.

August 28, 2017 - Creation of page.

In this homework assignment, you will be implementing logistic
regression using batch gradient descent
for binary classification. After implementing the programme, you
will need to carry out a small set of experiments based on your
working code. Complete the code in the template
`hw1-1.ipynb`

file provided. Make sure you document your
code liberally using inline python comments (and Markdown cells where
appropriate).

Implement a fixed learning rate algorithm for logistic regression
(*LFD*, page 97). Use the "batch" formulation of gradient
descent (except where indicated), instead of the more efficient
stochastic gradient descent (this is to assist with automated
grading). Initialize the weight with a zero vector **0**, a
(column) vector of all 0's. Name your defined function `lr`

for doing one iteration of logistic regression. It should work
similar to the `pla`

function in Tutorial 0, where the
original weights are input and updated weights output by running one
iteration.

Use the cross-entropy error measure for minimization (this is
"equivalent" to maximum likelihood estimation); pointwise, this is
*e _{n}*(

Note that your code will also need to calculate *E _{in}
* (for gradient calculation) and

`lr`

function also needs to take a
learning rate variable, η (eta) as a parameter, which controls the
step size. After implementation, run the following experiments in different
cells. In particular, pay attention to part (c) which may require you
to rework part of your `lr`

function to take a
parameter that modifies its instance selection.

- Run the algorithm with η = 0.05 for T = 2333 rounds. Output
the eventual weight vector as a list
and the final**w**^{T}*E*._{out} - Run the algorithm with η = 0.005 for T = 2333
rounds. Output the eventual weight vector as a list
and the final**w**^{T}*E*._{out} - Updated Instead
of using batch selection, use a deterministic version of
gradient descent where we select and update weights on
iterations based on a single point (like SGD), based on cycling
through the data with restart, where
*i*= 1, 2, 3, 4, 5, .. N, 1, 2, ... Output the final weight vector as a listand the final**w**^{T}*E*. For this part, re-run both the first two parts, a) and b), under this mode for gradient estimation._{out}

Updated For a small
amount of optional bonus marks (1 or 2; the assignment is worth 100
marks), you can also complete work to help show the visualization of
your work on the training data. Note: This is more work that may not
be commensurate to the amount of effort you need to do to accomplish
the task. The below optional exercises only need to be done for part
(a). For these questions, you should
truncate the dataset to only having two dimensions input (i.e., ignore
and discard x_{3} through x_{20}).

- Use the first two input dimensions and make a 2D plot of the data
and first two (w
_{1}, w_{2}) and bias (w_{0}) that form components of the hyperplane in those dimensions at every 333th iteration. Start with a light line segment (20% gray; where 100% is black and 0% is white) for the first iteration and uniformly change the color to black as you reach the final Tth iteration. Do not worry about early convergence. This part is similar to the code in Tutorial 0. As the plot only shows two dimensions of inputs of the twenty total dimensions, it will not show any linear separation boundary. - Use the first two weight dimensions (w
_{1}and w_{2}) and make a 3D plot of*E*._{in} - Trace the progress of
**w**towards its termination or convergence by plotting. Start with a light line segment (20% gray) for the first iteration and uniformly change the color to black as you reach the final*T*^{th}iteration. Do not worry about early convergence. This plot should be superimposed on part (e), to make a plot similar to the ones shown in lecture.

You'll find these files in the sample submission .zip.

`hw1-train.dat:`

the input file for you to train your logistic regression classifier model, which contains 1000 samples of 20-dimensional data, with the final class`1`

or`-1`

as an output, all separated by spaces.`hw1-test.dat:`

a file of test data, containing an additional 3000 samples of the same data; again with the final class`1`

or`-1`

as an output, all separated by spaces.

You are also asked to answer the following essay questions. These
are to test your understanding of the lecture materials. Note that
these are open-ended questions and do not have gold standard answers.
Note that you may need to write math equations or notation in the
files, so please familiarize yourself with this. You can either use a
word processor or do a dump (i.e., print to .pdf) of another program
that can display math (i.e., iPython notebook, R). Submit it as part
of your .zip submission, exactly named as `essay1.pdf`

.

- [
*LFD Exercise 1.2*] Suppose that we use a perceptron to detect spam messages. Let's say that each email messages represented by the frequency of occurrence of keywords, and the output is +1 if the message is considered spam.- Can you think of some keywords that will end up with a large positive weight into perceptron?
- How about keywords that will get a negative weight?
- What parameter in the perceptron directly affects how many borderline messages end up classified as spam?

- Consider a coin tossing experiment.
- Updated
Given only that the true probability, Θ, satisfies 0
≤
*p*≤ 1, what is a best estimate of Θ? (There is no additional information that you need to answer this question, but you may opt to make an assumption or experiment if it helps)

*heads*of this coin as Θ. For any value of Θ, the probability of*k heads*in*n*tosses is given by the Binomial distribution: Pr(*k*|Θ,*n*) = (*n*choose*k*)(Θ)^{k}(1 - Θ)^{n-k}. Say that 100 tosses of the same coin results in*heads*70 times and*tails*30 times.- Build a model using maximum lilkelihood estimation (MLE) to infer Θ. Which value of Θ is most likely?
- Can we judge that this is an unfair coin? Explain your answer.

- Updated
Given only that the true probability, Θ, satisfies 0
≤
- In the programming logistic regression, part (c), compare our round-robin version of gradient descent which deterministically uses the next point in turn to perform the gradient descent, versus the standard, stochastic form which chooses a single point at random for an iteration. Describe whether you think this is a good robust idea or not for datasets in general.

It can be useful to repeat the question in your essay submission, for readability purposes, but it's not mandatory.

For us to grade this assignment in a timely manner, we need you to adhere strictly to the following submission guidelines. We have over 100 students in the class, and assuming it takes 10 minutes to grade each submission -- that would be over 1.3 days of uninterrupted grading with no sleep to clear. Following the submission guideliness will help us grade the assignment in an appropriate manner. You will be penalized if you do not follow these instructions. Your matric number in all of the following statements should not have any spaces and all letters should be in CAPITALS (inclusive of the ending check letter). You are to turn in the following files:

Your source code file

`hw1-1.ipynb`

for this homework assignment (the "-1" suffix for the first programming problem, we only give one problem here). We will be reading your code, so please do us a favor and format it nicely. To expedite grading, we need to require all of the assignments to use Python 3.xIn your source code

`.ipynb`

, describe any information you want us to know about your submission, inclusive of your**suitably filled out independent work statement**. You should not include any identifiable information about your assignment (your name, phone number, etc.) except your matric number and email (we need the email to contact you about your grade, please use your*[a|e]*******@nus.edu.sg*address, not your email alias). This is to help you get an objective grade in your assignment, as we won't associate matric numbers with student names. You can include this information at the top of your source code in Markdown cell(s) in the (just the first`-1.ipynb`

, when you have multiple problems)`.ipynb`

notebook source file.Updated A .PDF file

`essay1.pdf`

that contains your answers to the essay questions (no word processing files, please export to PDF with any unusual fonts embedded within the document if needed.) Again, do not disclose any identifiable information in your essay answers. If you do the essay questions as part of your`.ipynb`

, you can safely put your answers in just that file and skip producing this separate file.Any ancillary files that your submission uses. We may or may not read them, so please put the core part of your programming answers in the

`hw1-<`

file.*x*>.ipynb

At the end of your (first) notebook source file, you need to put
your **Statement of Independent Work**. Use a Markdown cell and
follow the instructions given in tutorial about how to fill this
section out. Please be honest and if you do not follow the course
policy, suggest a plausible alternative evaluation for us to
consider.

Also put pointers or text to any (online or offline) references
you have utilised in making your submission in an appropriate
**References** cell.

You do not need to submit the sample files that are provided for you in the sample .zip file, these are just for your reference.

These files will need to be suitably zipped in a single file called
`<matric number>.zip`

. Please use a zip archive and
not tar.gz, bzip, rar or cab files. Make sure when the archive unzips
that all of the necessary files are found in a directory called
`<matric number>`

(note to Mac users, please test your
zip archives on the command line; as OSX "automagically" creates the
parent folder but it is often actually not contained in the
archive). Upload the resulting zip file to the IVLE workbin by the due
date: 19
Sep 2017, 11:59:59 pm SGT. There absolutely will be no
extensions to the deadline of this assignment. Read the late policy if
you're not sure about grade penalties for
lateness.

New: It's suggested
that you do this homework assignment in stages, and not attempting to
tackle the whole thing in one step. In particular, those new to
Python should try to first import the data into your program. You can
write the code to import the data yourself, or use a software library,
such as `pandas` to help you import. You may find some online
utilities to do logistic regression (or desktop programs such as
Microsoft Excel), which you can run with a smaller subset of the data
(both in terms of the input dimensions as well as number of
instances), to do some sanity checking against your own
`lr`

routine.

The grading criteria for the assignment is tentatively:

- 50% Programming exercises.
- 35% Essay questions.
- 15% Documentation quality, inclusive of correctly following these submission guidelines.