Updated on Tue Mar 20 01:34:27 GMT-8 2007 - Corrected what items to turn in.

Homework #2 - Single Document Summarization for Scientific Papers

In this assignment, you will develop a generic (i.e., non-query-based) single-document summarization system for the same scientific documents in the cs5246 scientific document collection. As you are dealing with the same input corpus, the same notes apply: semi-structure can often be recovered, and the input is noisy.

You will have to decide whether to construct an extractive or an abstractive summarization system. The evaluation criteria differ between the two types of system; see the grading scheme below.

Note that the textual input to your system will be one of the text files in the cs5246 corpus. However, unlike the original documents, the input fed to your system will NOT include any abstracts, keywords and/or general terms.

Thanks to your midterm feedback, this homework may be done in pairs or individually. Assignments will be graded the same way whether they are done in pairs or individually. If you do this assignment as a team, you are responsible for making sure the workload is balanced between both members; I will not be involved in balancing workload between team members. Please read the notes on grading again if you have any concerns. If you do the assignment as a team, join both of your matric numbers with an underscore ('_') in your submission.

What to turn in

You will upload an X.zip archive (where X is your matric ID, with all letters in uppercase) by the due date, consisting of the following items. Note that I do not want to know who you are when grading assignments, so it is important that you try not to reveal your identity in your submission. Please follow the instructions below to the letter.

  1. A summary file in plain text (not MS Word, not OpenOffice), named ReadmeX.txt, where X is your matric ID. It should give your matric number and your NUS (u|g) prefixed email address (as the only form of ID), and describe your submission and the architecture for retrieval. In this file you also need to describe how your source code can be built and executed on sf3/sunfire. You should include notes about the development of your submission, whether your system is abstractive or extractive, and any special features that you developed to handle the structure of the queries and documents.
  2. The code for your system: tested, compilable and runnable on sf3/sunfire, which is where I will run your code. Your code should read a file from standard input (which will be a file from the cs5246 corpus, supplied by "cat <filename> | yourProgram") and produce the summary on standard output. Note that you may open temporary files in /tmp, and may assume that only one instance of your code will be executing at any time. As the assignments will be tested and run on sf3/sunfire, you may choose to interface with other common tools or libraries on sf3/sunfire, as per Assignment #1.
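The input/output contract in item 2 can be sketched as follows. This is only a minimal illustration, assuming a Python 3 interpreter is available on sf3/sunfire (not confirmed here); the function names and the "first few sentences" strategy are hypothetical placeholders, not a recommended summarization method.

```python
import re
import sys


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    if not flat:
        return []
    return re.split(r"(?<=[.!?])\s+", flat)


def summarize(text, max_sentences=3):
    """Placeholder strategy (an assumption): keep the first few sentences."""
    return " ".join(split_sentences(text)[:max_sentences])


def main():
    # Read the whole document from standard input; errors="replace"
    # keeps the program running on stray bytes in a noisy corpus file.
    raw = sys.stdin.buffer.read().decode("utf-8", errors="replace")
    # The summary, and nothing else, goes to standard output.
    sys.stdout.write(summarize(raw) + "\n")


if __name__ == "__main__":
    main()
```

If saved under a hypothetical name such as summarize.py, it would be invoked in the way described above: cat <filename> | python3 summarize.py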

Please use a ZIP (not RAR, BZ2 or TAR) utility to construct your submission. Do not include a directory in the submission to extract to (e.g., unzipping X.zip should give files like X.txt, not X/X.txt or submission/X.txt). Please use all capital letters when writing your matric number (matric numbers should start with U, NT, HT or HD for all students in this class). Your cooperation with the submission format will allow me to grade the assignment in a timely manner.
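One way to satisfy the packaging rules above (uppercase matric numbers, underscore-joined for a pair, no enclosing directory) is sketched below with Python's standard zipfile module. The helper names and the matric numbers in the usage note are made up for illustration; any ZIP utility that produces a flat archive works equally well.

```python
import os
import zipfile


def submission_id(matric_ids):
    """Uppercase each matric number; join a pair with an underscore."""
    return "_".join(m.strip().upper() for m in matric_ids)


def build_archive(zip_path, file_paths):
    """Store each file at the archive root, dropping any directory prefix."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in file_paths:
            # arcname without a directory part means unzipping yields
            # ReadmeX.txt rather than X/ReadmeX.txt.
            archive.write(path, arcname=os.path.basename(path))
    return zip_path
```

For example, with the hypothetical IDs "u0000001a" and "ht000002b", submission_id returns "U0000001A_HT000002B", and build_archive("U0000001A_HT000002B.zip", [...]) writes the listed files at the top level of the archive.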

Grading scheme

Your grade will take into account 1) features used, 2) summary quality, 3) documentation and 4) time efficiency. These factors are listed in order of importance/weighting to your final grade for the assignment. Warning -- I will be reading your code, so please make sure it is tidy and well documented.

Due date and late policy

According to the syllabus, this homework is due on 9 Apr at 11:59 pm SGT. Late submissions are subject to the policy set forth on the "Grading" page.

References

Note / Warning: If you use any of these resources (especially software), you'll have to cite it and be explicit about what you did to change it or customize it for the task in our assignment. Simply learning how to use a piece of software does not constitute a worthy homework assignment submission.


Min-Yen Kan <kanmy@comp.nus.edu.sg> Created on: Sun Jan 21 16:31:48 2007 | Version: 1.0 | Last modified: Tue Mar 20 01:38:04 2007