[ IVLE ]
[ Overview ]
[ Syllabus ]
[ Teaching Staff ]
[ Grading ]
[ Tutorials ]
[ Homework ]
HW 1
HW 2V
>HW 2N
HW 2R
[ Misc. ]
(Adapted from Peter Heeman's homework for his Spoken Language Systems course).
You should submit your working version of the parsers and your
answers to the discussion questions by the midnight on the due date.
The IVLE workbin will accept late submissions. Late submissions are
subject the the late submission policy
(read carefully). Your submission should contain a plain .txt file
called students.txt which
should have the same format as given in the hyperlink (in fact, why
don't you save the linked file now and adapt it before you submit
it?). Store your answers to the written questions in a file called
answers.txt (N.B. Make sure your files are in plain
ASCII text format -- MS Word files or other word processing files will
not be entertained). Your team's work should be done independently of
other groups; see our classes' section on Academic Honesty Policy if you have
questions about what that entails.
First you are going to build a bottom up parser. To start you off, I'm giving you the pseudocode for the parsing algorithm and the grammar that you will need to parse. Here is a sample grammar that you will need to take as input, along with a test sentence. You'll need to read this file in as input to your parse command. It can vary in the number of rules and the elements for each rule, but basically you'll have a NONTERMINAL as the first element of the rule, followed by an arrow ('->') and then one or more NONTERMINAL and/or terminal symbols as the remaining elements. NONTERMINALS are in ALL CAPS whereas terminals are always in lowercase. We can read a rule like
VP -> V NP
as VP can be replaced by the non-terminal sequence "V NP". V and NP would have to be expanded in later iterations, (hopefully) progressing towards the replacement of all NONTERMINAL symbols by terminal ones. Note that terminals cannot appear as the first element in a rule (else they would not be terminal symbols!). The special NONTERMINAL symbol 'S' denotes the start symbol (in our case it represents a sentence).
Here's the psuedocode for the bottom-up parser:
Add word sequence to list of alternatives
While list of alternatives is not empty
Pull out first alternative from list
For each rule
For each token in sequence
If rule can be applied ending at that token
Rewrite sequence with rule
If new sequence is start symbol
Add to a list of solutions
Otherwise
Store new sequence at end of list of alternatives
This is a bottom-up parser as it starts from the terminals and
tries to work its way back towards the start symbol. A top-down
parser works in reverse, by starting from the start symbol and trying
to find a sequence of rules that will eventually rewrite to form the
input text. Your group's code should be in a single class file called
BottomUpParser.java. The output of the
system when invoked should be a parse tree in parenthetical form
(e.g., (S (NP (PRONOUN you) (VP (VP (VP (V give) (NP (PRONOUN
me))) (NP (ARTICLE the) (N gold))))). There will be no empty
rules or recursive rules where the left hand side NONTERMINAL exactly
matches the right hand side non-terminals (e.g., no rules like A -> ""
or A -> A.
Questions for your group to answer and put in the
answers.txt file.
Now you are going to build a chart parser, which is much more efficient than the bottom up parser you implemented earlier. Chart parsing works by using dynamic programming; using and storing the solutions of smaller parts of sentence to solve larger problems. In contrast to the above parser which only invokes a rule if all the constituents in the right hand side (RHS) are present, chart parsing works incrementally, word by word. Consider a rule "A -> B C D". If the parser sees "B", it starts to apply this rule. Rather than rewriting B with A, we rewrite it with "A -> B . C D", where the period indicates how far in the rule we've gotten. If later, we see C after B, we can continue and rewrite the rule as "A -> B C . D". This is a called a "shift". Once D is seen, we are finished with the rule and we can rewrite it as in the earlier parser, and replace it with just "A". This is a called a "reduce" move.
A chart parser keeps track of what it has parsed by using edges. Each edge is a data structure with the following fields:
Each word in the input sentence is treated as an unit edge. For example in the sample sentence "time flies like an arrow", the first word "time" would be represented as an edge with values:
| head | time |
| done | time |
| rest | |
| start | 0 |
| end | 1 |
The parsing process takes these initial edges and gradually grows them into larger and larger edges, progressing towards the start state, as in the bottom-up parser. A new edge can be created from the joining of two edges:
|
+ |
|
=> |
|
A new edge also needs to be created from the satisfaction of the initial shift in a grammar rule:
|
+ |
|
=> |
|
You can trace the progression of the chart parser on this diamond graph. For each diamond, fill in the rules that can be applied using notation head -> done.rest. For diamond i-j the rules will be those with start at i and end at j.

A chart parser has a certain way of working its way through the edges that need to be added to the chart. All edges that are created are put in the chart, but a subset of these form an agenda. The agenda keeps track of edges that need to be processed to build new edges. Edges that have an empty (null-valued) rest field are such candidates for future work. These edges are put on the agenda. We process the agenda items, one by one, to create new edges by the Edge+Edge rule or by the Grammar+Edge rule.
Your group is the implement a chart parser using the same initial grammar. The psuedocode for the chart parser is below:
for each word
add an edge to the agenda for the word
while agenda not empty
take top edge off agenda, call it e
// Grammar+Edge Rule
for each rule whose first constituent starts with e's head
add a new edge
// Edge+Edge Rule
for each edge c in the chart
if c's end time matches up with e's start time AND
the first constituent of c's rest line is e's head
add a new edge
Implement the chart parser in a java class called
ChartParser.java. It should accept the same command line
syntax as the previous Bottom-Up parser, reading in a grammar file and
a test sentence file, and outputting a list of valid parses.
Also, you'll need to take ten sentences (we suggest five of them to
be the ones in the midterm lecture, slide 11) and
create plausible grammar rules to be able to parse them. Each word in
any of the sentences should have all of its possible parts-of-speech
listed in the production rules (e.g., if you put in the word
"bank" it should be listed both as a V and as a N). You can
use a dictionary to look up a word's acceptable parts of speech. Using
the time command on sunfire, time the command when given
to your Chart Parser and your Bottom Up parser. Discuss the
efficiency of both algorithms. You can place the answers to these
questions in the answers.txt file.
Min-Yen Kan <kanmy@comp.nus.edu.sg> Created on: Mon Dec 1 19:36:22 2003 | Version: 1.0 | Last modified: Thu Mar 18 09:32:07 2004