This is a reworked version of Michael Collins' parser, modified from the original distribution to add a daemon mode. In this version the hash table of events is loaded once, after which the executable forks and places a copy of itself in the background. This background daemon wakes every 15 seconds to check whether the input file exists; if it does, the daemon begins parsing it. The daemon reports its progress and status, with timestamps, in a separate file.
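The load-once, fork, then poll pattern described above can be sketched in C as follows. This is a minimal illustration assuming a POSIX system, not the code from the modified main.c: the names daemonize, new_input_ready, poll_forever and the parse_file callback are hypothetical. Detecting a "new" input by a change in modification time is also an assumption; the text does not say how the real daemon avoids re-parsing a finished file.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fork(), setsid(), sleep() */
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define POLL_SECONDS 15   /* the daemon wakes every 15 seconds */

/* Call once, after the (expensive) hash table has been loaded. The parent
   exits so the shell gets its prompt back; the child inherits a copy of the
   already-loaded table, starts a new session and returns 0.
   Returns -1 if fork() or setsid() fails. */
int daemonize(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(EXIT_SUCCESS);      /* parent: done */
    if (setsid() < 0)             /* child: detach from the terminal */
        return -1;
    return 0;
}

/* Returns 1 if path is a regular file whose modification time differs from
   *last_mtime (an input not parsed yet) and records the new time;
   returns 0 if there is no such file or it was already handled. */
int new_input_ready(const char *path, time_t *last_mtime)
{
    struct stat st;
    assert(path != NULL && last_mtime != NULL);
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return 0;                 /* no input file yet */
    if (st.st_mtime == *last_mtime)
        return 0;                 /* same file we already parsed */
    *last_mtime = st.st_mtime;
    return 1;
}

/* Daemon main loop: wake every POLL_SECONDS and parse any new input.
   parse_file is a hypothetical callback standing in for the parser. */
void poll_forever(const char *path, void (*parse_file)(const char *))
{
    time_t last_mtime = (time_t)-1;   /* nothing parsed yet */
    for (;;) {
        if (new_input_ready(path, &last_mtime))
            parse_file(path);
        sleep(POLL_SECONDS);
    }
}
```

Because file modification times have one-second resolution, this sketch would miss a replacement input written within the same second as the previous one; deleting or renaming the input after parsing is an alternative design.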
I have tested this version against the original parser's output on Section 23 of the Penn Treebank, and its output agrees with the original on this test.
More comprehensive client/server versions of Collins' parser were developed by Eli Barzilay of Cornell University.
As Collins' parser is licensed under the GNU GPL, so are these modifications. I am distributing them both as a simple .tgz file and as annotated HTML code.
To run the daemonized version of Collins' parser, note that I have changed the command-line format (for my own convenience). The daemon-mode and normal invocations are:
parser.out -d countsfile grammarfile status_file polled_inputfile
parser.out countsfile grammarfile inputfile
Several of the old command-line options are now set in the main.c file and can no longer be overridden from the command line. They are fixed at their default values, as given in the README file of the original distribution.
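To make the two invocation forms concrete, here is a hedged sketch of how argv could be split between them. The struct parser_args type and the parse_args function are hypothetical illustrations, not the actual argument handling in main.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the two invocation forms:
     parser.out -d countsfile grammarfile status_file polled_inputfile
     parser.out countsfile grammarfile inputfile                        */
struct parser_args {
    int daemon_mode;            /* 1 if -d was given */
    const char *counts_file;    /* uncompressed events file */
    const char *grammar_file;
    const char *status_file;    /* daemon mode only, otherwise NULL */
    const char *input_file;     /* polled input file in daemon mode */
};

/* Fills *a from argv; returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char **argv, struct parser_args *a)
{
    assert(argv != NULL && a != NULL);
    *a = (struct parser_args){0};          /* all pointers start as NULL */
    if (argc == 6 && strcmp(argv[1], "-d") == 0) {
        a->daemon_mode  = 1;
        a->counts_file  = argv[2];
        a->grammar_file = argv[3];
        a->status_file  = argv[4];
        a->input_file   = argv[5];
        return 0;
    }
    if (argc == 4 && strcmp(argv[1], "-d") != 0) {
        a->counts_file  = argv[1];
        a->grammar_file = argv[2];
        a->input_file   = argv[3];
        return 0;
    }
    return -1;                             /* neither form matched */
}
```

A real main() would print a usage message and exit when parse_args returns -1.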
The biggest change is that the events file is now given as a command-line argument rather than being read from stdin. This means you have to uncompress the counts (events) file of each model before naming it on the command line.
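Since the counts file is now opened by name and must already be uncompressed, a program could guard against accidentally passing the gzipped file. The sketch below is a hypothetical helper (open_events_file and its return codes are not part of the distribution); it relies only on the fact that every gzip stream begins with the bytes 0x1f 0x8b.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes for open_events_file. */
enum { EVENTS_OK = 0, EVENTS_CANNOT_OPEN = -1, EVENTS_STILL_GZIPPED = -2 };

/* Opens the events (counts) file named on the command line. A file that
   still starts with the gzip magic bytes is rejected, because the parser
   expects the uncompressed text. On success *out is positioned at the
   start of the file; on failure *out is NULL. */
int open_events_file(const char *path, FILE **out)
{
    assert(path != NULL && out != NULL);
    *out = NULL;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return EVENTS_CANNOT_OPEN;
    int b0 = fgetc(fp);
    int b1 = fgetc(fp);
    if (b0 == 0x1f && b1 == 0x8b) {      /* gzip header: still compressed */
        fclose(fp);
        return EVENTS_STILL_GZIPPED;
    }
    rewind(fp);                          /* hand back an unread stream */
    *out = fp;
    return EVENTS_OK;
}
```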
Here's a typical invocation:
code/parser -d models/model1/events models/model1/grammar /tmp/status /tmp/toParse.tagged
This loads the hash table and then places the process in the background as a daemon. The daemon checks for the existence of /tmp/toParse.tagged and, when it finds the file, parses it and writes the result to /tmp/toParse.tagged.out. The status file /tmp/status is updated as parsing proceeds. If the input file does not exist, or has already been processed, the daemon waits until a new input file with the same name appears in that directory.
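The output naming and timestamped status reporting described above might look like the following sketch. The helper names are hypothetical, and the status-line layout ("YYYY-MM-DD HH:MM:SS message", in UTC) is an assumption, since the real status file format is not documented here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Builds the output name by appending ".out" to the input path, e.g.
   /tmp/toParse.tagged -> /tmp/toParse.tagged.out.
   Returns 0 on success, -1 if buf (of size n) is too small. */
int make_output_path(const char *input, char *buf, size_t n)
{
    static const char suffix[] = ".out";
    assert(input != NULL && buf != NULL);
    size_t len = strlen(input);
    if (len + sizeof suffix > n)          /* sizeof suffix counts the NUL */
        return -1;
    memcpy(buf, input, len);
    memcpy(buf + len, suffix, sizeof suffix);
    return 0;
}

/* Formats one status line as "YYYY-MM-DD HH:MM:SS message" (UTC, an
   assumed layout). Returns 0 on success, -1 on failure or truncation. */
int format_status(char *buf, size_t n, time_t when, const char *msg)
{
    char stamp[32];
    struct tm *tm = gmtime(&when);
    if (tm == NULL || strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm) == 0)
        return -1;
    int len = snprintf(buf, n, "%s %s", stamp, msg);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}

/* Overwrites the status file with a single timestamped line, so a reader
   always sees the most recent state. Returns 0 on success, -1 on error. */
int update_status(const char *status_path, const char *msg)
{
    char line[256];
    if (format_status(line, sizeof line, time(NULL), msg) != 0)
        return -1;
    FILE *fp = fopen(status_path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%s\n", line);
    return fclose(fp) == 0 ? 0 : -1;
}
```

Rewriting the status file on each update, rather than appending, is one possible choice; appending would keep a full history at the cost of a growing file.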