JavaRAP
Introduction
JavaRAP is an implementation of the classic Resolution of Anaphora
Procedure (RAP) given by Lappin and Leass (1994).
It resolves third person pronouns and lexical anaphors, and identifies
pleonastic pronouns. The implementation was originally written to
provide anaphora resolution results to our TREC 2003 Q&A system. Since RAP
is so widely known, we soon decided to make this
implementation freely available to the research community, in the hope
that it could
- be used as a reference to benchmark other anaphora resolution algorithms or systems; and
- provide anaphora resolution functionality as needed by other NLP applications.
We named the implementation JavaRAP because it is developed in Java.
This choice of programming language makes it easily portable
across hardware platforms, though porting is less straightforward when the
operating system changes.
Features
- Target Language:
- Input Format:
- Output Format:
- anaphor - antecedent pairs;
- text with in-place substitutions.
- Accuracy:
- Efficiency
- around 1,500 words per second (on a P2.4G, 1G Linux box, excluding parsing time)
- Programming Language
- Software Required
- Associated Tools:
- Sentence Splitter
- Anaphora Resolver Evaluator
- performs pair-wise comparison between resolution results (by JavaRAP or another
resolver) and annotations (MUC co-reference annotation convention).
News
Dec 15, 2006:
JavaRAP_1.11: A bug in edu.nus.comp.nlp.tool.anaphoraresolution.Evaluation is fixed.
However, it still doesn't
work well with JDK 1.5 (I suspect this has something to do with
the changes in the regular expression handling). Please use JDK 1.4.x instead if
you notice any problems. Any suggestions about this JDK 1.5 issue
are welcome. Thanks!
Feb 12, 2006:
JavaRAP now works on Windows® now that the way it calls the Charniak
parser has been changed (it previously depended on a pipe);
The MANIFEST file is corrected, so JavaRAP can be started with the simpler
"java -jar AnaphoraResolution.jar";
Source code released.
July 25, 2005:
A bug in SentenceSplitter fixed.
Source of SentenceSplitter released.
May 25, 2005:
Thinking about releasing the source code under GPL. An alpha
version is ready for testing now. Please drop me an email if you want
to check it out. Thanks ...
FAQ started.
Sep 27, 2004 (Chinese Mid-Autumn Festival):
Package name changed from edu.nus.comp.NLP.tool.anaphoraresolution
to edu.nus.comp.nlp.tool.anaphoraresolution.
Demo
Try it out online!
Download
JavaRAP_1.11 (classes) (codes)
JavaRAP_1.1 (classes) (codes)
JavaRAP_1.02 (classes only)
JavaRAP_1.01 (classes only)
Sentence Splitter
Installation
- Decompress the tar file.
- Make sure you have the following files and directories under
"JavaRAP_x.y":
- AnaphoraResolution.jar
- env.jrap (the file JavaRAP reads environment variables from). Please modify it for JavaRAP to
work properly.
- Data (the directory where JavaRAP's knowledge base resides)
- Change to the "JavaRAP_x.y" directory, where "x.y" is the version
number, and you are ready to go!
Usage
- To resolve anaphors in a text file inputFile with JavaRAP:
- java -classpath AnaphoraResolution.jar edu.nus.comp.nlp.tool.anaphoraresolution.JavaRAP inputFile
- alternatively: java -jar AnaphoraResolution.jar inputFile
- To use JavaRAP to resolve anaphors and compare the results with the annotations in an annotatedFile:
- java -classpath AnaphoraResolution.jar edu.nus.comp.nlp.tool.anaphoraresolution.Evaluation annotatedFile
- To compare the anaphora resolution results stored in a resultFile* with
the annotations in an annotatedFile:
- java -classpath AnaphoraResolution.jar edu.nus.comp.nlp.tool.anaphoraresolution.Evaluation -r resultFile annotatedFile
- To split the text in inputFile into sentences:
- java -jar SentenceSplitter.jar inputFile
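The "java -jar" invocation above can also be launched from another Java program. The following is only a minimal sketch, not part of JavaRAP: the class JavaRapLauncher, its methods, and the installDir argument are hypothetical names introduced for illustration. It assumes that "java" is on the PATH and, following the installation steps, that JavaRAP should be run from inside the "JavaRAP_x.y" directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that starts JavaRAP as an external process. */
final class JavaRapLauncher {

    /** Builds the command line "java -jar <jar> <inputFile>". */
    static List<String> buildCommand(String jarPath, String inputFile) {
        return Arrays.asList("java", "-jar", jarPath, inputFile);
    }

    /** Runs JavaRAP from installDir and returns the process exit code. */
    static int run(File installDir, String inputFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                buildCommand("AnaphoraResolution.jar", inputFile));
        // Assumption: JavaRAP is started from inside its JavaRAP_x.y directory.
        pb.directory(installDir);
        // Let JavaRAP write to this program's stdout and stderr.
        pb.inheritIO();
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: JavaRapLauncher <JavaRAP_x.y dir> <inputFile>");
            return;
        }
        // Use an absolute input path because the working directory changes.
        String input = new File(args[1]).getAbsolutePath();
        System.exit(run(new File(args[0]), input));
    }
}
```

The command is built as a list of separate arguments rather than one string, so file names containing spaces are passed to JavaRAP unchanged.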
*Each line in a resultFile contains a single record,
which shows one anaphor and its antecedent, in the form:
(sentenceIndex,tokenOffset) antecedent <-- (sentenceIndex,tokenOffset) anaphor
where sentenceIndex
is the index of the sentence in the complete article, starting from 0,
and tokenOffset
is the index of the first token
(punctuation or word) of the phrase in the sentence, also starting from 0.
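To make the record format concrete, here is a minimal sketch of how one resultFile line could be parsed. It is not taken from the JavaRAP source: the class ResultRecord, its field names, and the sample line are assumptions made for illustration, and the regular expression simply follows the form described above while tolerating extra whitespace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for one anaphor-antecedent record of a resultFile. */
final class ResultRecord {
    // (sentenceIndex,tokenOffset) antecedent <-- (sentenceIndex,tokenOffset) anaphor
    private static final Pattern LINE = Pattern.compile(
            "^\\s*\\((\\d+),\\s*(\\d+)\\)\\s+(.+?)\\s+<--\\s+"
            + "\\((\\d+),\\s*(\\d+)\\)\\s+(.+?)\\s*$");

    final int antecedentSentence; // 0-based sentence index of the antecedent
    final int antecedentOffset;   // 0-based offset of its first token
    final String antecedent;
    final int anaphorSentence;    // 0-based sentence index of the anaphor
    final int anaphorOffset;      // 0-based offset of its first token
    final String anaphor;

    private ResultRecord(Matcher m) {
        antecedentSentence = Integer.parseInt(m.group(1));
        antecedentOffset = Integer.parseInt(m.group(2));
        antecedent = m.group(3);
        anaphorSentence = Integer.parseInt(m.group(4));
        anaphorOffset = Integer.parseInt(m.group(5));
        anaphor = m.group(6);
    }

    /** Returns the parsed record, or empty if the line does not match the form. */
    static Optional<ResultRecord> parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? Optional.of(new ResultRecord(m)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample line, not real JavaRAP output.
        parse("(0,1) the parser <-- (1,0) It").ifPresent(r ->
                System.out.println(r.anaphor + " refers to " + r.antecedent));
    }
}
```

Returning an Optional lets a caller skip blank or malformed lines instead of failing on them.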
FAQ & Special Issues
- Do the annotations have to be in a special format
(for automatic evaluation)?
- JavaRAP expects MUC style co-reference annotations for
automatic evaluation.
- I compiled the latest Charniak parser on Windows XP, but within Cygwin.
The executable (parseIt) gets an extra .exe suffix, which is not what JavaRAP expects.
If you run into this problem, please
make a copy of the executable and name it "parseIt".
- Why did I get "Exception in
thread "main" java.lang.NoClassDefFoundError:
edu/nus/comp/NLP/tool/anaphoraresolution/JavaRAP"?
- I changed the package name from edu/nus/comp/NLP/tool/anaphoraresolution to edu/nus/comp/nlp/tool/anaphoraresolution,
so the class name on the command line must use the lowercase "nlp".
Sorry about that.
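For the Cygwin issue above, the copy can be made by hand or with a few lines of Java. This is only a sketch under stated assumptions: the class ParseItCopy and its method are hypothetical, and the location of parseIt.exe must be supplied by the caller because it depends on where the Charniak parser was built.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper that copies "parseIt.exe" to a sibling file "parseIt". */
final class ParseItCopy {

    /** Copies exe to a file of the same name without the ".exe" suffix. */
    static Path copyWithoutExe(Path exe) throws IOException {
        String name = exe.getFileName().toString();
        if (!name.toLowerCase().endsWith(".exe")) {
            throw new IllegalArgumentException("not an .exe file: " + exe);
        }
        // Strip the four characters of ".exe" and place the copy next to the original.
        Path target = exe.resolveSibling(name.substring(0, name.length() - 4));
        return Files.copy(exe, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ParseItCopy <path to parseIt.exe>");
            return;
        }
        System.out.println("copied to " + copyWithoutExe(Paths.get(args[0])));
    }
}
```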
References
Long Qiu, Min-Yen Kan and Tat-Seng Chua. (2004). A Public Reference Implementation of the
RAP Anaphora Resolution Algorithm. In
Proceedings of the Fourth International Conference on Language
Resources and Evaluation (LREC 2004). Vol. I, pp. 291-294.
Links
The poster we presented at LREC 2004, Lisbon, Portugal.
Check out our Natural Language Processing / Information Retrieval research framework webpage
to find other NLP resources we have.
Contact
QIUL at COMP dot NUS dot EDU dot SG