Confidence-weighted (CW) learning

Confidence-weighted (CW) learning is an online learning algorithm for linear classification. This package contains the source code of our implementation of multiclass and binary CW learning. We used this code in our submission to the HOO 2012 shared task.
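At its core, CW learning maintains a Gaussian distribution over weight vectors and updates both the mean and the variance on each example. The following is a minimal sketch of the diagonal-covariance "variance" update published by Dredze et al. (2008); it illustrates the update rule, not necessarily this package's exact code:

```python
import math

def cw_update(mu, sigma, x, y, phi=1.0):
    """One confidence-weighted "variance" update (Dredze et al., 2008).

    mu, sigma: per-feature means and variances (diagonal covariance).
    x: feature vector; y: label in {-1, +1}.
    phi: inverse normal CDF of the confidence parameter eta
         (phi = 1.0 corresponds to eta of roughly 0.84).
    """
    v = sum(s * xi * xi for s, xi in zip(sigma, x))    # x^T Sigma x
    m = y * sum(mi * xi for mi, xi in zip(mu, x))      # mean margin
    if v == 0.0:                                       # no active features
        return mu, sigma
    # Closed-form step size; the discriminant is always non-negative.
    a = 1.0 + 2.0 * phi * m
    alpha = max(0.0,
                (-a + math.sqrt(a * a + 8.0 * phi * (phi * v - m)))
                / (4.0 * phi * v))
    mu = [mi + alpha * y * s * xi for mi, s, xi in zip(mu, sigma, x)]
    sigma = [1.0 / (1.0 / s + 2.0 * alpha * phi * xi * xi)
             for s, xi in zip(sigma, x)]
    return mu, sigma
```

Low-variance (high-confidence) features receive smaller mean updates over time, which is the defining property of CW learning.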

Software available for download.

MaxMatch (M^2)

MaxMatch (M^2) is a method for evaluating grammatical error correction. The method computes the sequence of phrase-level edits between a source sentence and a system hypothesis that achieves the maximal overlap with the gold standard. This edit sequence is scored using F-measure.
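The final scoring step reduces to precision, recall, and F-measure over the matched edits. A minimal sketch of that step (the real scorer first extracts the maximally overlapping edit sequence from an edit lattice, which this sketch omits; the beta parameter is illustrative):

```python
def m2_fscore(system_edits, gold_edits, beta=1.0):
    """F-measure over edit sets, as in the final step of M^2.

    Edits are hashable descriptions, e.g. (start, end, correction).
    A system edit counts as a match if it also appears in the gold
    standard.  Illustrative only: the real scorer first computes the
    maximally overlapping phrase-level edit sequence.
    """
    matched = len(set(system_edits) & set(gold_edits))
    p = matched / len(system_edits) if system_edits else 1.0
    r = matched / len(gold_edits) if gold_edits else 1.0
    if p + r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)
```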

(UPDATED) The release now supports evaluating edits against multiple gold standards.

Software (version 3.2) available for download.

MAXSIM

MAXSIM calculates a similarity score between a pair of English system-reference sentences by comparing information items such as n-grams across the sentence pair. However, unlike most metrics, which do binary matching, MAXSIM computes a similarity score between items. The items are then modeled as nodes in a bipartite graph, and a maximum-weight matching is computed that matches each system item to at most one reference item.
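The matching objective can be illustrated with a tiny brute-force search. MAXSIM itself uses an efficient maximum-weight bipartite matching algorithm; this exhaustive version only demonstrates the objective, and assumes nonnegative similarities with no more system items than reference items:

```python
from itertools import permutations

def max_weight_matching(weights):
    """Brute-force maximum-weight bipartite matching for a small example.

    weights[i][j] is the similarity between system item i and reference
    item j; each system item is matched to at most one reference item.
    Assumes nonnegative weights and len(weights) <= len(weights[0]).
    """
    n_sys, n_ref = len(weights), len(weights[0])
    best_score, best_match = 0.0, ()
    for perm in permutations(range(n_ref), min(n_sys, n_ref)):
        score = sum(weights[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_match = score, perm
    return best_score, best_match
```

For realistic sentence sizes, a polynomial-time algorithm such as the Hungarian method would be used instead of exhaustive search.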

Software available for download.

BioKIT

BioKIT is a semantic role labeling (SRL) system for biomedical texts. The SRL model is trained using domain adaptation techniques and data from the PropBank and BioProp corpora. Unlike the earlier release, BioKIT can now be run "out of the box" to process plain text files.
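One standard domain adaptation technique for this kind of source/target corpus setting is Daumé III's (2007) "frustratingly easy" feature augmentation, sketched below purely as an illustration; BioKIT's exact training recipe may differ:

```python
def augment(features, domain):
    """Feature augmentation for domain adaptation (Daume III, 2007).

    Each feature gets a shared copy plus a domain-specific copy, so the
    learner can decide per feature whether the source domain (e.g.
    newswire PropBank) and the target domain (e.g. BioProp) behave
    alike.  Shown for illustration; not necessarily BioKIT's setup.
    """
    out = {}
    for name, value in features.items():
        out["shared:" + name] = value
        out[domain + ":" + name] = value
    return out
```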

Software available for download.

IMS

IMS (It Makes Sense) is a supervised English all-words word sense disambiguation (WSD) system. The flexible framework of IMS allows users to integrate different preprocessing tools, additional features, and different classifiers. By default, we use linear support vector machines as the classifier with multiple features. This implementation of IMS achieves state-of-the-art results on several SensEval and SemEval tasks. You can try the demo and download the software:
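For illustration, one of the feature types used by IMS, words surrounding the target word, can be sketched as follows; the window size and feature naming here are assumptions, not IMS's exact settings:

```python
def surrounding_word_features(tokens, target_index, window=3):
    """Collect surrounding-word features for a target word, one of the
    feature types used in supervised WSD systems such as IMS (alongside
    POS tags of surrounding words and local collocations).  The window
    size and the "surr=" naming are illustrative assumptions.
    """
    feats = []
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index:
            feats.append("surr=%s" % tokens[i].lower())
    return feats
```

These sparse features would then be fed to the linear SVM classifier mentioned above.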

TESLA

TESLA is a family of automatic machine translation evaluation metrics with state-of-the-art performance. This is the version released for WMT 2011.

  • JTESLA v2 available for download.
    TESLA-M is now re-implemented in Java. This is the recommended and
    most convenient way to use TESLA-M and to integrate it into other
    processing pipelines. This version gives essentially the same results
    as TESLA-M v2.
  • TESLA v2 available for download.

The following packages are needed for minimum error rate tuning with TESLA using ZMERT as reported in (Liu et al. 2011).

The supplementary material of (Liu et al. 2011) released for EMNLP 2011 can be found here.

TESLA-CELAB

This is the version released for our ACL 2012 paper (Liu et al., 2012).

This package implements character-level machine translation evaluation for languages with ambiguous word boundaries. For languages such as Chinese, where words usually have meaningful internal structure and word boundaries are often fuzzy, character-level evaluation has advantages over word-level evaluation. By reformulating the problem in a linear programming framework, TESLA-CELAB addresses several drawbacks of existing character-level metrics, in particular the modeling of synonyms that span multiple characters.
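As a point of reference, the word-boundary-free baseline that character-level evaluation starts from is a plain character n-gram F-measure; TESLA-CELAB's linear-programming treatment of multi-character synonyms is omitted from this sketch:

```python
from collections import Counter

def char_ngram_f1(hyp, ref, max_n=2):
    """Plain character n-gram F1 between a hypothesis and a reference,
    averaged over n-gram orders.  This is the simple character-level
    baseline; TESLA-CELAB additionally models multi-character synonyms
    via a linear program, which this sketch omits.
    """
    def ngrams(s, n):
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))
    scores = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())     # clipped n-gram matches
        p = overlap / max(sum(h.values()), 1)
        rec = overlap / max(sum(r.values()), 1)
        scores.append(0.0 if p + rec == 0 else 2 * p * rec / (p + rec))
    return sum(scores) / len(scores)
```

Because it never segments the input into words, this score is unaffected by ambiguous word boundaries; what it cannot do is credit a multi-character synonym, which is exactly the gap TESLA-CELAB fills.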

Software available for download.

PEM

PEM is the first fully automatic metric for evaluating the quality of paraphrases, and consequently of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component is a robust and shallow semantic similarity measure based on pivot-language n-grams, which allows us to approximate adequacy independently of lexical similarity.
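The comparison behind the adequacy measure can be sketched as an overlap of pivot-language n-gram bags. The translation into the pivot language and PEM's learned combination of the three criteria are omitted here, and the Dice-style overlap is an illustrative choice:

```python
from collections import Counter

def pivot_ngram_similarity(pivot_a, pivot_b, n=2):
    """Compare two sentences through their pivot-language translations.

    pivot_a, pivot_b: the two sentences already rendered in the pivot
    language.  Overlap of their n-gram bags approximates semantic
    adequacy independently of surface lexical choice in the original
    language.  The Dice coefficient here is illustrative, not PEM's
    exact formulation.
    """
    def bag(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    a, b = bag(pivot_a), bag(pivot_b)
    overlap = sum((a & b).values())
    total = max(sum(a.values()) + sum(b.values()), 1)
    return 2 * overlap / total
```

Two paraphrases with different wording but the same meaning tend to share pivot-language n-grams, which is what lets this measure separate adequacy from lexical similarity.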

Software available for download.