Some code and data from projects that I've led:
- Web page layout parser/labeler: PARCELS. Designed,
executed and programmed by myself, Aik Miang Lau, Chee How Lee and
Sandra Lai.
- The NUS SMS
corpus. A collection of over 10K SMS messages compiled by my
former student, Yijue How, for linguistic study.
- MeURLin: a URL-only
web-page classifier that approaches the performance of full web page
classification. It can classify thousands of web pages per minute.
Used locally as part of a focused crawler.
- ParsCit: a CRF-based
citation parser. It is being used as a basis for other digital
library projects, including the well-known computer science digital
library, CiteSeer. First
implemented by Yong Kiat Ng.
- JavaRAP:
a freely available Java implementation of the classic Lappin and Leass (1994)
anaphora resolution algorithm, implemented by Long Qiu.
- RepLyal: lyric
and audio alignment system, a lighter-weight version of our LyricAlly
project. Created by Luong Minh Thang.
- Rapi: Our group's UI answer to
the problem of improving library catalog access. Created by Jesse Gozali Prabawa.
Most of what I've done professionally for natural language
processing is available from the Columbia NLP group's
home page. This includes the verber, centrifuser and segmenter
packages.
Besides that:
- Automated grading script. A
Perl script to automatically tabulate homework grades and
mail students their results.
- A small script that
reads some LaTeX macros and converts them to an indented text
format, useful for translating LaTeX or TeX to MS PowerPoint (.ppt).
It's my best solution to the tex2ppt problem: use this file to translate
the LaTeX to plain, indented text, then import the file as an
outline in PowerPoint (Insert -> Slides from Outline). I used this as
a starting point for editing slides for the Artificial Intelligence: A Modern
Approach textbook by Stuart Russell and Peter Norvig.
- LaTeX / TeX word counting
Perl script. Counts words in the {document}
environment, or in the entire document (with a flag). Gives section, subsection
and subsubsection word counts. Can automatically update a special section
in your document with the word counts found. Updated in 2007 by Sam Tygier
<sam@tygier.com.uk> to process other inputs included by the
\input and \include tags, and in 2008 by Gregor Heinrich to
process books and reports with \chapter-level
headers.
- Template
Perl
script, useful as a starting point for building new Perl scripts.
- Another template script,
for
Ruby.
- Need help with LaTeX? Hypertext help is a good
site that I use all the time. I've mirrored it here.
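The grading script above is written in Perl and isn't reproduced here. As a rough illustration of its two steps (tabulate, then mail), here is a minimal Python sketch, not the original script. It assumes a hypothetical CSV layout with a `student` column, an `email` column and one column per homework, plus a made-up sender address; `tabulate`, `build_message` and `SENDER` are illustrative names. It only builds the messages and leaves delivery to something like `smtplib.SMTP.send_message`.

```python
import csv
import io
from email.message import EmailMessage

SENDER = "instructor@example.edu"  # hypothetical sender address (assumption)


def tabulate(csv_text: str) -> list[dict]:
    """Sum each student's homework scores from an assumed CSV layout."""
    students = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Every column other than student/email is treated as a homework score;
        # blank cells (missing submissions) are skipped rather than counted as 0.
        scores = {
            hw: float(value)
            for hw, value in row.items()
            if hw not in ("student", "email") and (value or "").strip()
        }
        students.append({
            "student": row["student"],
            "email": row["email"],
            "scores": scores,
            "total": sum(scores.values()),
        })
    return students


def build_message(record: dict) -> EmailMessage:
    """Compose (but do not send) one student's results e-mail."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = record["email"]
    msg["Subject"] = f"Homework grades for {record['student']}"
    lines = [f"{hw}: {score:g}" for hw, score in record["scores"].items()]
    lines.append(f"Total: {record['total']:g}")
    msg.set_content("\n".join(lines))
    return msg
```

Keeping tabulation separate from message building makes it easy to eyeball the totals before anything goes out to students.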
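For the LaTeX-to-PowerPoint converter, the idea can be sketched in a few lines of Python. This is an approximation, not the original script: it assumes the macros worth keeping are `\section`, `\subsection`, `\subsubsection` and `\item` inside `itemize`/`enumerate`, and writes them as tab-indented lines, which PowerPoint's Slides from Outline import generally reads as nesting levels (unindented lines become slide titles). `latex_to_outline` and `LEVELS` are hypothetical names; everything else (math, figures, custom macros) is dropped.

```python
import re

# Assumed mapping from sectioning macros to outline depth (0 = slide title).
LEVELS = {"section": 0, "subsection": 1, "subsubsection": 2}

SECTION_RE = re.compile(r"\\(section|subsection|subsubsection)\*?\{([^}]*)\}")
ITEM_RE = re.compile(r"\\item\s+(.*)")


def latex_to_outline(latex: str) -> str:
    """Convert a LaTeX fragment into tab-indented outline text."""
    lines = []
    depth = 0       # depth of the most recent sectioning command
    item_depth = 0  # extra nesting contributed by open list environments
    for raw in latex.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            depth = LEVELS[m.group(1)]
            item_depth = 0
            lines.append("\t" * depth + m.group(2).strip())
            continue
        if re.match(r"\\begin\{(itemize|enumerate)\}", line):
            item_depth += 1
            continue
        if re.match(r"\\end\{(itemize|enumerate)\}", line):
            item_depth = max(0, item_depth - 1)
            continue
        m = ITEM_RE.match(line)
        if m:
            # Items nest under the current section, one tab per open list.
            lines.append("\t" * (depth + item_depth) + m.group(1).strip())
    return "\n".join(lines)
```

The output can be saved as a .txt file and imported via Insert -> Slides from Outline.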
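The word counter's behaviour (count inside the {document} environment by default, the whole file with a flag, per-header counts including \chapter) can be approximated as below. This is a rough Python sketch, not a port of the Perl script: `count_words` and its `whole_document` flag are hypothetical, a "word" is assumed to be a run of word characters, `%` comments are stripped, and macro names are dropped while their arguments are kept (so `\cite{key}` keys are counted). Repeated header titles share one bucket.

```python
import re

# Sectioning commands tracked by this sketch (\chapter included for books/reports).
SECTION_RE = re.compile(r"\\(chapter|section|subsection|subsubsection)\*?\{([^}]*)\}")
COMMENT_RE = re.compile(r"(?<!\\)%.*")    # an unescaped % starts a comment
MACRO_RE = re.compile(r"\\[A-Za-z]+\*?")  # macro names such as \emph or \cite
WORD_RE = re.compile(r"\w[\w'-]*")        # assumed definition of a "word"


def count_words(latex: str, whole_document: bool = False) -> dict:
    """Return word counts per sectioning header plus a "total" entry."""
    if not whole_document:
        # By default, only count text inside \begin{document} ... \end{document}.
        m = re.search(r"\\begin\{document\}(.*?)\\end\{document\}", latex, re.S)
        if m:
            latex = m.group(1)

    current = "(before first section)"
    counts = {current: 0}
    for line in latex.splitlines():
        line = COMMENT_RE.sub("", line)
        m = SECTION_RE.search(line)
        if m:
            current = f"{m.group(1)}: {m.group(2).strip()}"
            counts.setdefault(current, 0)
            line = line[m.end():]  # only text after the header on this line counts
        text = MACRO_RE.sub(" ", line)  # drop macro names, keep their arguments
        counts[current] += len(WORD_RE.findall(text))
    counts["total"] = sum(counts.values())
    return counts
```

Stripping macros with a regex is crude (math and environment names leak into the count), which is one reason a dedicated script with its own rules is worth having.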