My group does double duty: beyond performing research and publishing it, we also, as mentioned earlier, release practical system implementations that many researchers use to solve their problems. I include emails below that testify to the importance of our research, both for academics (in Computer Science and other disciplines) and for industry.
Below is a recent email I received about our keyphrase dataset, which we released the year before (2007). I get emails like these periodically; this one was still in my inbox because I had not yet finished the deliverables the writer asked for.
Aug 7, 2008 at 7:43 PM
Dear Min-Yen Kan,
I have read your excellent paper on "Keyphrase Extraction in Scientific Publications", and I really like your approach. Your keyphrase extraction dataset could be of great benefit to
I would like to ask whether the dataset is also available as a single compressed file, since downloading every single file via the web interface would take a while.
Thanks in advance,
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-7433, fax -5455, room
Here is another example, concerning a different tool, ParsCit, which we released earlier this year as joint work with colleagues at IST, Penn State University.
Jul 9, 2008 at 9:17 PM
Dear Min-Yen Kan,
I discovered today on the Web your appreciable work ParsCit.
It is exactly what I was looking for. Indeed what I want to avoid was to write new templates for a tool such as ParaCite.
Another reason is that I started believing in
In my plans there is to use your software to parse bibliographies in the field of Humanities (in particular Greek and Latin literature and philology) in order to do some authomatic semantic tagging upon the parsed bibliographical records.
Obviously I will have to train ParsCit to do
And here comes to questions that are the main
reason of my email.
A set of 2-300 marked references could be enough to obtain good results?
The Bigger is the training data set, the better the results?
Thank you very much for your attention!
If you are interested, as soon as I have some results I could get you the data about the obtained performances (such as log
University Ca' Foscari of Venice
Even people entirely unaffiliated with our project have defended our research output as a production-level solution to a real-world problem, as in this exchange on the CODE4LIB mailing list.
Code for Libraries <CODE4LIB@listserv.nd.edu>
Jul 12, 2008 at 5:18 AM
[CODE4LIB] anyone know about Inera?
On Fri, Jul 11, 2008 at 3:57 PM, Steve Oberg
> I fully realize how much of a risk that is in terms of reliability and
> maintenance. But right now I just want a way to do this in bulk with a
> level of accuracy.
How bad is it, really, if you get some (5%?) bad requests into your document delivery system? Customers submit poor quality requests by hand with some frequency, last I checked...
Especially if you can hack your system to deliver the original citation all the way into your doc delivery system, you may be able to make the case that 'this is a good service to offer; let's just deal with the bad parses manually.'
Trying to solve this via pure technology is gonna get into a world of diminishing returns. A surprising number of citations in references sections are wrong. Some correct citations are really hard to parse, even by humans who look at a lot of
ParsCit has, in my limited testing, worked as well as anything I've seen (commercial or OSS), and much better
The following emails show the commercial interest that companies have taken in research developed in-house by my group. The first concerns a baseline implementation of an anaphora resolution algorithm we developed (publication #48), the second an image classifier we developed (publication #38), and the third a URL classifier (publication #47).
Date: Wed, 26 Jan 2005 11:15:12 +0000
From: James Hammerton
To: Qiu Long
Cc: Iain Mckay
Subject: Using JavaRAP commercially.
I work for a company called Graham Technology. I'm interested in evaluating your JavaRAP anaphora resolution software for use in a product we're developing. What are the terms for evaluating JavaRAP? And what would be the terms for using it if we decide to do so? Could we get access to the source code in the latter case, to help us adapt the code for our purposes?
May 19, 2006 at 11:19 PM
categorization - NPIC
Dear Dr Min-Ken Kan,
I read a project report entitled
"Synthetic Image Categorization" from one of your students (Wang
I am currently working on a project that would require, among other components, an image classification tool similar to the one designed by your student.
Could you tell me whether the NPIC can be tested and/or whether they are any license associated to it?
Dr Yves Dassas
Tel.: 44 (20) 7454 12 44
DDI : 44 (20) 7354 63 36
fax.: 44 (20) 7454 12 40
Dec 13, 2004 at 1:27 PM
We operate abcsearch.com and what we would like to do is recognize a domain name and automatically recognize a keyword that matches the domain name and then show search results for that domain. Let me know when your demo is back online. Also what is the cost for the source