Additional Supporting Evidence

My group does double duty: in addition to performing research and publishing, we release, as mentioned above, practical implementations of systems that many researchers use to solve real problems. I include emails below that attest to the importance of our research, both for academics (in Computer Science and other disciplines) and for industry.


This is a recent email that I received about our keyphrase dataset, which had been released just the year before (2007). I get emails like this periodically; this one was still in my inbox, as I had not yet finished preparing the materials the writer asked for.


from  Torsten Zesch <>


date  Thu, Aug 7, 2008 at 7:43 PM

subject     Keyphrase extraction dataset


Dear Min-Yen Kan,


I have read your excellent paper on "Keyphrase Extraction in Scientific Publications", and I really like your approach. Your keyphrase extraction dataset could be of great benefit to my experiments.


I would like to ask whether the dataset is also available as a single compressed file, since downloading every single file via the web interface would take a while.


Thanks in advance,

Torsten Zesch




Torsten Zesch

Doctoral Researcher

Ubiquitous Knowledge Processing Lab

FB 20 Computer Science Department

Technische Universität Darmstadt

Hochschulstr. 10, D-64289 Darmstadt, Germany

phone [+49] (0)6151 16-7433, fax -5455, room S2/02/E226


Here is another example, this time concerning a different tool, ParsCit, which was released earlier this year as joint work with colleagues at Penn State's College of IST.


from  Matteo Romanello <>


date  Wed, Jul 9, 2008 at 9:17 PM

subject     [ParsCit]


Dear Min-Yen Kan,

I discovered today on the Web your appreciable work ParsCit.

It is exactly what I was looking for. Indeed what I want to avoid was to write new templates for a tool such as ParaCite.

Another reason is that I started believing in NLP...

In my plans there is to use your software to parse bibliographies in the field of Humanities (in particular Greek and Latin literature and philology) in order to do some authomatic semantic tagging upon the parsed bibliographical records.

Obviously I will have to train ParsCit to do this.

And here comes to questions that are the main reason of my email.

A set of 2-300 marked references could be  enough to obtain good results?

The Bigger is the training data set, the better the results?


Thank you very much for your attention!

If you are interested, as soon as I have some results I could get you the data about the obtained performances (such as log files..).


Matteo Romanello

Digital Philologist

University Ca' Foscari of Venice


We even have people entirely unconnected with our project defending our research output as a production-level solution to a real-world problem.


from  Nate Vack <>

reply-to    Code for Libraries <>


date  Sat, Jul 12, 2008 at 5:18 AM

subject     Re: [CODE4LIB] anyone know about Inera?


On Fri, Jul 11, 2008 at 3:57 PM, Steve Oberg <> wrote:


> I fully realize how much of a risk that is in terms of reliability and

> maintenance.  But right now I just want a way to do this in bulk with a high

> level of accuracy.


How bad is it, really, if you get some (5%?) bad requests into your

document delivery system? Customers submit poor quality requests by

hand with some frequency, last I checked...


Especially if you can hack your system to deliver the original

citation all the way into your doc delivery system, you may be able to

make the case that 'this is a good service to offer; let's just deal

with the bad parses manually.'


Trying to solve this via pure technology is gonna get into a world of

diminishing returns. A surprising number of citations in references

sections are wrong. Some correct citations are really hard to parse,

even by humans who look at a lot of citations.


ParsCit has, in my limited testing, worked as well as anything I've

seen (commercial or OSS), and much better than most.


My $0.02,



Below I show emails that highlight the commercial interest that corporations have taken in research we developed in-house. The first email concerns a baseline implementation of an anaphora resolution algorithm we developed (publication #48), the second concerns an image classifier we developed (publication #38), and the third concerns a URL classifier (publication #47).


Date: Wed, 26 Jan 2005 11:15:12 +0000

From: James Hammerton <>

To: Qiu Long <>

Cc: Iain Mckay <>

Subject: Using JavaRAP commercially.




I work for a company called Graham Technology. I'm interested in

evaluating your JavaRAP anaphora resolution software for use in a

product we're developing. What are the terms for evaluating JavaRAP? And

what would be the terms for using it if we decide to do so? Could we get

access to the source code in the latter case, to help us adapt the code

for our purposes?


Yours Sincerely,


James Hammerton



from  Yves Dassas <>


date  Fri, May 19, 2006 at 11:19 PM

subject     Image categorization - NPIC


Dear Dr Min-Ken Kan,


I read a project report entitled "Synthetic Image Categorization" from one of your students (Wang fei).


I am currently working on a project that would require, among other components, an image classification tool similar to the one designed by your student.


Could you tell me whether the NPIC can be tested and/or whether they are any license associated to it?




Dr Yves Dassas

Tel.: 44 (20) 7454 12 44

DDI : 44 (20) 7354 63 36

fax.: 44 (20) 7454 12 40







date  Mon, Dec 13, 2004 at 1:27 PM

subject     Re: MEurlin


Hello Min,


We operate and what we would like to do is recognize a domain name and automatically recognize a keyword that matches the domain name and then show search results for that domain. Let me know when your demo is back online. Also what is the cost for the source code.