My research interests fall under the areas of digital
libraries, natural language processing, information retrieval
and human-computer interaction. Specifically, they include document
structure acquisition, verb analysis, digital library resource
annotation and applied text summarization. My research goal
is to investigate how natural language processing and
information retrieval can be applied to improve scholarly
publication and knowledge discovery.
I run the Web, Information Retrieval /
Natural Language Processing Group (WING) at SoC. We are not
the only group working on these topics (Web, DL, IR and
NLP), and our research isn't limited to just these topics,
but they are a good description of the research we do. We have lots
of demos, projects and corpora there, including ones that I have
had a direct hand in coding, such as those on webpage
classification, document structure and
reference string parsing. There's also plenty of newer work
done by the students in WING, including a Twitter retweet
predictor and classifier, the
Chaptrs photograph organizing and sharing app (for Macs), the
largest SMS corpus (please contribute!) and the world's
best summarization system.
WING is currently affiliated with the China Singapore Institute of Digital
Media (CSIDM) and the NUS-Tsinghua Extreme
Search Centre (NExT).
I'm also a member of, and a potential
supervisor for students in, the NUS Graduate School for
Integrative Sciences and Engineering (NGS).
I also lead our group in collecting resources used to do such
research. Visit the Natural
Language Processing / Information Retrieval research framework
webpage (cte/sunfire) to see what tools we have available and
installed for related research directions and projects.
If you're currently doing an FYP or UROP, I've
written some notes on what it's like to
grade them and what you should be doing as students to try to
optimize your grade. If you are doing a thesis proposal as a
Ph.D. student, you might want to read this.
- Muthu Kumar Chandrasekaran, Topics in MOOC Forums
- Chencan Xu, Topics in MOOCs
- Animesh Prasad, Topics in Digital Libraries and Deep Learning
- Kishaloy Halder, Topics in Health Recommendation Systems
- Wenqiang Lei, Topics in Domain Adaptation
My group also hosts the occasional postgraduate intern from
collaborative projects or one-off internships, which are not
Here I list WING's graduated students (MS, Ph.D.). I
have also directly supervised more than 70 undergraduate projects
and theses. More accurate information about the current affiliations of our
alumni can be found in our LinkedIn
group (viewable only by members). For a more complete list of
past alumni (including undergraduates and system staff), see WING.
- Dr Tao Chen, Analyzing Image Tweets in Microblogs, graduated 2016, now a postdoctoral researcher at Johns Hopkins University.
- Dr Xiangnan He, Exploiting User Comments for Web Applications, graduated 2016, now a postdoctoral researcher with the NExT Center.
- Dr Aobo Wang, Addressing Informality in Processing Chinese Microtext, graduated 2015, now with BNP Paribas, Singapore.
- Dr Jovian Lin, Recommender Algorithms for Mobile Applications, graduated 2014, now a Postdoctoral Fellow with Living Analytics Research Centre (LARC), SMU, Singapore.
- Dr Jun Ping Ng, Interpreting Time In Text, Summarizing Text With Time, graduated 2014, now a Software Engineer with Demand Forecasting, Amazon, New York, USA.
- Dr Jesse Prabawa Gozali, Intra-event Photo Organization, graduated 2013, now a Researcher with Mobilewalla, Singapore.
- Dr Jin Zhao, Domain Specific Information Retrieval, graduated 2013, now an Instructor with the School of Computing, NUS, Singapore.
- Bamdad Bahrani, Reëxamining Slide Alignment, graduated 2012, now a Data Analytics Specialist at Nurse Next Door.
- Dr Ziheng Lin, Discourse Parsing, graduated 2012, now a Data Scientist, Strategic Marketing, Marketing Division, Singapore Press Holdings, Singapore, previously with SAP Singapore.
- Dr Yee Fan Tan, Cost-Sensitive Web-Based Information Acquisition for Record Matching, graduated 2011, now a Developer at KAI Square.
- Cong Duy Vu Hoang, Automatic Related Work Summarization, graduated 2010, now a doctoral candidate at the University of Melbourne, previously Senior Research Engineer at the HLT Department, Institute of Infocomm Research (I2R), Singapore.
- Dr Long Qiu, Scenario Template Generation, graduated April 2009, now a research scientist at Taobao, Alibaba, China; previously a Research Fellow with the Institute of Infocomm Research (I2R), Singapore.
- Dr Hendra Setiawan, Gapped Constituency Phrase-Based Machine Translation, graduated 2008, now with BBN Raytheon, New York, NY, previously at IBM Watson Labs, and University of Maryland.
- Dr Hang Cui, Soft Pattern Matching, graduated July 2006, now a Staff Software Engineer / Engineering Manager at Google, previously with Yahoo! Engineering, Google and OneRiot.
I list my invited talks at past conferences, workshops and
other events where I have had the privilege to lecture. Slides and
videos for some of the talks are available on YouTube
and from a separate talks page that I
- 2015 - Invited Speaker, "Instructors, Learners and Machines: Learning instructor intervention from MOOC forums". At the 2nd Greater China MOOC Symposium, Taoyuan, Taiwan, 16 August.
- 2015 - Keynote Speaker, "Keywords,
phrases, clauses and sentences: Topicality, indicativeness and
informativeness at scales". At Novel
Computational Approaches to Keyphrase Extraction Workshop,
Beijing, China, 30 July.
- 2015 - Invited Talk, "Improving Web 2.0 Recommendation Leveraging User Comments via Latent Model Regularization". At Microsoft Research Asia, Beijing, China, 24 July.
- 2015 - Invited Talk, "Improving Web 2.0 Recommendation Leveraging User Comments via Latent Model Regularization". At Linköping, Sweden, 27 May.
- 2015 - Invited Talk, "Serving
the Readers of Scholarly Documents: A Grand Challenge for the
Introspective Digital Library". At International
Conference on Big Data and Smart Computing (BigComp 2015),
Jeju Island, South Korea, 9 February.
- 2015 - Invited Talk, "Serving
the Readers of Scholarly Documents: A Grand Challenge for
the Introspective Digital Library". At the Mining Big
Text (MBT '15) Workshop, Yonsei University, Seoul, South
Korea, 10 February.
- 2014 - Keynote Speaker, "The small
data of scholarly documents". At Web Science and Data Analytics
Summer School, Singapore, 11 December.
- 2014 - Invited Keynote, "Opportunities
for Multimedia Analysis in Scholarly Digital Libraries".
At the Workshop
on Speech, Language and Audio in Multimedia (SLAM '14),
Satellite Workshop of Interspeech 2014, Penang, Malaysia,
11 September.
I have proposed, managed and collaborated on a number of research grants in Singapore. Here's a non-exhaustive listing of some of my research endeavors. Funding amounts are in Singapore dollars unless otherwise noted.
- Co-PI, "NExT++: Towards Web Intelligence and User Empowerment" - 500K (2016-2019), NRF, Singapore
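- Collaborator, "Theory and Methods for the Organization and Continuous Optimization of Course-Oriented Large-Scale Online Education Resources" (面向课程的大规模在线教育资源组织与持续优化的理论与方法) - 450K RMB, NSF, China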
- PI, "Investigating Instructor Intervention in MOOC Forums" - 167K (2015-2018), NUS LIFT grant
- Co-PI, "NExT Search Center" - 6.1M (2010-2015), NRF MDA grant
- PI, "Data Mining for Supporting Critical Reviews in Evidence Based Nursing" - 98K (2010-2012)
- Co-PI, joint with Philip S Cho (NUS, ARI) and Ben Sovacool (NUS, LKYSPP), "Mapping the technological and cultural landscape of scientific development in Asia" - 225K (2010-2013), from the Global Asia Institute
- PI, "Co-training NLP systems and Language Learners" - 234.5K (2008-2014), CSIDM Phases I and II
- Co-PI, joint with Tat-Seng Chua and Chew Lim Tan (NUS) - "Interactive Media Search" - 1.9M (2007-2010), NRF MDA grant
- Co-PI, joint with Yin Leng Theng, Chunyan Miao (NTU) and Ai Chee Tang (SMU) - "Empirical Usability Studies with E-Learning Systems: Towards Executable Cognitive User Models as Design and Usability Evaluation Aids" - 24K (2007), from A*STAR HFE pilot grant
- PI, "Mathematical Equation Indexing, Search and Retrieval" - 39K (2006-2007)
- Co-PI, joint with Chew Lim Tan and Danny Poo (NUS), "Document Information Mining for Digital Libraries" - 23K (2006-2008), from HP Labs
- PI, "Natural Language Query Analysis for Web Queries" - 41K (2006-2007)
- Recipient of 60K (2004), NUS Interdisciplinary Technology Equipment Grant
- PI, "Corpus-Based Query Expansion in Online Public Access Catalogs" - 31K (2003-2006)
- PI, "Towards multi-document indicative summarization via automated metadata extraction" - 23K (2003-2006)
- Microsoft Research - For research and development of a shared task and corpus on scientific document summarization (SGD 27K, 2016)
- NVIDIA - For research and development in NLP using GPU technologies - one GTX Titan X (USD 800, 2015)
- Elsevier Unrestricted Gift - For research and development of digital libraries and coordination of the Elsevier SGCodeJam24 and Code for Science (USD 2K, 2011)