My research interests fall under the areas of digital
libraries, natural language processing, information retrieval, and
human-computer interaction. Specifically, they include document
structure acquisition, verb analysis, digital library resource
annotation and applied text summarization. My research goal
is to investigate how natural language processing and
information retrieval can be applied to improve scholarly
publication and knowledge discovery.
I run the Web, Information Retrieval /
Natural Language Processing Group (WING) at SoC. We are not
the only group dealing with these topics — Web, DL, IR and
NLP — and our research isn't limited just to these topics,
but it is a good description of the research we do. We have lots
of demos, projects and corpora there, including ones that I have
had a direct hand in coding such as those on webpage
classification, document structure and
reference string parsing. There's also plenty of newer work
done by the students in WING, including a Twitter retweet
predictor and classifier, the
Chaptrs photograph organizing and sharing app (for Macs), the
largest SMS corpus (please contribute!), and the world's
best summarization system.
WING is currently affiliated with the China Singapore Institute of Digital
Media (CSIDM) and the NUS-Tsinghua Extreme
Search Centre (NExT).
I'm also a member of, and a potential
supervisor for students in, the NUS Graduate School for
Integrative Sciences and Engineering (NGS).
I also lead our group in collecting resources used to do such
research. Visit the Natural
Language Processing / Information Retrieval research framework
webpage (cte/sunfire) to see what tools we have available and
installed for related research directions and projects.
Conversely, if you're currently doing an FYP or UROP, I've
written some notes on
what it's like to grade them and what you should be doing as
students to try to optimize your grade. When you're ready to
present your work in a defense, check out the
notes that I wrote on the defense process.
- Animesh Prasad, Structured Information Extraction for Scientific Documents
- Kishaloy Halder, Information Retrieval Techniques to Facilitate Discussion in Online Forums
- Wenqiang Lei, Topic Continuity for Discourse and Dialogue
- Liangming Pan, Topics in Knowledge Graphs and Question Answering. NGS Scholar.
- Abhinav Ramesh Kashyap, Topics in Scientific Document Processing
- Taha Aksu, Topics in Dialogue/Question Answering, SINGA Scholar, co-supervised with Dr Nancy Chen (I2R).
- Samson Tan, Topics in Dialogue, Salesforce IPP, co-supervised with Shafiq Joty (Salesforce Research).
My group also hosts the occasional postgraduate intern from
collaborative projects or one-off internships, which are not
I list WING's graduated graduate students (MS, Ph.D.) here. I
have also directly supervised over 100 undergraduate projects and
theses. More accurate information about the current affiliations of our
alumni can be found in our LinkedIn
group (viewable only by members). For a more complete list of
past alumni (including undergraduates and system staff), see WING.
- Dr Muthu Kumar Chandrasekaran, A Discourse Centric Framework for Facilitating Instructor Intervention in MOOC Discussion Forums, graduated 2019, now an Advanced Computer Scientist, Machine Learning with Stanford Research Institute International (SRI).
- Yuanxin Xiang, Verb Duration Determination, graduated 2017, now at Institute of Infocomm Research (I2R), Singapore.
- Chencan Xu, CrowdMOD: Crowdsourced Moderation for Structured Online Deliberation, graduated 2017.
- Dr Tao Chen, Analyzing Image Tweets in Microblogs, graduated 2016, now a Research Engineer with Google USA (California).
- Dr Xiangnan He, Exploiting User Comments for Web Applications, graduated 2016, now a professor at the University of Science and Technology of China.
- Dr Aobo Wang, Addressing Informality in Processing Chinese Microtext, graduated 2015, now a Lecturer and Consultant with the Institute of System Science, Singapore.
- Dr Jovian Lin, Recommender Algorithms for Mobile Applications, graduated 2014, now an AI Scientist with GIC Singapore.
- Dr Jun Ping Ng, Interpreting Time In Text, Summarizing Text With Time, graduated 2014, now a Software Engineer with Demand Forecasting, Amazon, New York, USA.
- Dr Jesse Prabawa Gozali, Intra-event Photo Organization, graduated 2013, now a Researcher with Mobilewalla, Singapore.
- Dr Jin Zhao, Domain Specific Information Retrieval, graduated 2013, now an Instructor with the School of Computing, NUS, Singapore.
- Bamdad Bahrani, Reëxamining Slide Alignment, graduated 2012, now a Data Analytics Specialist at Nurse Next Door.
- Dr Ziheng Lin, Discourse Parsing, graduated 2012, now a Senior Director with Dentsu Aegis, Singapore, previously with SAP Singapore.
- Dr Yee Fan Tan, Cost-Sensitive Web-Based Information Acquisition for Record Matching, graduated 2011, now a Lead Scientist with NCS, Singapore.
- Cong Duy Vu Hoang, Automatic Related Work Summarization, graduated 2010, now a doctoral candidate at the University of Melbourne, previously Senior Research Engineer at the HLT Department, Institute of Infocomm Research (I2R), Singapore.
- Dr Long Qiu, Scenario Template Generation, graduated April 2009, now a research scientist at Taobao, Alibaba, China; previously a Research Fellow with the Institute of Infocomm Research (I2R), Singapore.
- Dr Hendra Setiawan, Gapped Constituency Phrase-Based Machine Translation, graduated 2008, now with BBN Raytheon, New York, NY, previously at IBM Watson Labs and the University of Maryland.
- Dr Hang Cui, Soft Pattern Matching, graduated July 2006, now a Staff Software Engineer / Engineering Manager at Google, previously with Yahoo! Engineering, Google and OneRiot.
I list the invited talks for past conferences, workshops and
other events at which I have had the privilege to lecture. From 2018-2020, I am a Slides and
videos for some of the talks are available on YouTube
and from a separate talks page that I
maintain somewhat.
- 2018 - Invited Speaker, "Research Fast and Slow", COLING, 24 Aug 2018, Santa Fe, NM, USA. [ Video @ Vimeo ]
- 2017 - Invited Speaker, "Technology vs Learner Engagement: Always a Tradeoff". At the innovLogue, 16 Mar 2017, Singapore, Singapore, Institute of Adult Learning.
- 2015 - Invited Speaker, "Instructors, Learners and Machines: Learning instructor intervention from MOOC forums". At the 2nd Greater China MOOC Symposium, Taoyuan, Taiwan, 16 August.
- 2015 - Keynote Speaker, "Keywords, phrases, clauses and sentences: Topicality, indicativeness and informativeness at scales". At Novel Computational Approaches to Keyphrase Extraction Workshop, Beijing, China, 30 July.
- 2015 - Invited Talk, "Improving Web 2.0 Recommendation Leveraging User Comments via Latent Model Regularization" -- Microsoft Research Asia, Beijing, China, 24 July.
- 2015 - Invited Talk, "Improving Web 2.0 Recommendation Leveraging User Comments via Latent Model Regularization" -- Linköping, Sweden, 27 May.
- 2015 - Invited Talk, "Serving the Readers of Scholarly Documents: A Grand Challenge for the Introspective Digital Library". At International Conference on Big Data and Smart Computing (BigComp 2015), Jeju Island, South Korea, 9 February.
- 2015 - Invited Talk, "Serving the Readers of Scholarly Documents: A Grand Challenge for the Introspective Digital Library". At the Mining Big Text (MBT '15) Workshop, Yonsei University, Seoul, South Korea, 10 February.
- 2014 - Keynote Speaker, "the small data of scholarly documents". At Web Science and Data Analytics Summer School, Singapore, 11 December.
- 2014 - Invited Keynote, "Opportunities for Multimedia Analysis in Scholarly Digital Libraries". At the Workshop on Speech, Language and Audio in Multimedia (SLAM '14), Satellite Workshop of Interspeech 2014, Penang, Malaysia, 11 Sep 2014.
I have proposed, managed and collaborated on a number of research grants in Singapore. Here's a non-exhaustive listing of some of my research endeavors. Funding amounts are in Singapore dollars unless otherwise noted.
- Co-PI, "Course Suggestion for Career Planning: Evaluating Strategies to Support Lifelong Learning. A Pilot on Using Analytics to Recommend SkillsFuture Credit Courses" - 161K (2018-2019), WDARF, Singapore
- Co-PI, "NExT++: Towards Web Intelligence and User Empowerment" - 500K (2016-2019), NRF, Singapore
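- Collaborator, "Theory and Methods for Course-Oriented Organization and Continuous Optimization of Large-Scale Online Educational Resources" (面向课程的大规模在线教育资源组织与持续优化的理论与方法) - 450K RMB, NSF, China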
- PI, "Investigating Instructor Intervention in MOOC Forums" - 167K (2015-2018), NUS LIFT grant
- Co-PI, "NExT Search Center" - 6.1M (2010-2015), NRF MDA grant
- PI, "Data Mining for Supporting Critical Reviews in Evidence Based Nursing" - 98K (2010-2012)
- Co-PI, joint with Philip S Cho (NUS, ARI), Ben Sovacool (NUS, LKYSPP), "Mapping the Technological and cultural landscape of scientific development in Asia" - 225K (2010-2013), from Global Asia Institute
- PI, "Co-training NLP systems and Language Learners" - 234.5K (2008-2014) CSIDM phases I and II
- Co-PI, joint with Tat-Seng Chua and Chew Lim Tan (NUS) - "Interactive Media Search" - 1.9M (2007-2010), NRF MDA grant
- Co-PI, joint with Yin Leng Theng, Chunyan Miao (NTU), Ai Chee Tang (SMU) - "Empirical Usability Studies with E-Learning Systems: Towards Executable Cognitive User Models as Design and Usability Evaluation Aids" - 24K (2007), from A*STAR HFE pilot grant
- PI, "Mathematical Equation Indexing, Search and Retrieval" - 39K (2006-2007)
- Co-PI, joint with Chew Lim Tan and Danny Poo (NUS), "Document Information Mining for Digital Libraries" - 23K (2006-2008), from HP Labs
- PI, "Natural Language Query Analysis for Web Queries" - 41K (2006-2007)
- Recipient of 60K (2004), NUS Interdisciplinary Technology Equipment Grant
- PI, "Corpus-Based Query Expansion in Online Public Access Catalogs" - 31K (2003-2006)
- PI, "Towards multi document indicative summarization via automated metadata extraction" - 23K (2003-2006)
- Microsoft Research - For research and development of a shared task and corpus on scientific document summarization (SGD 27K, 2016)
- NVidia - For research and development in NLP using GPU technologies - 1 GTX Titan X (USD 800, 2015)
- Elsevier Unrestricted Gift - For research and development of digital libraries and coordination of the Elsevier SGCodeJam24 and Code for Science (USD 2K, 2011)