Areas of research in Computer Science:
Artificial Intelligence (AI) may be considered one of the
newest disciplines in the history of science. The name Artificial
Intelligence was coined in 1956, soon after World War II. Today, AI
encompasses a wide range of sub-disciplines, from general topics such
as learning and perception to specific areas such as game playing,
theorem proving, medical diagnosis, robotic control, as well as image
and text processing. In a sense, AI aims to emulate human intelligence
in various problem solving endeavours involving knowledge processing,
such as learning new knowledge, dealing with uncertainty in knowledge,
and finding knowledge presented in different formats including texts
and images. The above examples of knowledge processing represent the
five areas of AI interests in the School of Computing:
- Machine Learning
- Uncertainty Management
- Image Mining
- Document Analysis
- Digital Libraries
Machine Learning
Learning is one of the fundamental capabilities of intelligent
systems and has been studied since the dawn of AI. Alan Turing
proposed machine learning as a possible way to construct machines that
may be able to pass the Turing Test in his landmark 1950 paper
“Computing Machinery and Intelligence”. One of the
earliest success stories in AI, Samuel’s checkers player, was
constructed with the help of machine learning.
Machine learning methods currently give the best performance
in many practical problems in areas such as computer vision, speech,
natural language processing and robot control. Along with
impressive practical successes, much has been achieved in our
understanding of the fundamental issues involved in machines that
learn. Learning theory characterises the conditions under which
learning can be successful in two main ways: probabilistically, along
the lines of Valiant’s Probably Approximately Correct (PAC)
framework, and in the limit using recursion theoretic methods
(inductive inference), as initiated by Gold in 1967.
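As a concrete instance of the probabilistic view, a standard textbook bound (not specific to our work) is given below. It applies to a finite hypothesis class $H$ and a learner that outputs any hypothesis $h$ consistent with the training sample. The symbols $m$ (number of independent training examples), $\epsilon$ (error tolerance) and $\delta$ (failure probability) are introduced here for illustration only.

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)
\quad\Longrightarrow\quad
\Pr\big[\mathrm{err}(h) \le \epsilon\big] \;\ge\; 1 - \delta
```

In words: with enough examples, a consistent hypothesis is probably (with probability at least $1-\delta$) approximately (within error $\epsilon$) correct.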
Our work spans both the theoretical and applied fronts of
machine learning. On the theory side, we have been working on
non-U-shaped learning, which is currently a much investigated topic in
inductive inference. We have also been investigating new approaches
such as the inversion of operators in a learning-theoretic context and
the modes of data presentation for function learning. Towards the
applied side, we have been looking at the use of unlabeled data in
learning, including semi-supervised and unsupervised learning methods.
In addition, we have been investigating the use of machine learning for
applications such as activity recognition and natural language
processing. In natural language processing, our entry in the SemEval
2007 evaluation was ranked first in the word sense disambiguation
lexical sample task.
Uncertainty Management
Computational technologies for uncertainty management facilitate the
identification of relevant information, the acquisition of useful
insights, and the implementation of timely actions in complex decision
problems. Many recent global events of significant and sometimes
catastrophic impact, such as business globalisation, disease outbreaks,
terrorist attacks, earthquakes and tsunamis, have highlighted the
need for decision support systems that can efficiently and
systematically help manage uncertainties and minimise surprises.
Current technologies cannot adequately address the major challenges
brought by substantial contextual, informational and temporal changes
in such decision situations. These are the main challenges being
addressed at the Medical Computing Laboratory in SoC, with an emphasis
on the biomedical domains. Our research group is part of the
multidisciplinary Biomedical Decision Engineering (BIDE) team in the
University, comprising faculty members, researchers and students from
NUS, and various local and overseas institutions and healthcare
organisations.
Our long-term research agenda focuses on developing advanced
computational technologies to support complex decision making in
dynamic environments. Our current research focuses on developing a
comprehensive decision making framework that: 1) supports adaptive
reasoning in changing, uncertain environments; 2) integrates and learns
from distributed information sources; and 3) provides timely decision
recommendations with limited resources.
This framework includes: 1) languages for effective specification of
decision factors and objectives in a context-sensitive manner, 2)
methodologies for reasoning with and learning of decision information
from human experts and online databases, and 3) techniques for adapting
responses and managing surprises under resource constraints.
Our on-going research projects focus on techniques and systems that
provide support for complex medical decisions where relevant
information from distributed multimodal sources is integrated to
provide recommendations in a timely manner. Our recent foci are:
Decision Modelling with Multimodal Information: Projects in this area aim to
combine multimodal information, such as text-based, structured, and/or
image data, in support of biomedical decision making. The objective of
an initial project is to develop an intelligent human organ
segmentation system using 3D medical magnetic resonance images (MRIs)
to support medical decision making. We have proposed hybrid image
processing algorithms, and evaluated them on a set of kidney images. We
continue to explore and develop other image processing algorithms and
decision support system models.
Advanced Techniques in Probabilistic Graphical Models: Projects in this area
explore and develop analytical techniques in the realm of probabilistic
graphical models and influence diagrams. We are currently working on
context-aware probabilistic reasoning, multiple-level probabilistic
game representation and knowledge discovery using Bayesian learning.
The results are being evaluated in selected prototype applications in
biomedicine and e-commerce.
Time Critical Decision Modelling: Projects in this area investigate methodologies
and build computer-based tools for managing complex decisions under
limited resources. Such techniques take into account the dynamic nature
of the problem, uncertainties, preferences of decision makers, as well
as the time criticality of the problem. They help ensure that the
decision models being built are of optimum size for timely
recommendations of effective actions. Our ongoing work includes outcome
and risk profile analysis, guideline implementation, and learning from
imbalanced data in various critical care domains.
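The decision-theoretic idea underlying influence diagrams, choosing the action with the highest expected utility given a probability distribution over uncertain states, can be illustrated with a minimal sketch. This is a generic illustration, not the framework described above; the function names, states, probabilities and utilities are all hypothetical.

```python
def expected_utility(action, state_probs, utility):
    """Sum of utility(action, state) weighted by the probability of each state."""
    return sum(p * utility[(action, state)] for state, p in state_probs.items())


def best_action(actions, state_probs, utility):
    """Return the action that maximises expected utility."""
    return max(actions, key=lambda a: expected_utility(a, state_probs, utility))


# Hypothetical toy problem: treat now or wait, with an uncertain diagnosis.
ACTIONS = ["treat", "wait"]
UTILITY = {
    ("treat", "disease"): 80, ("treat", "healthy"): 60,
    ("wait", "disease"): 0,   ("wait", "healthy"): 100,
}

low_risk = {"disease": 0.25, "healthy": 0.75}   # assumed belief, for illustration
high_risk = {"disease": 0.5, "healthy": 0.5}    # assumed belief, for illustration

choice_low = best_action(ACTIONS, low_risk, UTILITY)
choice_high = best_action(ACTIONS, high_risk, UTILITY)
```

The recommended action changes as the belief about the patient's state changes, which is the kind of adaptation a time-critical model has to deliver while staying small enough to evaluate quickly.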
Highlights of research achievements and potential for commercialisation:
Project ResEasy:
Funded by the Infocomm Development Authority (IDA) and The Enterprise
Challenge (TEC) in Singapore, this translational research project
adopts a new collaborative approach, bringing together a research team,
an engineering team, and a clinical team to speed up trial implementation
and productisation of our previous research results. The objective is
to trial the effectiveness and feasibility of an open, adaptive
workbench which implements decision support applications based on a set
of generic information management toolboxes that support
integration, visualisation, analysis, security and reporting of
relevant information. The toolboxes can be generalised to other
diseases or conditions directly, and adapted and deployed in multiple
sites simultaneously. The ResEasy project initially focuses on
facilitating best practices in process management, outcome analysis,
and guideline execution in two areas: prospective care of asthma
patients, and acute care of Acute Respiratory Distress Syndrome (ARDS)
patients. Collaborators and partners include the Singapore National
Asthma Program, Gleneagles Hospital, Hewlett Packard and other
engineering firms.
Image Mining
Advances in image acquisition and storage have given rise to
huge image databases. Retrieving salient information from these images
is a daunting task. Image mining aims to address this problem. Image
mining research at the School of Computing stems from our
group’s original interests in data mining. Although it is
usually used in relation to the analysis of data, data mining, like
artificial intelligence, is an umbrella term and is used with varied
meaning in a wide range of contexts. Thus, in the context of images,
image mining deals with the extraction of knowledge, image data
relationships, or other patterns not explicitly stored in the images. It
is an interdisciplinary effort that draws on expertise in image
processing, information retrieval, data mining, machine learning,
databases and artificial intelligence.
The image mining research that is being carried out in the
School of Computing focuses on medical images, particularly retina
images and brain CT scan images. The main objective is to allow the
machine to capture salient features in such images with a view to
mining useful information pertaining to medical anomalies.
Retina images provide a window into what is happening inside
the human body. Subtle changes in the eye’s retinal vessels
can serve as warnings as to whether the patient may be heading towards
a stroke. Computers can help trace and track these vessels accurately
and quantify the changes in them over time. The RETINA image mining
group in the School has been working on retina images over the years
and has developed a computer aided screening system for use in
polyclinics.
A new image mining application involving brain CT scan images has
recently been studied. The current research aims to investigate
techniques for fast retrieval of brain CT scan images based on the
image content of medical anomalies as well as other textual information
associated with the medical conditions. Machine learning paradigms are
explored to enable automatic classification of medical images based on
image contents and textual data. In addition, text and image mining
techniques are also investigated for the training of the machine to do
automatic interpretation of image contents.
In the Retina image mining project, we have developed increasingly
accurate and robust algorithms to grade the vascular or blood-vessel
structure in retinal images automatically. Our approach incorporates
techniques from wavelet analysis, texture analysis, and curvature
ridge/trench analysis to attain the desired clinical sensitivity. In
collaboration with the Department of Ophthalmology and the Singapore
Eye Research Institute (SERI), we have developed a user-friendly system
called SIVA (Singapore “I” Vessel Assessment) to
extract vascular structure information and derive quantitative measures
for the description of retinal vessels’ characteristics
(Figure 1). The robust system is also flexible and intuitive in
gathering feedback for enhancing the accuracy of vessel measurement. We
are currently validating the system on approximately 6,000 retinal
images from the Singapore Prospective Cohort Study, conducted by the
Department of Ophthalmology at NUS, SERI and the Singapore General
Hospital.
In addition, we have also designed new data mining techniques for the
discovery of interesting changes over time. These include: a dense
periodic pattern miner, a scalable graph miner, and a progressive
confident rule miner. In particular, the progressive rule miner is able
to look for rules that capture the state changes of objects leading to
a certain end state with increasing confidence. An initial application
of this algorithm on the retinal dataset shows that the algorithm is
able to improve the accuracy of predicting occurrences of maculopathy
in diabetic patients.

Figure 1: The SIVA (Singapore “I” Vessel
Assessment) System
Document Analysis
Document analysis is the task of examining document content in
order to acquire an understanding of its intended meaning. The document
content can be in the form of texts, tables, charts, graphics and
photographs. Documents may take the form of electronic text or digitised images.
Electronic texts are pervasive in the information world today and they
can be easily processed by the machine, but digitised images are also
becoming very popular following recent advances in digital publishing
technology. Historically, work in text processing and document
image analysis was done quite independently, but in recent years
the two communities have developed common interests. The field of
information retrieval, which has traditionally dealt with text, has
also been looking into information in document images. Conversely,
work on web and text documents is now being reported in conferences
that traditionally dealt with document images.
The document analysis group in the School of Computing is
interested in both electronic texts and images, with the objectives of
retrieving relevant documents and extracting textual contents from the
documents. We have developed techniques for processing documents across
different formats, including pure texts, charts, texts converted from
optical character recognition (OCR) with errors, text images with noise
and distortion, and multilingual documents.
Text representation is the task of transforming the content of
a textual document into a compact representation of its content so that
the document may be recognised and classified by a computer. In this
research, we have developed a new term weighting scheme based on a
Relevance Frequency measure for text classification.
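A minimal sketch of a relevance-frequency style term weight is shown below. The formula is not stated in this text; the form used, rf = log2(2 + a / max(1, c)) multiplied by the term frequency, is assumed here from the commonly cited tf.rf formulation, where a and c are the numbers of positive-category and negative-category training documents containing the term. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def relevance_frequency(pos_docs_with_term, neg_docs_with_term):
    """Assumed rf form: log2(2 + a / max(1, c)).

    a = positive-category documents containing the term,
    c = negative-category documents containing the term.
    """
    return math.log2(2 + pos_docs_with_term / max(1, neg_docs_with_term))


def tf_rf_weights(doc_tokens, positive_docs, negative_docs):
    """Weight each term in a document by term frequency times rf.

    positive_docs and negative_docs are collections of token sets
    from the training data of one category versus all the others.
    """
    weights = {}
    for term, tf in Counter(doc_tokens).items():
        a = sum(1 for d in positive_docs if term in d)
        c = sum(1 for d in negative_docs if term in d)
        weights[term] = tf * relevance_frequency(a, c)
    return weights


# Hypothetical toy training data for a "protein interaction" category.
positive = [{"protein", "binds"}, {"protein", "interaction"}]
negative = [{"market", "binds"}]

weights = tf_rf_weights(["protein", "protein", "binds"], positive, negative)
```

Terms concentrated in the positive category ("protein") receive a larger weight than terms spread evenly across categories ("binds"), which is the property that makes a supervised weight of this kind useful for classification.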
In the area of document image analysis, we have developed
techniques to correct distorted images caused by perspective distortion
or warped document surfaces. Two main approaches have been
investigated. One is based on the textual content of the document.
Here, a curving grid is superimposed over the document in alignment
with the distorted text line. The grid is then regularised, and the
textual content straightened in the process. The other approach uses
shading information in the document page to model the 3D surface of
the document. The 3D model is then subjected to a transformation process
to achieve a flattened rendition of the document page (Figure 2). The
distortion correction improves text recognition and hence provides for
more accurate document information retrieval and extraction.
Another stream of research in document image analysis focuses
on identifying the language and script of document content. The
identification is based on statistical measures of the image
features of characters and scripts. The language/script identification
is useful in optical character recognition (OCR) involving multilingual
documents. The automated language/script identification allows
documents to be directed to the respective OCR engines for correct text
conversion.
In addition to document text processing, recognition of charts
is also being examined in our group. The techniques developed allow us
to extract textual information as well as graphical components in
charts so that we may derive meaningful interpretation of the charts. A
question answering system is being developed that allows chart
information to be incorporated into document textual information to
answer questions that involve data implicitly represented in charts.
Highlights of research achievements and potential for commercialisation:
Our technique for text classification based on Relevance Frequency has
attracted industry interest. A start-up company is keen to use the
technique to perform machine classification of biomedical literature
from the PubMed document database.
Our language/script identification technique has been used by a company
to develop a document processing system for a government organisation
that deals with a large amount of incoming multilingual documents.
Another company is currently exploring the use of our document image
analysis techniques to process scanned images of U.S. patent documents.
The processing tasks include correction of document skews and
distortion, detection and recognition of graphical components, and
retrieval of relevant patent documents.
Our text classification method based on Relevance Frequency achieved
the best performance, in terms of F-score, in the Protein-Protein
Interaction Article Selection sub-task of the BioCreative II evaluation
(Critical Assessment for Information Extraction in Biology). The
evaluation was held in conjunction with the Second BioCreAtIvE
Challenge Workshop.
Our group has clinched a prestigious grant – the HP Digital
Publishing for University Teaching and Learning grant worth more than
US$58,000. The grant is targeted at research activities in developing
advanced techniques for document text recognition, storage and
retrieval. Only 14 institutions from around the world were selected in
2006 to receive the grant.

Figure 2: (a) Original distorted images; (b) Real shading; (c)
Reconstructed shape; (d) Uniform sampled mesh; (e) Textured mesh; (f)
Restored image
Digital Libraries
Digital libraries aim to transform the way knowledge is
created, transmitted and stored. The field is diverse, with contributors
from library sciences, databases, natural language processing,
multimedia and information retrieval. Previous efforts in the area have
examined the problems of storage and access of vast quantities of human
knowledge. While scalability of the storage and retrieval of large data
remains a problem, recent efforts are centred on how today’s
knowledge workers use and most efficiently access information. Properly
integrating information from the World Wide Web into peer-reviewed,
manually-selected and authoritative sources available in the digital
library is a continued focus.
Our current research applies techniques in machine learning
and natural language processing to solve problems in digital libraries
and applied information retrieval. While digital library research is
very diverse, our focus is on building tools and platforms for the
automated, large-scale digital library. We examine the problem of name
authority and attribution (e.g., which of the 11 known Wei
Wangs is the author of this work?) and large-scale scholarly
digital library implementation and fielding. Query analysis is another
continuing area of interest, where we bring statistical and syntactic
analyses to bear on user queries in the context of library catalogues
as well as the Web. Our research on user interface design in the
context of analysis of search and browsing interfaces also helps us
develop and understand next-generation methods of information
presentation (e.g., using Web 2.0 technologies).
Highlights of research achievements and potential for
commercialisation:
Our holistic view of digital library research leads directly
to implementation of toolkits and fielded digital library technology.
Our research on automated backend implementation in record linkage and
terminology handling helps us to conduct more meaningful analysis of
scholarly data. Our research on query analysis and user interface
design allows us to build real-world, simple and workable interfaces
for key application areas in digital libraries such as public access
catalogues (Figure 3).

Figure 3: Re-designed prototype library catalogue featuring
tabs and overview+detail user interface design
Our current implementation efforts aim to build scholarly libraries for
mathematics research, where problems with terminology and equation
retrieval arise, and for scholarly presentations, in which alignment
and synthetic visual images play a significant role.
Our research has led to two invention disclosures that have attracted
industry attention. Our research on synthetic image classification has
led to licensing talks with international document processing companies.
While our digital library group is young, we have already established
our specialty of cross-disciplinary research. Our publications feature
automated analysis of digital data in various modes: web data, image,
and query and metadata analysis. We have also established and headed a
working group of investigators from several international universities
examining issues in scholarly digital anthologies.
The faculty members involved in artificial
intelligence research are:
- HSU Wynne
- JAIN Sanjay
- KAN Min Yen
- LEE Mong Li
- LEE Wee Sun
- LEONG Tze Yun
- STEPHAN Frank
- TAN Chew Lim