====================================
List of Research Projects
Leong Hon Wai, RAS-Group, SoC, NUS
====================================
Contact: leonghw at comp.nus.edu.sg
url: http://www.comp.nus.edu.sg/~leonghw/
We do both fundamental and applied research in combinatorial optimization.
The application areas are in computational biology, transportation logistics,
resource allocation and scheduling, and other optimization problems.
We look for students who are algorithmically/mathematically inclined
and who can do good software development.
We have a very strong team culture and look for students who are team players.
This list contains only *some* of the projects. There are
many more interesting projects of a similar nature.
Email me (leonghw@comp.nus.edu.sg) if interested.
Title:
------
Randomized Algorithms for Problems from Computational Biology (CB)
Short Description:
------------------
Many algorithmic/optimization problems in computation biology have
inherent error in the input data or in the problem interpretation.
Traditional algorithms do not handle these error naturally.
In this research, we consider the use of randomized algorithms
for solving algorithmic problems in computational biology
(such as [1],[2]) since randomization may be one way of dealing
with inherent error. Initial candidate problems include
phylogenetic tree constructions [1], motif, pattern finding
in DNA sequences, and PPI network analysis.
We are looking for students who are strong in design and analysis
of algorithms, especially of randomized algorithms.
(No prior computational biology knowledge is required.)
References:
[1] Seung-Jin Sul and Tiffani L. Williams, "A Randomized Algorithm
for Comparing Sets of Phylogenetic Trees," Asia-Pacific
Bioinformatics Conference (APBC'07), pp. 121- 130,2007.
[2] Shuai Cheng Li, Dongbo Bu, Jinbo Xu and Ming Li
"Finding Largest Well-Predicted Subset of Protein Structure Models"
LNCS-5029, (2008), pp. 44-55, DOI: 10.1007/978-3-540-69068-9_7
Number of RS needed: 2
Title:
------
Algorithms for peptide sequencing via tandem mass spectrometry (computational biology)
Short Description:
------------------
Peptide sequencing is the problem of identification of proteins (determining
the sequence of a protein) and recent technological advances in tandem mass
spectrometry has made it the method of choice for high-throughput
identification of proteins.
(See [1], [2], [3] and others for quick introduction.)
Recently, we initiated a project on multi-charge peptide sequencing
(MCPS) that focusses on mass spectra with multiple charge (> 2).
We showed in [1] that significant performance gain can be achieved
by considered multi-charge peaks during the peptide sequencing process.
There are several possible PhD projects in this area:
(1) One project deals with improved algorithms for de novo sequencing
for multi-charge mass spectra.
(2) Another project deals with more precise peak annotation in
multi-charged mass spectra.
(3) Another project deals with the important problem of detection of post
translation modifications (PTMs). This problem has important implications
in the study of translational medicine. A collaborative project with an
overseas partner is currently being pursued for this project.
We are looking for students who are strong in design of algorithms and mathematical
analysis. Currently, two PhD students are working in this area in my research group.
This is joint research with Prof. Pevzner at UCSD and Prof Haixu Tang at
Indiana University.
References:
[1] KetFah Chong, Kang Ning and Hon Wai Leong, "Characterization of Multi Charge Mass
Spectra for Peptide Sequencing", APBC-2006, (2006), pp. 109-119.
[2] Vineet Bafna, Nathan Edwards: "On de novo interpretation of tandem mass spectra
for peptide identification.", RECOMB 2003: 9-18
[3] Ari Frank and P.A. Pevzner, "De Novo Peptide Sequencing via Probabilistic
Network Modelling." Anal. Chem., 77, pp. 964-973, 2005.
Number of RS needed: 2
Title:
------
Research and Development in PGO (Phylogeny from Gene Orders)
Short Description:
------------------
Phylogeny from Gene Orders (PGO) [1] is a software system for constructing
and comparing phylogenetic trees build using different techniques and
using different evolutionary distances on the same set of gene orders (genomes).
Our PGO system allows researchers to compare different classes of
algorithms for building phylogenetic trees. It also allows researchers
to compare the phylogenetic trees build from different evolutionary distances. P
GO integrates a number of software packages related to phylogenetic
reconstruction from gene order and gene content of genomes.
Our system can be access as a web service where users can upload their
gene order data and select the set of programs they wish to run on their data.
A similar web service that analyzes DNA sequences (instead of gene orders)
is given in [2].
In this project, we perform R&D on enhancements to PGO. Some possible
enhancements includes implementation and integration of more distance functions,
performing comparative study of different distance functions, evaluating
the practicality of DCR operations as a proxy to actual evolutionary distances,
and design and implementation of algorithms for new distance measures.
We are looking for students who are strong in algorithm design and implementation.
(No prior computational biology knowledge is required.)
References:
[1] M. Zhang, F. Hao, and H. W. Leong. Phylogeny from Gene Order (PGO):
an integrated system for comparing phylogenetic reconstruction algorithms,
International Conference on Genome Informatics, Dec 2010.
[2] A. Dereeper, V. Guignon, G. Blanc, S. Audic, S. Buffet, F. Chevenet,
J. Dufayard, S. Guindon, V. Lefort, M. Lescot, et al.
"Phylogeny. fr: robust phylogenetic analysis for the non-specialist."
Nucleic Acids Research, 2008.
Number of RS needed: 1
Title:
------
Algorithms for the Genome Sorting Problem
Short Description:
------------------
In genome sorting, we are given two genomes (given by their gene sequences
or gene order), and we want to find the minimum sequence of operations
that transform one genome to the other. Different variations ([1], [2])
of the genome sorting problem arise from the different sets of evolutionary
operations allowed: including insertion, deletion, reversals, translocation,
transposition, block interchange, fusion, fission, segmental duplication,
chromosome duplication and chromosome deletion.
An ongoing project, GSB (Genome Sorting with Bridges) considers the
genome sorting problem in which we allow all known traditional operations
(see list given above). By making no assumptions on the two genomes and
allowing all known traditional operations, we hope to make it more convenient
to compute evolutionary distances and we also hope that the evolutionary
distances computed are closer to the real distances. We have devised a new
algorithm called GSB (Genome Sorting with Bridges) that combines existing
techniques with new innovative ideas called T-bridges and X-bridges.
This project will continue this work and will continue implementation
of the GSB algorithm and related extensions.
We are looking for students who are strong in algorithm design and
implementation. (No prior computational biology knowledge is required.)
References:
[1] M. Bader. Genome rearrangements with duplications. BMC Bioinformatics,
11(Suppl 1):S27, 2010.
[2] F. Hao, J. Luan, and D. Zhu. Translocation-Deletions Distance Formula
for Sorting Genomes. WRI World Congress on Computer Science and
Information Engineering, 2009
Number of RS needed: 1
Title:
------
Fragment Assembly using very short fragments (Computational Biology)
Short Description:
------------------
Fragment assembly is an important genome sequencing problem in computational
biology. A highly successful approach to fragment assembly is recently
proposed by Pevzner et al in [1] and [2].
This project deals with consider fragment assembly using very short fragments
(about 100bp each) produced by new fragment generation technology. For these
short fragments, frequent repeats in DNA sequences causes problems in the assembly.
In this project, we aim to seek efficient solution to this problem.
We are looking for students who are strong in design of algorithms and
mathematical analysis. Currently, two students are working in related area
in my research group.
This project is a joint project with Prof Haixu Tang in Indiana University.
References:
[1] Pevzner PA, Tang H, Waterman MS., "An Eulerian path approach to DNA
fragment assembly." Proc Natl Acad Sci, USA 2001 Aug 14;98(17):9748-53.
[2] Pevzner PA, Tang H, "Fragment Assembly with Double-barreled Data,"
BioInformatics, Vol 17, Supp 1, (2001), pp. S225-S233.
Number of RS needed: 1
Title:
------
Rearrangements in genomes with unequal content (computational biology)
Short Description:
------------------
Whole-genome rearrangement studies are typically separated into two steps:
(i) identification of large blocks of sequence shared by the set of genomes;
and (ii) comparison of the respective arrangements of these blocks.
When studying many very different genomes simultanously, it becomes
difficult to identify large blocks in step (i) simply because there are
limited number of elements common to all genomes. In other words, many
of the similarities that are identified by pairwise comparisons are
dropped in step (i) when we restrict to elements common to all genomes.
In the past, we have worked on an algorithm to compare the respective
arrangements of blocks in multiple genomes (the program is called MGR,
Bourque and Pevzner 2002). The algorithm is a heursitic that seeks the
most parsimonious rearrangement scenario that best explains the observed
arrangements. MGR relies on a polynomial time algorithm to compute the
pairwise distance between 2 multichromosomal genomes (Pevzner and
Hannenhalli 1995, Tesler 2002).
Instead of working on a set of blocks common to all genomes, we would
like to adapt the algorithm to work on the different sets of pairwise
blocks. Given that MGR already only relies on pairwise comparisons to
identify rearrangements implies that this modification is very accessible.
The main challenge is to describe how these pairwise blocks are modified
by the rearrangements and how to deal with boundary ambiguities since
a rearrangement is defined for different sets of blocks on the same
genome. One approach would be to add virtual markers to account for
missing markers in some of the genomes.
Not restricting to common blocks will retain much more information
and we expect the recovered scenarios to be more accurate. It will
also open new applications to include more genomes, distant genomes
and genomes with low quality map or sequence. These developments
would be useful not only in conjunction with MGR but also for any
other multiple genome rearrangement study.
Number of RS needed: 1
Title:
------
Algorithms for Phylogenetic Tree Reconstructions (in Computational Biology)
Short Description:
------------------
A phylogenetic tree (or evolutionary tree) is a tree
representation of the evolutionary history of a set of species.
Over the past decade, there has been intensive reesarch in algorithms
for reconstructing phylogenetic tree. A number of different techniques
have been used.
In this project, we aim to design efficient algorithms for phylogenetic tree
reconstruction. The project starts with an up-to-date survey of existing
techniques for phylogenetic tree reconstruction and a comparative study of
their relative strengths and weaknesses.
Number of RS: 1
For other related research topics in computational biology, please come
by to see me in S16 06-01.
(For more projects, see http://www.comp.nus.edu.sg/~leonghw,
---> under Research Project)
Title:
------
Efficient Algorithms for Berth Allocation Planning Problems
Short Description:
------------------
The complex operations involved in the management and operation
of a world-class container transshipment port in Singapore are
complex and involves many different resources and systems.
In this project, we undertake to study the problem of planning
of some of these operations to improve their effectiveness.
Initial study with the berth allocation planning system (BAPS)
has produced interesting research problems and results.
Some examples of these are the berth partitioning problem,
the berth assignment problems.
This project aims to extend the current BAPS research with
the study of more efficient algorithms for solving these and
other related optimization problems in container port operation.
Number of RS: 1
Title:
------
Multi-Agent Approach to Combinatorial Optimization (2 students)
Short Description:
------------------
The majority of past approaches to combinatorial optimization
use a centralized decision making framework where the algorithm
is aware of all relevant problem parameters.
In this project, we study the an interesting new approach
to combinatorial optimization -- the multi-agent approach
in which a distributed decision making framework is employed.
Such a distributed framework is more flexible and is able to
incorporate new changes and new constraints in the dynamic business
environment of today.
In our multi-agent approach, we model the entities in the problem
using software agents, each of which has only a restricted
picture of the entire state of the system.
Thus, the agents cooperate as well as compete for scarce resources
while optimizing some global objective functions. The multi-agent
approach has been successfully applied to two logistics problems --
the vehicle routing problem and the inventory routing problem.
In this project, we develop a multi-agent algorithm for solving
a scheduling problem that involves multiple entities and scarce, shared
resources. The challenge will be to appropriate model the various
entities using appropriate agents with associated properties, believes
and operations. We then explore issues such as the influence of local
information, and the appropriate distributed decision making framework.
This project will also involve the implementation of the multi-agent
approach and benchmarking against the state-of-the-art approaches.
Two starting reference:
Yizhi Lao, and Hon Wai Leong, "A Multi-Agent Based Approach
to the Inventory Routing Problem", Pacific-Rim Intl. Conf. on
Artificial Intelligence (PRICAI-02), (2002), LNAI-2417, pp 345-354.
Hon Wai Leong and Ming Liu, "A Multi-Agent Algorithm for
Vehicle Routing Problem with Time Windows", ACM Symposium on
Applied Computing (SAC-2006), AIMS Track, (2006), pp. 106-111.
Number of RS: 1
For other related research topics in logistics, please come
by to see me in S16 06-01.
(For more projects, see http://www.comp.nus.edu.sg/~leonghw,
---> under Research Project)
Research Scholars Currently Supervised:
---------------------------------------
Ng Hoong Kee, PhD (Jan 2000 -- ) email: nghoong
Topic: Multi-Point Range Query: Algorithms and Applications
Chong Ket Fah, PhD (Jan 2002 -- ) email: chongket
Topic: Computational Proteomics
Ning Kang, PhD (Jul 2002 -- ) email: ningkang
Topic: Computational Proteomics, SCS related problems
Recently Graduated Students:
----------------------------
Tan Jing Song, MSc (Jul 2002 -- Jul 2003) (now doing PhD at U-Penn)
Topic: Efficient Algorithms for Dynamic Route Advisory Problems
Li Shuai Cheng, MSc (July 2001 -- Jul 2002) (now doing PhD at Waterloo)
Topic: Efficient Algoritms for the Berth Assignment Problem
Lao Yizhi, MSc (July 2000 -- July 2001)
Topic: A Multi-Agent Based Approach to the Inventory Routing Problem.
Ong Tat Wee, MSc (2000)
Topic: A Graph Partitioning Algorithm for the Berth Allocation Problem