==================================== List of Research Projects Leong Hon Wai, RAS-Group, SoC, NUS ==================================== Contact: leonghw at comp.nus.edu.sg url: http://www.comp.nus.edu.sg/~leonghw/ We do both fundamental and applied research in combinatorial optimization. The application areas are in computational biology, transportation logistics, resource allocation and scheduling, and other optimization problems. We look for students who are algorithmically/mathematically inclined and who can do good software development. We have a very strong team culture and look for students who are team players. This list contains only *some* of the projects. There are many more interesting projects of a similar nature. Email me (leonghw@comp.nus.edu.sg) if interested. Title: ------ Randomized Algorithms for Problems from Computational Biology (CB) Short Description: ------------------ Many algorithmic/optimization problems in computation biology have inherent error in the input data or in the problem interpretation. Traditional algorithms do not handle these error naturally. In this research, we consider the use of randomized algorithms for solving algorithmic problems in computational biology (such as [1],[2]) since randomization may be one way of dealing with inherent error. Initial candidate problems include phylogenetic tree constructions [1], motif, pattern finding in DNA sequences, and PPI network analysis. We are looking for students who are strong in design and analysis of algorithms, especially of randomized algorithms. (No prior computational biology knowledge is required.) References: [1] Seung-Jin Sul and Tiffani L. Williams, "A Randomized Algorithm for Comparing Sets of Phylogenetic Trees," Asia-Pacific Bioinformatics Conference (APBC'07), pp. 121- 130,2007. [2] Shuai Cheng Li, Dongbo Bu, Jinbo Xu and Ming Li "Finding Largest Well-Predicted Subset of Protein Structure Models" LNCS-5029, (2008), pp. 44-55, DOI: 10.1007/978-3-540-69068-9_7 Number of RS needed: 2 Title: ------ Algorithms for peptide sequencing via tandem mass spectrometry (computational biology) Short Description: ------------------ Peptide sequencing is the problem of identification of proteins (determining the sequence of a protein) and recent technological advances in tandem mass spectrometry has made it the method of choice for high-throughput identification of proteins. (See [1], [2], [3] and others for quick introduction.) Recently, we initiated a project on multi-charge peptide sequencing (MCPS) that focusses on mass spectra with multiple charge (> 2). We showed in [1] that significant performance gain can be achieved by considered multi-charge peaks during the peptide sequencing process. There are several possible PhD projects in this area: (1) One project deals with improved algorithms for de novo sequencing for multi-charge mass spectra. (2) Another project deals with more precise peak annotation in multi-charged mass spectra. (3) Another project deals with the important problem of detection of post translation modifications (PTMs). This problem has important implications in the study of translational medicine. A collaborative project with an overseas partner is currently being pursued for this project. We are looking for students who are strong in design of algorithms and mathematical analysis. Currently, two PhD students are working in this area in my research group. This is joint research with Prof. Pevzner at UCSD and Prof Haixu Tang at Indiana University. References: [1] KetFah Chong, Kang Ning and Hon Wai Leong, "Characterization of Multi Charge Mass Spectra for Peptide Sequencing", APBC-2006, (2006), pp. 109-119. [2] Vineet Bafna, Nathan Edwards: "On de novo interpretation of tandem mass spectra for peptide identification.", RECOMB 2003: 9-18 [3] Ari Frank and P.A. Pevzner, "De Novo Peptide Sequencing via Probabilistic Network Modelling." Anal. Chem., 77, pp. 964-973, 2005. Number of RS needed: 2 Title: ------ Research and Development in PGO (Phylogeny from Gene Orders) Short Description: ------------------ Phylogeny from Gene Orders (PGO) [1] is a software system for constructing and comparing phylogenetic trees build using different techniques and using different evolutionary distances on the same set of gene orders (genomes). Our PGO system allows researchers to compare different classes of algorithms for building phylogenetic trees. It also allows researchers to compare the phylogenetic trees build from different evolutionary distances. P GO integrates a number of software packages related to phylogenetic reconstruction from gene order and gene content of genomes. Our system can be access as a web service where users can upload their gene order data and select the set of programs they wish to run on their data. A similar web service that analyzes DNA sequences (instead of gene orders) is given in [2]. In this project, we perform R&D on enhancements to PGO. Some possible enhancements includes implementation and integration of more distance functions, performing comparative study of different distance functions, evaluating the practicality of DCR operations as a proxy to actual evolutionary distances, and design and implementation of algorithms for new distance measures. We are looking for students who are strong in algorithm design and implementation. (No prior computational biology knowledge is required.) References: [1] M. Zhang, F. Hao, and H. W. Leong. Phylogeny from Gene Order (PGO): an integrated system for comparing phylogenetic reconstruction algorithms, International Conference on Genome Informatics, Dec 2010. [2] A. Dereeper, V. Guignon, G. Blanc, S. Audic, S. Buffet, F. Chevenet, J. Dufayard, S. Guindon, V. Lefort, M. Lescot, et al. "Phylogeny. fr: robust phylogenetic analysis for the non-specialist." Nucleic Acids Research, 2008. Number of RS needed: 1 Title: ------ Algorithms for the Genome Sorting Problem Short Description: ------------------ In genome sorting, we are given two genomes (given by their gene sequences or gene order), and we want to find the minimum sequence of operations that transform one genome to the other. Different variations ([1], [2]) of the genome sorting problem arise from the different sets of evolutionary operations allowed: including insertion, deletion, reversals, translocation, transposition, block interchange, fusion, fission, segmental duplication, chromosome duplication and chromosome deletion. An ongoing project, GSB (Genome Sorting with Bridges) considers the genome sorting problem in which we allow all known traditional operations (see list given above). By making no assumptions on the two genomes and allowing all known traditional operations, we hope to make it more convenient to compute evolutionary distances and we also hope that the evolutionary distances computed are closer to the real distances. We have devised a new algorithm called GSB (Genome Sorting with Bridges) that combines existing techniques with new innovative ideas called T-bridges and X-bridges. This project will continue this work and will continue implementation of the GSB algorithm and related extensions. We are looking for students who are strong in algorithm design and implementation. (No prior computational biology knowledge is required.) References: [1] M. Bader. Genome rearrangements with duplications. BMC Bioinformatics, 11(Suppl 1):S27, 2010. [2] F. Hao, J. Luan, and D. Zhu. Translocation-Deletions Distance Formula for Sorting Genomes. WRI World Congress on Computer Science and Information Engineering, 2009 Number of RS needed: 1 Title: ------ Fragment Assembly using very short fragments (Computational Biology) Short Description: ------------------ Fragment assembly is an important genome sequencing problem in computational biology. A highly successful approach to fragment assembly is recently proposed by Pevzner et al in [1] and [2]. This project deals with consider fragment assembly using very short fragments (about 100bp each) produced by new fragment generation technology. For these short fragments, frequent repeats in DNA sequences causes problems in the assembly. In this project, we aim to seek efficient solution to this problem. We are looking for students who are strong in design of algorithms and mathematical analysis. Currently, two students are working in related area in my research group. This project is a joint project with Prof Haixu Tang in Indiana University. References: [1] Pevzner PA, Tang H, Waterman MS., "An Eulerian path approach to DNA fragment assembly." Proc Natl Acad Sci, USA 2001 Aug 14;98(17):9748-53. [2] Pevzner PA, Tang H, "Fragment Assembly with Double-barreled Data," BioInformatics, Vol 17, Supp 1, (2001), pp. S225-S233. Number of RS needed: 1 Title: ------ Rearrangements in genomes with unequal content (computational biology) Short Description: ------------------ Whole-genome rearrangement studies are typically separated into two steps: (i) identification of large blocks of sequence shared by the set of genomes; and (ii) comparison of the respective arrangements of these blocks. When studying many very different genomes simultanously, it becomes difficult to identify large blocks in step (i) simply because there are limited number of elements common to all genomes. In other words, many of the similarities that are identified by pairwise comparisons are dropped in step (i) when we restrict to elements common to all genomes. In the past, we have worked on an algorithm to compare the respective arrangements of blocks in multiple genomes (the program is called MGR, Bourque and Pevzner 2002). The algorithm is a heursitic that seeks the most parsimonious rearrangement scenario that best explains the observed arrangements. MGR relies on a polynomial time algorithm to compute the pairwise distance between 2 multichromosomal genomes (Pevzner and Hannenhalli 1995, Tesler 2002). Instead of working on a set of blocks common to all genomes, we would like to adapt the algorithm to work on the different sets of pairwise blocks. Given that MGR already only relies on pairwise comparisons to identify rearrangements implies that this modification is very accessible. The main challenge is to describe how these pairwise blocks are modified by the rearrangements and how to deal with boundary ambiguities since a rearrangement is defined for different sets of blocks on the same genome. One approach would be to add virtual markers to account for missing markers in some of the genomes. Not restricting to common blocks will retain much more information and we expect the recovered scenarios to be more accurate. It will also open new applications to include more genomes, distant genomes and genomes with low quality map or sequence. These developments would be useful not only in conjunction with MGR but also for any other multiple genome rearrangement study. Number of RS needed: 1 Title: ------ Algorithms for Phylogenetic Tree Reconstructions (in Computational Biology) Short Description: ------------------ A phylogenetic tree (or evolutionary tree) is a tree representation of the evolutionary history of a set of species. Over the past decade, there has been intensive reesarch in algorithms for reconstructing phylogenetic tree. A number of different techniques have been used. In this project, we aim to design efficient algorithms for phylogenetic tree reconstruction. The project starts with an up-to-date survey of existing techniques for phylogenetic tree reconstruction and a comparative study of their relative strengths and weaknesses. Number of RS: 1 For other related research topics in computational biology, please come by to see me in S16 06-01. (For more projects, see http://www.comp.nus.edu.sg/~leonghw, ---> under Research Project) Title: ------ Efficient Algorithms for Berth Allocation Planning Problems Short Description: ------------------ The complex operations involved in the management and operation of a world-class container transshipment port in Singapore are complex and involves many different resources and systems. In this project, we undertake to study the problem of planning of some of these operations to improve their effectiveness. Initial study with the berth allocation planning system (BAPS) has produced interesting research problems and results. Some examples of these are the berth partitioning problem, the berth assignment problems. This project aims to extend the current BAPS research with the study of more efficient algorithms for solving these and other related optimization problems in container port operation. Number of RS: 1 Title: ------ Multi-Agent Approach to Combinatorial Optimization (2 students) Short Description: ------------------ The majority of past approaches to combinatorial optimization use a centralized decision making framework where the algorithm is aware of all relevant problem parameters. In this project, we study the an interesting new approach to combinatorial optimization -- the multi-agent approach in which a distributed decision making framework is employed. Such a distributed framework is more flexible and is able to incorporate new changes and new constraints in the dynamic business environment of today. In our multi-agent approach, we model the entities in the problem using software agents, each of which has only a restricted picture of the entire state of the system. Thus, the agents cooperate as well as compete for scarce resources while optimizing some global objective functions. The multi-agent approach has been successfully applied to two logistics problems -- the vehicle routing problem and the inventory routing problem. In this project, we develop a multi-agent algorithm for solving a scheduling problem that involves multiple entities and scarce, shared resources. The challenge will be to appropriate model the various entities using appropriate agents with associated properties, believes and operations. We then explore issues such as the influence of local information, and the appropriate distributed decision making framework. This project will also involve the implementation of the multi-agent approach and benchmarking against the state-of-the-art approaches. Two starting reference: Yizhi Lao, and Hon Wai Leong, "A Multi-Agent Based Approach to the Inventory Routing Problem", Pacific-Rim Intl. Conf. on Artificial Intelligence (PRICAI-02), (2002), LNAI-2417, pp 345-354. Hon Wai Leong and Ming Liu, "A Multi-Agent Algorithm for Vehicle Routing Problem with Time Windows", ACM Symposium on Applied Computing (SAC-2006), AIMS Track, (2006), pp. 106-111. Number of RS: 1 For other related research topics in logistics, please come by to see me in S16 06-01. (For more projects, see http://www.comp.nus.edu.sg/~leonghw, ---> under Research Project) Research Scholars Currently Supervised: --------------------------------------- Ng Hoong Kee, PhD (Jan 2000 -- ) email: nghoong Topic: Multi-Point Range Query: Algorithms and Applications Chong Ket Fah, PhD (Jan 2002 -- ) email: chongket Topic: Computational Proteomics Ning Kang, PhD (Jul 2002 -- ) email: ningkang Topic: Computational Proteomics, SCS related problems Recently Graduated Students: ---------------------------- Tan Jing Song, MSc (Jul 2002 -- Jul 2003) (now doing PhD at U-Penn) Topic: Efficient Algorithms for Dynamic Route Advisory Problems Li Shuai Cheng, MSc (July 2001 -- Jul 2002) (now doing PhD at Waterloo) Topic: Efficient Algoritms for the Berth Assignment Problem Lao Yizhi, MSc (July 2000 -- July 2001) Topic: A Multi-Agent Based Approach to the Inventory Routing Problem. Ong Tat Wee, MSc (2000) Topic: A Graph Partitioning Algorithm for the Berth Allocation Problem