The modern biology researcher faces vast amounts of data produced by high-throughput analytical technologies for DNA, RNA, and proteins.

This rich and complex mix of data is also confounded by a variety of biological and nonbiological factors, which makes it difficult to draw the right research conclusions accurately and efficiently.

By understanding and exploiting properties of the underlying biology, instruments, technologies, and experiment designs, we develop advanced methods to process and analyse these large amounts of complex biological data, and so help solve problems in biology, biotechnology, and medicine.




Develop effective and efficient algorithmic techniques used in the acquisition, storage, analysis, and dissemination of biological data.


Design innovative and elegant computational approaches to decipher the structure, function, and behaviour of cells, and explore how changes in these affect phenotypes.


Bioinformatics Algorithms

Omics Data Analysis

Learning Theory

Machine Learning


Protein folding and protein structure prediction


Determining the structure and function of proteins is a cornerstone of modern biology and medicine. We are developing novel computational methods, built on cutting-edge AI techniques and physics-based force fields, to accurately model the structure and function of proteins. One goal of this study is to reveal the fundamental relationship between the sequence, structure, and function of proteins.

  • Bioinformatics Algorithms

AI-based protein design and drug discovery


Proteins in nature were generated through billions of years of evolution and therefore possess a limited range of structural folds and biological functions. This project aims to design new protein sequences with novel structures and functions beyond those of natural proteins. The computationally designed proteins and peptides can be used as drugs to treat various human diseases such as cancer and Alzheimer's disease.

  • Bioinformatics Algorithms

Enabling more sophisticated proteomic profile analysis

WONG Lim Soon

Quantitative comparison of samples is central to proteomics. However, biomarkers identified in one batch of samples are often neither consistent nor reproducible in another batch. We developed techniques based on biological networks to identify biomarkers more reproducibly and consistently, and to achieve more reliable proteomics-based diagnosis.

From iteration on multiple collections in synchrony to fast general interval joins

WONG Lim Soon

The Synchrony iterator captures a programming pattern for synchronized iterations. It is a conservative extension that enhances the repertoire of algorithms expressible in comprehension syntax. In particular, efficient general synchronized iterations, e.g. linear-time algorithms for low-selectivity database non-equijoins, become naturally expressible in comprehension syntax.

  • TRL 4
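As a rough illustration of the kind of synchronized iteration described above, the following Python sketch performs an interval-overlap join (a non-equijoin) by stepping through two collections in lockstep. It is not the Synchrony iterator itself, nor its comprehension-syntax form; the names `Interval` and `overlap_join` are hypothetical, and the sketch assumes closed intervals and that both inputs are already sorted by start position.

```python
from typing import Iterator, List, Tuple

# Assumption for this sketch: a closed interval (start, end) with start <= end.
Interval = Tuple[int, int]


def overlap_join(
    xs: List[Interval], ys: List[Interval]
) -> Iterator[Tuple[Interval, Interval]]:
    """Yield every pair (x, y), x from xs and y from ys, whose intervals overlap.

    Both inputs must be sorted by start. The two lists are traversed together
    in a single pass, in order of start position.
    """
    active_x: List[Interval] = []  # xs seen so far that may still overlap a later y
    active_y: List[Interval] = []  # ys seen so far that may still overlap a later x
    i = j = 0
    while i < len(xs) or j < len(ys):
        # Advance whichever collection has the smaller next start.
        take_x = j == len(ys) or (i < len(xs) and xs[i][0] <= ys[j][0])
        if take_x:
            x = xs[i]
            i += 1
            # Drop ys that ended before x starts; they cannot match x or any later x.
            active_y = [y for y in active_y if y[1] >= x[0]]
            for y in active_y:
                yield (x, y)
            active_x.append(x)
        else:
            y = ys[j]
            j += 1
            # Symmetric step: drop xs that ended before y starts.
            active_x = [x for x in active_x if x[1] >= y[0]]
            for x in active_x:
                yield (x, y)
            active_y.append(y)


if __name__ == "__main__":
    genes = [(1, 3), (2, 6), (8, 9)]
    reads = [(0, 1), (4, 5), (7, 8)]
    for pair in overlap_join(genes, reads):
        print(pair)
```

Each interval is dropped from an active list at most once, and every interval that survives a filtering step contributes an output pair, so the running time is proportional to the input size plus the number of matching pairs. When few pairs match (low selectivity), this is close to linear, in contrast to the quadratic cost of a nested-loop join.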

Dealing with confounders in omics analysis

WONG Lim Soon

Universality and reproducibility problems are commonly encountered in analyzing omics data, owing not only to etiology and human variability but also to batch effects, poor experiment design, inappropriate sample sizes, and misapplied statistics. Here, we rethink the mechanics of applying statistical tests and design analysis techniques that are robust on omics data.

Transcription factor interaction prediction and classification

WONG Lim Soon

Regulatory mechanisms often involve several transcription factors (TFs) binding together and attaching to the DNA as a single complex. However, only a fraction of the regulatory partners of each TF is currently known. We developed techniques for predicting physical interactions between TFs, as well as for predicting the nature of those interactions (i.e. co-operative, competitive, or others).