Recovering missing proteins based on biological complexes

Participants: Wilson Wen Bin Goh, Weijia Kong, Hui Peng, Limsoon Wong


Advancements in proteomics are important to biological and clinical research because assaying protein identities and quantities paints an immediate picture of the underlying molecular landscape. However, proteomics still suffers from incomplete proteome coverage (i.e. not all proteins in a sample are observable in a single screen). This gives rise to the "missing-protein problem" (MPP), which we define as the difficulty of observing proteins in a proteome screen even though they are expected to be present. Because of MPP, efforts to extend proteome profiling in comparative or clinical studies, which require consistent protein/peptide detection (and accurate quantification across an extensive dynamic range), are rendered less effective.

MPP, as defined here, is different from the missing proteins defined by the Human Proteome Project (HPP). HPP missing proteins are proteins that have never been observed in the human proteome but are predicted to be present because a functional gene sequence exists (e.g. based on genome assembly or partial transcriptome evidence); i.e. the HPP notion of missing proteins is not sample specific. In contrast, MPP concerns proteins that are present in a sample but not detected in a proteome screen of that sample. That is, MPP is about filling in "holes" in the proteomic profiling data of patients.

Current imputation methods (for filling holes in proteomic profiling data) rely on a large set of samples to estimate the correlation between the abundances of two or more proteins from the entire complement of reported proteins in these samples. They are therefore inapplicable when there are too few samples. Moreover, some recent studies on imputation methods have shown that they do not perform well on proteomic profiling data even when given a non-trivial number of samples.
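To make the dependence on sample count concrete, the following is a minimal sketch of a generic correlation-based imputation, not the method of this project or of any particular tool. The function name `impute_by_correlation`, the threshold `MIN_SHARED_SAMPLES`, and the toy abundance matrix are all hypothetical and chosen only for illustration. A hole is filled by a linear fit against the most correlated other protein, and no estimate is produced when too few samples report both proteins.

```python
import numpy as np

# Assumed threshold: with fewer shared samples, a correlation is not meaningful.
MIN_SHARED_SAMPLES = 3


def impute_by_correlation(abundance, protein, sample):
    """Estimate abundance[protein, sample] from the most correlated other protein.

    abundance: 2-D float array (proteins x samples); NaN marks a hole.
    Returns the estimate, or None when too few samples support a fit.
    """
    target = abundance[protein]
    best_r, best_estimate = 0.0, None
    for other in range(abundance.shape[0]):
        # The predictor protein must itself be observed in the sample to fill.
        if other == protein or np.isnan(abundance[other, sample]):
            continue
        # Samples in which both proteins were observed.
        shared = ~np.isnan(target) & ~np.isnan(abundance[other])
        if shared.sum() < MIN_SHARED_SAMPLES:
            continue
        x, y = abundance[other, shared], target[shared]
        if np.std(x) == 0 or np.std(y) == 0:
            continue  # correlation is undefined for a constant profile
        r = np.corrcoef(x, y)[0, 1]
        if abs(r) > abs(best_r):
            slope, intercept = np.polyfit(x, y, 1)
            best_r = r
            best_estimate = slope * abundance[other, sample] + intercept
    return best_estimate


# Toy data: protein 0 tracks protein 1 and has a hole in the last sample.
toy = np.array([
    [3.0, 5.0, 7.0, np.nan],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 1.0, 4.0, 2.0],
])
estimate = impute_by_correlation(toy, protein=0, sample=3)

# With only two samples, no protein pair has enough shared observations.
too_few = impute_by_correlation(toy[:, [0, 3]], protein=0, sample=1)
```

In the toy matrix the hole is filled from protein 1, whereas the two-sample subset yields no estimate at all, which mirrors the inapplicability of such methods to small sample sets.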


The three main goals of this project are:

Selected Publications


Selected Presentations


This project is supported in part by MOE Tier-2 grant MOE2019-T2-1-042.

Last updated: 20/2/2023, Limsoon Wong.