Dealing with Confounders in Omics Analysis

Participants: Wilson Goh, Wong Limsoon


Statistical feature selection on high-throughput omics data (e.g., genomics, proteomics, and transcriptomics) is commonly deployed to help understand the mechanisms underpinning disease onset and progression. In clinical practice, the selected features serve as critical biomarkers for diagnosis, treatment guidance, and prognosis. Unlike monogenic disorders, many challenging diseases (e.g., cancer) are polygenic and require multigenic signatures to cope with heterogeneous etiology and human variability. Unfortunately, when analyzing omics data, we commonly encounter universality and reproducibility problems caused not only by etiology and human variability but also by batch effects, poor experiment design, inappropriate sample sizes, and misapplied statistics.
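To make the batch-effect problem concrete, here is a minimal Python sketch (standard library only) of how a technical batch shift that is partially confounded with the disease label can make pure-noise features look differentially expressed. The sample design, the 1.5-unit shift, the |t| > 3 cut-off and the helper names (`simulate`, `welch_t`, `count_hits`, `center_within_batch`) are all illustrative assumptions, not the project's own method; within-batch mean centering is used only as the simplest possible correction.

```python
import random
import statistics


def simulate(n_features=200, batch_shift=1.5, seed=0):
    """Simulate features with NO true disease effect but a technical batch shift.

    Hypothetical design: 40 samples in two batches; batch 0 is mostly disease
    (16 disease, 4 control) and batch 1 mostly control (4 disease, 16 control),
    so batch is partially confounded with the class label.
    """
    rng = random.Random(seed)
    # Each sample is a (batch, label) pair; label 1 = disease, 0 = control.
    design = [(0, 1)] * 16 + [(0, 0)] * 4 + [(1, 1)] * 4 + [(1, 0)] * 16
    data = []  # data[f][s] is the value of feature f in sample s
    for _ in range(n_features):
        # Pure noise plus an offset for batch 0; the class label plays no role.
        data.append([rng.gauss(0.0, 1.0) + (batch_shift if b == 0 else 0.0)
                     for b, _ in design])
    return design, data


def welch_t(x, y):
    """Welch's t-statistic for two independent samples."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def count_hits(design, data, threshold=3.0):
    """Count features whose |t| (disease vs control) exceeds a fixed cut-off."""
    hits = 0
    for row in data:
        disease = [v for v, (_, lab) in zip(row, design) if lab == 1]
        control = [v for v, (_, lab) in zip(row, design) if lab == 0]
        if abs(welch_t(disease, control)) > threshold:
            hits += 1
    return hits


def center_within_batch(design, data):
    """Subtract each batch's mean from that batch's samples, feature by feature.

    Deliberately naive: usable here only because every batch contains both
    classes, and it can still distort real class effects when classes are
    unbalanced across batches.
    """
    batches = sorted({b for b, _ in design})
    corrected = []
    for row in data:
        means = {b: statistics.mean([v for v, (bb, _) in zip(row, design) if bb == b])
                 for b in batches}
        corrected.append([v - means[b] for v, (b, _) in zip(row, design)])
    return corrected


design, data = simulate()
print("hits before correction:", count_hits(design, data))
print("hits after within-batch centering:",
      count_hits(design, center_within_batch(design, data)))
```

Because no simulated feature carries a real class effect, every hit before correction is spurious; after centering, the count should fall to roughly what chance alone produces. Centering within batch is only possible when each batch contains both classes, which is one reason balanced experiment design matters.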

Current literature mostly blames poor experiment design and overreliance on the highly fluctuating P-value. In this project, we take a deeper look at the mechanics of applying statistical tests (e.g., constructing the hypothesis statement, choosing an appropriate null distribution, and constructing the test statistic) and design analysis techniques that are robust for omics data.
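The P-value instability mentioned above can be illustrated with a small simulation. The sketch below assumes normally distributed values, a fixed true effect of one standard deviation and 10 samples per group; it repeats the same experiment 20 times and scores each replicate with a label-permutation test. The function names and parameters are hypothetical, and the permutation test is simply one convenient way to build a null distribution. It is only appropriate when samples are exchangeable under the null, which hidden batch structure would violate.

```python
import random


def perm_pvalue(x, y, n_perm=1000, rng=None):
    """Two-sided permutation p-value for a difference in group means.

    The null distribution comes from shuffling group labels, which is only
    appropriate if samples are exchangeable under H0 (e.g. no hidden batches).
    """
    rng = rng or random.Random()
    n_x, n_y = len(x), len(y)
    pooled = list(x) + list(y)
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        s_x = sum(pooled[:n_x])
        if abs(s_x / n_x - (total - s_x) / n_y) >= observed:
            extreme += 1
    # Add-one correction: a permutation p-value can never be exactly zero.
    return (extreme + 1) / (n_perm + 1)


def replicate_pvalues(n_reps=20, n_per_group=10, effect=1.0, seed=1):
    """Repeat one small two-group experiment with a fixed true effect.

    Every replicate draws fresh samples from the same populations
    (control ~ N(0, 1), disease ~ N(effect, 1)); only sampling noise differs.
    """
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        disease = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
        pvals.append(perm_pvalue(disease, control, rng=rng))
    return pvals


pvals = replicate_pvalues()
print("smallest p:", min(pvals), "largest p:", max(pvals))
print("replicates with p < 0.05:", sum(p < 0.05 for p in pvals), "of", len(pvals))
```

Since every replicate samples from identical populations, any spread in the resulting p-values, typically including replicates on both sides of 0.05, reflects sampling noise alone. This is the behaviour that makes a single P-value a fragile basis for selecting features.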



This project is supported in part by a Kwan Im Thong Hood Cho Temple Chair Professorship, and in part by two AI Singapore grants (AISG-100E-2019-027 and AISG-100E-2019-028) and a Singapore Ministry of Education tier-2 grant (MOE2019-T2-1-042).

Last updated: 7 May 2022, Limsoon Wong.