Exploratory Hypothesis Testing and Analysis

Participants: Andre Suchitra, Haojun Zhang, Wei Zhong Toh, Mengling Feng, Guimei Liu, Limsoon Wong.



More and more data are being accumulated and stored in digital form across many applications, and these data are a rich source of new discoveries. Data mining has become an important tool for transforming data into knowledge, and finding useful, actionable knowledge is the main objective of diagnostic data mining. Most existing work tackles this problem by discovering patterns and rules and then studying their interestingness; see our past project on pattern spaces.

In this work, we use a different paradigm that represents the discovered knowledge as hypotheses. A hypothesis involves a comparison of two or more samples, which is close to how humans acquire knowledge. Compared with patterns and rules, hypotheses provide the context in which a piece of information is interesting, so they are more intuitive and informative. More importantly, users can act more easily on what a hypothesis indicates. We further analyse the significant hypotheses that are discovered and identify the reasons behind them, so that users learn not only what is happening but also roughly when or why it is happening. This new paradigm has the potential to make diagnostic data mining as successful as predictive data mining in real-life applications.

In the proposed research, we will (1) formulate the problem and identify the issues that need to be addressed; (2) develop algorithms to solve it; (3) visualize the discovered knowledge so that the system is easy to use; and (4) work with domain experts in the biomedical area and other areas, applying the developed techniques to real-life problems.
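To make "a comparison of two or more samples within a context" concrete, here is a minimal sketch of what a single such hypothesis might look like. It assumes that records are flat attribute-value dictionaries, that a hypothesis compares the rate of one outcome between two groups inside a fixed context (a set of attribute values), and that significance is judged by a two-sided Fisher's exact test. The function names `two_by_two` and `fisher_exact_two_sided` and the ward/treatment/outcome toy data are hypothetical illustrations, not the project's actual algorithm.

```python
from math import comb
from typing import Dict, List, Tuple

Record = Dict[str, str]
Counts = Tuple[int, int]  # (records with the target outcome, records without it)


def two_by_two(records: List[Record], context: Dict[str, str], group_attr: str,
               group_a: str, group_b: str, target_attr: str,
               target_val: str) -> Tuple[Counts, Counts]:
    """Within the records matching `context`, count target outcomes for two groups."""
    in_context = [r for r in records
                  if all(r.get(k) == v for k, v in context.items())]
    rows = []
    for group in (group_a, group_b):
        members = [r for r in in_context if r.get(group_attr) == group]
        hits = sum(1 for r in members if r.get(target_attr) == target_val)
        rows.append((hits, len(members) - hits))
    return rows[0], rows[1]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more likely than the observed one (small tolerance for rounding).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def make(ward: str, treatment: str, outcome: str, n: int) -> List[Record]:
    # Helper for building the hypothetical toy data below.
    return [{"ward": ward, "treatment": treatment, "outcome": outcome} for _ in range(n)]


# Hypothetical toy data: an ICU context and a general-ward ("GW") context.
records = (make("ICU", "A", "recovered", 9) + make("ICU", "A", "died", 1)
           + make("ICU", "B", "recovered", 2) + make("ICU", "B", "died", 8)
           + make("GW", "A", "recovered", 5) + make("GW", "B", "recovered", 5))

if __name__ == "__main__":
    (a, b), (c, d) = two_by_two(records, {"ward": "ICU"}, "treatment", "A", "B",
                                "outcome", "recovered")
    print((a, b), (c, d), round(fisher_exact_two_sided(a, b, c, d), 4))
```

Run as a script, this should print `(9, 1) (2, 8) 0.0055`: inside the ICU context, treatment A and treatment B differ significantly in recovery rate, while in the general-ward context both groups recover equally (p = 1). The context is what makes the comparison informative, which is the point the paragraph above makes about hypotheses versus bare patterns. A real system would also have to enumerate candidate contexts and correct for multiple testing; this sketch does neither.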


The main objective of the proposed work is to build a diagnostic data mining system that can:

Thus the scope of this project includes:

Main Results

Selected Publications


Selected Presentations


This project is supported in part by an A*STAR PSF grant (SERC 102 101 0030, 1/8/2010 - 31/7/2013) and a MOE T2 grant (MOE2012-T2-1-061, 12/10/2012 - 11/10/2015).

Last updated: 11/7/2017, Limsoon Wong.