FARMER: Finding Interesting Rule Groups in Microarray Datasets

Abstract

The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets contain 10,000 to 100,000 columns but only 100 to 1,000 rows.
Association rules can reveal biologically relevant associations between genes and environments/categories, and thereby help identify gene regulation pathways. However, most existing association rule mining algorithms have an exponential dependence on the number of columns. Moreover, the number of association rules generated from biological datasets is enormous because of the combinatorial explosion of frequent itemsets. In this paper, we describe a new algorithm called FARMER that is specially designed to discover interesting rule groups from biological datasets by identifying their upper bounds and lower bounds. FARMER exploits all user-specified constraints, including minimum support, minimum confidence and minimum chi-square, to support efficient pruning. Several experiments on real bioinformatics datasets show that FARMER is orders of magnitude faster than previous association rule mining algorithms. To further illustrate the usefulness of the discovered interesting rule groups, we show that a simple classifier built from them produces good classification results compared with existing classification algorithms such as CBA and SVM.
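To make the vocabulary above concrete, the following Python sketch computes a rule group's upper and lower bounds and the three measures used for pruning (support, confidence, chi-square) on a tiny dataset. It assumes that a rule group is the set of rules with the same consequent whose antecedents are matched by exactly the same rows. Under that assumption, the upper bound is the set of items shared by all those rows, and the lower bounds are the minimal antecedents with that same row set. The dataset, the function names (`support_rows`, `upper_bound`, `lower_bounds`, `rule_stats`, `is_interesting`) and the thresholds are hypothetical. The sketch enumerates subsets by brute force and is not FARMER's search strategy; it only illustrates the definitions.

```python
from itertools import combinations

# Hypothetical toy data: each row is a set of items (e.g. discretized gene
# expression levels) together with a class label.
ROWS = [
    ({"a", "b", "c"}, "cancer"),
    ({"a", "b", "d"}, "cancer"),
    ({"a", "b", "c", "d"}, "cancer"),
    ({"b", "c"}, "normal"),
    ({"c", "d"}, "normal"),
]


def support_rows(antecedent, rows):
    """Indices of the rows that contain every item of the antecedent."""
    ante = frozenset(antecedent)
    return frozenset(i for i, (items, _) in enumerate(rows) if ante <= items)


def upper_bound(antecedent, rows):
    """Most specific antecedent matched by the same rows: the items they all share."""
    ids = support_rows(antecedent, rows)
    if not ids:
        raise ValueError("antecedent is not supported by any row")
    return frozenset.intersection(*(frozenset(rows[i][0]) for i in ids))


def lower_bounds(upper, rows):
    """Minimal subsets of `upper` matched by exactly the same rows (brute force)."""
    target = support_rows(upper, rows)
    found = []
    for k in range(1, len(upper) + 1):
        for combo in combinations(sorted(upper), k):
            cand = frozenset(combo)
            # Keep candidates with the same row set that contain no smaller bound.
            if support_rows(cand, rows) == target and not any(f <= cand for f in found):
                found.append(cand)
    return found


def rule_stats(antecedent, label, rows):
    """Support, confidence and 2x2 chi-square of the rule antecedent -> label."""
    n = len(rows)
    ids = support_rows(antecedent, rows)
    a = len(ids)                                    # rows matching the antecedent
    s = sum(1 for i in ids if rows[i][1] == label)  # ... that also carry the label
    c = sum(1 for _, y in rows if y == label)       # all rows carrying the label
    confidence = s / a if a else 0.0
    observed = [[s, a - s], [c - s, n - a - c + s]]
    row_tot, col_tot = [a, n - a], [c, n - c]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return s, confidence, chi2


def is_interesting(antecedent, label, rows, min_sup, min_conf, min_chi2):
    """Hypothetical filter applying the three user-specified thresholds."""
    sup, conf, chi2 = rule_stats(antecedent, label, rows)
    return sup >= min_sup and conf >= min_conf and chi2 >= min_chi2


if __name__ == "__main__":
    ub = upper_bound({"a"}, ROWS)
    print("upper bound:", sorted(ub))
    print("lower bounds:", [sorted(lb) for lb in lower_bounds(ub, ROWS)])
    print("stats:", rule_stats(ub, "cancer", ROWS))
    print("interesting:", is_interesting(ub, "cancer", ROWS, 2, 0.8, 3.84))
```

On this toy data, {a} and {a, b} are matched by the same three rows, so {a} → cancer and {a, b} → cancer fall into one rule group with upper bound {a, b} and lower bound {a}. Reporting only the two bounds summarizes every rule in the group, which is the motivation for mining rule groups rather than individual rules.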