FARMER: Finding Interesting Rule Groups in Microarray Datasets.
Abstract
The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number
of columns and a small number of rows. For example, a gene expression dataset may contain 10,000-100,000 columns but
only 100-1000 rows.
Association rules can reveal biologically relevant associations between genes and environments/categories, helping to identify gene
regulation pathways. However, most existing association rule mining algorithms have an exponential dependence on the number of
columns. Moreover, the number of association rules generated from biological datasets is enormous due to the combinatorial explosion
of frequent itemsets. In this paper, we describe a new algorithm
called FARMER that is specifically designed to discover interesting rule groups in biological datasets by identifying their upper
and lower bounds. FARMER exploits all user-specified constraints, including minimum support, minimum confidence, and
minimum chi-square, to support efficient pruning. Several experiments on real bioinformatics datasets show that
FARMER is orders of magnitude faster than previous association rule mining algorithms. To further illustrate the usefulness of the discovered
interesting rule groups, we show that a simple classifier built from them produces good classification results compared
to existing classification algorithms, such as CBA and SVM.
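
To make the three user-specified constraints concrete, the following minimal sketch computes support, confidence, and chi-square for a single rule of the form A -> C (a set of items A implying a class label C) on a toy dataset. It is an illustration only, not FARMER itself: the function names, the toy rows, and the gene/label names are hypothetical, and the measures use the standard textbook definitions (support as the count of rows containing both A and C, chi-square from the 2x2 contingency table), which are assumed here rather than taken from the paper.

```python
from typing import FrozenSet, List, Tuple

# A row is (set of items present in the row, class label). Hypothetical layout.
Row = Tuple[FrozenSet[str], str]


def rule_measures(rows: List[Row], antecedent: FrozenSet[str], label: str):
    """Return (support, confidence, chi_square) for the rule antecedent -> label.

    Assumed definitions (standard, not quoted from the paper):
      support    = number of rows containing the antecedent with the given label
      confidence = support / number of rows containing the antecedent
      chi_square = Pearson chi-square of the 2x2 table (antecedent vs. label)
    """
    n = len(rows)
    a = sum(1 for items, c in rows if antecedent <= items and c == label)      # A, C
    b = sum(1 for items, c in rows if antecedent <= items and c != label)      # A, not C
    c_ = sum(1 for items, c in rows if not antecedent <= items and c == label)  # not A, C
    d = sum(1 for items, c in rows if not antecedent <= items and c != label)   # not A, not C

    support = a
    confidence = a / (a + b) if (a + b) else 0.0
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    chi_square = n * (a * d - b * c_) ** 2 / denom if denom else 0.0
    return support, confidence, chi_square


def satisfies(rows, antecedent, label, minsup, minconf, minchi) -> bool:
    """Check a rule against all three minimum thresholds."""
    sup, conf, chi = rule_measures(rows, antecedent, label)
    return sup >= minsup and conf >= minconf and chi >= minchi


# Toy data: five samples, three hypothetical genes, two hypothetical classes.
ROWS: List[Row] = [
    (frozenset({"g1", "g2", "g3"}), "cancer"),
    (frozenset({"g1", "g2"}), "cancer"),
    (frozenset({"g1", "g3"}), "cancer"),
    (frozenset({"g2", "g3"}), "normal"),
    (frozenset({"g3"}), "normal"),
]

if __name__ == "__main__":
    print(rule_measures(ROWS, frozenset({"g1"}), "cancer"))
    print(satisfies(ROWS, frozenset({"g2"}), "cancer", minsup=2, minconf=0.6, minchi=1.0))
```

In this toy data, the rule {g1} -> cancer holds in every row that contains g1, so it meets a high confidence threshold, whereas {g2} -> cancer has enough support but a weak chi-square value and would be filtered out by the third constraint.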