SampleWise Enumeration Methods for Mining Microarray Data
Anthony K. H. Tung
Department of Computer Science
National University of Singapore
A Microarray Dataset
1000 100,000 columns
Challenge I
“a”however not part of the group
Challenge II
Column/Item Enumeration Lattice
Column/Item Enumeration Lattice
General Framework for Column/Item Enumeration
Readbased 
Writebased 
Pointbased 

Association Mining 
Apriori[AgSr94], DIC 
Eclat, MaxClique[Zaki01], FPGrowth[HaPe00] 
Hmine 
Sequential Pattern Discovery 
GSP[AgSr96] 
SPADE [Zaki98,Zaki01], PrefixSpan[PHPC01] 

Iceberg Cube 
Apriori[AgSr94] 
BUC[BeRa99], HCubing [HPDW01] 
A Multidimensional View
read
write
Sample/Row Enumeration Algorihtms
Existing Sample Enumeration Algorithms
Concepts of CARPENTER
Example Table
Transposed Table,TT
TT{2,3}
Row Enumeration
Pruning Method 1
r2 r3{aeh}
r4 has 100% support in the conditional table of “r2r3”, therefore branch “r2 r3r4”will be pruned.
r2 r3 r4 {aeh}
TT{2,3}
Pruning method 2
Pruning Method 3: Minimum Support
From CARPENTER to FARMER
Interesting Rule Groups
ae>C (66%)
a>C however is not in the group
Pruning by Interestingness Measure
Ordering of Rows: All Class C before ~C
Pruning Method: Minimum Confidence
TT{2,3}
Pruning method: Minimum chisquare
TT{2,3}
C 
~C 
Total 

A 
max=5 
min=1 
Computed 
~A 
Computed 
Computed 
Computed 
Constant 
Constant 
Constant 
Finding Lower Bound, MineLB
Implementation
Experimental studies
Example resultsProstate
Example resultsProstate
Naive Classification Approach
Classification results
Summary of Experiments
COBBLER: Combining Column and Row Enumeration
{ }
ab
{r1}
ad {r2}
acd{ r2}
bcd
{ }
{ }
r1r3r4 { }
r2r4{d }
Single Enumeration Tree
Feature enumeration
Row enumeration
r1 
a b c 
r2 
a c d 
r3 
b c 
r4 
d 
Dynamic Enumeration Tree
{ }
r1
{bc}
r1
{c}
r2
{d }
Feature enumerationto Row enumeration
ab
{r1}
abcd
{ }
a {r1r2}
abc: {r1}
ac: {r1r2}
acd: {r2}
r1 
bc 
r2 
cd 
br 
1 
cr 
1 r2 
dr 
2 
Dynamic Enumeration Tree
r4
{d}
c{r2r3 }
d {r4 }
b{r1 }
Row enumerationto Feature Enumeration
c{r1r2 }
bc{r1 }
r1 {abc}
r1r3r4 { }
ac: {r1r2}
bc: {r1r3}
c: {r1r2r3}
Switching Condition
Switching Condition
Switching Condition
Length and Row ratio
Synthetic data
Extension of our work by other groups (with or without citation)
Extension of our work by other groups (with or without citation) II
Gene1 
Gene2 
Gene3 
Gene4 

Sample1 

Sample2 

. . . 

SampleN1 

SampleN 
A gene in two samples are say to be coherent if their time series satisfied a certain matching condition
In CARPENTER, a gene in two samples are say to be matching if their expression in the two samples are almost the same
Extension of our work by other groups (with or without citation) IV
[2] Try to find a subset of samples S such that a subset of genes G is coherent for each pair of samples in S. S>mins, G>ming
In CARPENTER, we try to find a subset of samples S in which a subset of genes G is similar in expression level for each pair of samples in S. S>mins, G>0
Gene1 
Gene2 
Gene3 
Gene4 

S1 
1.23 

S2 
1.34 

. . . 

SN1 
1.52 

SN 
[2] Perform samplewise enumeration and remove genes that are not pairwisecoherent across the samples enumerated
CARPENTER: Perform samplewise enumeration and remove genes that does not have the same expression level across the samples enumerated
Extension of our work by other groups (with or without citation) VI
TT{3,4}
Extension of our work (Conclusion)
Future Work: Generalize Framework for Row Enumeration Algorithms?
Conclusions