nClusters and maxnClusters
Version 1.1
==========================

The maxnclusters program first calls "nclusters" to mine nClusters, then removes 
the non-maximal nClusters in a post-processing step. 

Usage: 

   maxnclusters nclusters data_filename min_row_size  min_clmn_size  delta 1 max_overlap output_filename
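
For scripted runs, the argument order above can be assembled as in the following sketch. It is only
an illustration: the helper name build_maxnclusters_command and the sample values (10, 5, 0.1, 
"xxx.out") are assumptions, and the two literal arguments "nclusters" and "1" are simply copied
from the usage line.

```python
import shlex


def build_maxnclusters_command(data_filename, min_row_size, min_clmn_size,
                               delta, max_overlap, output_filename,
                               executable="maxnclusters"):
    """Return the argument list for one run, in the order of the usage line."""
    return [
        executable,
        "nclusters",            # literal argument, copied from the usage line
        str(data_filename),     # base name; xxx.data and xxx.names must exist
        str(min_row_size),
        str(min_clmn_size),
        str(delta),
        "1",                    # literal argument, copied from the usage line
        str(max_overlap),
        str(output_filename),
    ]


if __name__ == "__main__":
    # Sample values are hypothetical; only 0.9 is the suggested max_overlap.
    cmd = build_maxnclusters_command("xxx", 10, 5, 0.1, 0.9, "xxx.out")
    print(shlex.join(cmd))
    # To execute it (assumes the binary is on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```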

Parameters: 
1. data_filename: the base name of the input data. If the specified name is xxx, then two files must 
                  exist: "xxx.data" and "xxx.names". The data format is similar to the one used by 
                  the UCI machine learning repository.

                  xxx.data: Each line represents an object as a list of attribute values separated 
                            by commas.

                  xxx.names: The first line contains the number of objects (genes). The second line contains 
                             the number of attributes (tissues). The remaining lines state whether each 
                             attribute is continuous or nominal. Each of these lines is of the form: 
                              
                             attribute-name: continuous/nominal/... 
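
The two input files can be produced along the lines of the sketch below. It assumes that all 
attributes are continuous and that plain commas and one "name: continuous" line per attribute are
accepted as written; the helper name write_dataset and the toy values are hypothetical.

```python
import os
import tempfile


def write_dataset(base_name, rows, attribute_names):
    """Write base_name.data and base_name.names for continuous attributes."""
    for row in rows:
        if len(row) != len(attribute_names):
            raise ValueError("every object needs one value per attribute")
    # xxx.data: one object per line, attribute values separated by commas.
    with open(base_name + ".data", "w") as f:
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")
    # xxx.names: number of objects, number of attributes, then one line
    # per attribute describing its type.
    with open(base_name + ".names", "w") as f:
        f.write(f"{len(rows)}\n")
        f.write(f"{len(attribute_names)}\n")
        for name in attribute_names:
            f.write(f"{name}: continuous\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        base = os.path.join(d, "toy")
        write_dataset(base, [[1.0, 2.5], [3.0, 4.0]], ["tissue1", "tissue2"])
        print(open(base + ".names").read())
```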
 
2. min_row_size: minimum number of objects in a cluster

3. min_clmn_size: minimum number of attributes in a cluster

4. delta: the distance threshold. On every attribute in a cluster, any two objects in the cluster 
          differ by at most delta times the range of that attribute. 
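
The delta condition can be checked for a candidate cluster as in the sketch below. It assumes that
the range of an attribute is its maximum minus its minimum over all objects in the data set, which
is one reading of the description above. It is not the mining algorithm itself, and the function
name satisfies_delta is hypothetical.

```python
def satisfies_delta(data, object_ids, attribute_ids, delta):
    """Check the delta condition for one candidate cluster.

    data is a list of rows (one list of floats per object); object_ids and
    attribute_ids are 0-based indices into the rows and columns.
    """
    for a in attribute_ids:
        column = [row[a] for row in data]
        # Assumption: attribute range = max - min over the whole data set.
        attr_range = max(column) - min(column)
        values = [data[o][a] for o in object_ids]
        # All pairwise differences are within the bound exactly when the
        # largest and smallest values in the cluster are.
        if max(values) - min(values) > delta * attr_range:
            return False
    return True
```

For example, with delta = 0.2 and an attribute whose values span 0 to 10 over the whole data set,
the objects of a cluster may differ by at most 2 on that attribute.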

5. max_overlap: the maximum overlap allowed between adjacent bins of an attribute. Suggested value: 0.9. 
                If the program takes too long, try lowering this value. 

6. output_filename: the file that receives the generated subspace clusters. Each cluster takes two lines. 
                    The first line contains the set of attributes in the cluster, and the second 
                    line contains the ids of the objects in the cluster. The format is as follows:

                    #attributes attr1 attr2 ... attrk 
                    #objects obj1 obj2 ... objm

                    where #attributes is the number of attributes in the cluster, followed by 
                    the set of attributes in the cluster, and #objects is the number of objects in 
                    the cluster, followed by the ids of the objects. The id of an object is its 
                    line number in "xxx.data" minus 1. 
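
The output file can be read back with a short parser such as the sketch below. It assumes that the
items on each line are separated by whitespace and that blank lines carry no meaning. Attributes are
kept as strings, since their exact form is not specified here. The function name parse_clusters
is hypothetical.

```python
def parse_clusters(lines):
    """Parse two-line cluster records into (attributes, object_ids) pairs."""
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected two lines per cluster")
    clusters = []
    for attr_row, obj_row in zip(rows[0::2], rows[1::2]):
        # Each line starts with a count, followed by that many items.
        n_attrs, attrs = int(attr_row[0]), attr_row[1:]
        n_objs, objs = int(obj_row[0]), [int(x) for x in obj_row[1:]]
        if n_attrs != len(attrs) or n_objs != len(objs):
            raise ValueError("count does not match the number of items listed")
        clusters.append((attrs, objs))
    return clusters
```

A file can be parsed with "with open(output_filename) as f: clusters = parse_clusters(f)". Because
an object id is its line number in "xxx.data" minus 1, it can be used directly as a 0-based index
into the list of data lines, assuming line numbers start at 1.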



Credits:

These programs were written by LIU Guimei. During the project period, she was
partially supported by 
FRC grant "R-252-040-238-101 & R-252-060-238-133: Pattern Spaces: Theory, Algorithms, and Applications", 
MOE T1 grant "R-252-000-274-112: Graph-Based Protein Function Prediction", and 
SERC PSF grant "SERC 072 101 0016 : Pattern Spaces: Theory, Techniques and Applications".


If you use these programs, please cite:

Guimei Liu, Jinyan Li, Kelvin Sim, Limsoon Wong. 
Distance-Based Subspace Clustering with Flexible Dimension Partitioning. 
Proceedings of the 23rd IEEE International Conference on Data Engineering, 
pages 1250--1254, Istanbul, Turkey, April 2007. 

Guimei Liu, Kelvin Sim, Jinyan Li, Limsoon Wong. 
Efficient Mining of Distance-Based Subspace Clusters. 
Statistical Analysis and Data Mining, submitted. 


