Controlled Experiment system for PCL
Developed by Ngo Thanh Son
====================================

The system allows users to run PCL with different parameters and summarize the results.

The package consists of 3 programs:

_ Set up an experiment:
    + expt.txt: specifies the attributes of the experiment
        param: parameter file
        outputdir: output directory
        datadir: dataset directory
        program: program for PCL (in this version we use facade.exe)
    + Parameter file (ex: param5.txt)
      PCL supports 10 parameters:
        input: data input (data files must be in the same directory)
        threshold: support threshold of the pattern miner
        run: percentage of the dataset used for testing
        delta: delta for emerging patterns (refer to [1]), given as an absolute value; 0 means jumping emerging patterns
        missing_train: percentage of training data randomly removed (0 means we use the original data, 10 means 10 percent will be removed)
        missing_test: percentage of test data randomly removed
        miner: pattern miner program (in this version we use fpmax)
        ktop: top k patterns used for scoring
        type: type of patterns used to construct rules
            gen: generators
            cls: closed patterns
            dgen: generators, no pattern is a subset of another
            egen: generators, one pattern from each equivalence class
      Different values for one parameter are separated by a space.
    + Threshold file (threshold.txt): contains the support threshold for each input file.

  How to add an experiment to the system:
    exptadd [experiment file] [experiment name]
  For example:
    exptadd expt.txt pcl0

_ Run an experiment
    exptrun [experiment name]
  For example:
    exptrun pcl0
  The system will run the algorithm with all combinations of parameters specified in the parameter file. Each experiment is associated with one parameter file.

_ Analyze the experiment
  We can analyze the following outputs: accuracy, running time, true positive, false positive, false negative, true negative.
    analyze [result file] [experiment name] [parameter to evaluate] [output to evaluate] [param1] [value of param1] ... [paramN] [value of paramN]
  [output to evaluate]: 0 - accuracy, 1 - running time, 2 - true positive, 3 - false positive, 4 - false negative, 5 - true negative.
  The program combines the results and outputs all results of the experiment that match these restrictions.

  For example, given the following parameter file:
    input: mushroom.dat iris.dat
    threshold: 0.01
    run: 10 10 10 10 10 10 10 10 10 10
    delta: 0
    missing_train: 0 0.1 0.2 0.3 0.4 0.5
    missing_test: 0 0.1 0.2 0.3
    miner: fpmax
    ktop: 10
    type: dgen
    xxx: 1

  Command 1:
    analyze result0 pcl0 missing_train 0 -a input delta 0 missing_test 0 miner fpmax ktop 10 type dgen xxx 1
  "-a input" means that the average over all input data is computed.
    output file: result0.tmp
    graph file: result0.plot
  Graphs can be seen in results.png by executing "gnuplot result0.plot" in a Unix environment.
  The output file will consist of 6 lines corresponding to the different values of missing_train (0 to 50%).

  Command 2:
    analyze result0 pcl0 missing_train 0 delta 0 missing_test 0 miner fpmax ktop 10 type dgen xxx 1
  The output file will consist of 6 lines corresponding to the different values of missing_train (0 to 50%) and 2 columns corresponding to the 2 values of input. Accuracy is evaluated.

[1] Jinyan Li, Guimei Liu, Limsoon Wong. Mining Statistically Important Equivalence Classes and Delta-Discriminative Emerging Patterns. Proceedings of the 13th International Conference on Knowledge Discovery and Data Mining, pages 430--439, San Jose, California, 12-15 August 2007.
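
Illustration: expanding a parameter file into runs

The statement that exptrun runs the algorithm "with all combinations of parameters specified in the parameter file" can be pictured as a Cartesian product over the value lists. The Python sketch below is not part of the package and does not reproduce how exptrun is actually implemented. The function names parse_params and expand_runs are hypothetical, and it assumes one "name: value value ..." entry per line. It uses a shortened version of the example parameter file: the run and xxx lines are left out, since this document does not say how repeated values are counted.

```python
from itertools import product


def parse_params(text):
    """Parse 'name: v1 v2 ...' lines into a dict of value lists (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, values = line.partition(":")
        # Values for one parameter are separated by spaces.
        params[name.strip()] = values.split()
    return params


def expand_runs(params):
    """Yield one dict per combination of parameter values (hypothetical helper)."""
    names = list(params)
    for values in product(*(params[n] for n in names)):
        yield dict(zip(names, values))


# Shortened example parameter file (assumption: run and xxx omitted).
EXAMPLE = """\
input: mushroom.dat iris.dat
threshold: 0.01
delta: 0
missing_train: 0 0.1 0.2 0.3 0.4 0.5
missing_test: 0 0.1 0.2 0.3
miner: fpmax
ktop: 10
type: dgen
"""

if __name__ == "__main__":
    runs = list(expand_runs(parse_params(EXAMPLE)))
    # 2 inputs * 6 missing_train values * 4 missing_test values = 48 combinations
    print(len(runs))
    print(runs[0])
```

Under these assumptions, each of the 48 dictionaries corresponds to one invocation of the PCL program; the analyze command then selects among the results by fixing the value of each parameter.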