Supplementary Information for the paper "Recognition of Polyadenylation Sites from Arabidopsis Genomic Sequences" (Chuan Hock Koh, Limsoon Wong. Proceedings of the 18th International Conference on Genome Informatics (GIW), Singapore, December 2007).

GIW2007-v1.zip is the original version, as released during GIW2007 (December 2007).
GIW2007-v2.zip is a revised version (June 2008) that also includes the data in FASTA format, for easier use with other systems that require FASTA input.

============================================================================

Brief Summary of the Java Files and Classes

1) Run.java - Contains the static void main method

2) RawToArff.java - Generates ARFF format files from Raw format files
   Transform Class: Converts all Dataset B and Dataset E Raw format files into ARFF format files
   Derive Class: Converts all Dataset C and Dataset D Raw format files into ARFF format files

3) SMO1.java - Run with -Xmx1200m to prevent OutOfMemoryError
   Cascade Class: Trains SMO 1 using Dataset B and runs the trained SMO 1 on Dataset C and Dataset D
   AnalyseSMO1 Class: Gets the SN (Sensitivity) and SP (Specificity) of SMO1 based on Dataset D
   PrepareSMO2Input Class: Prepares the training and test data for SMO 2

4) SMO2.java - Run with -Xmx1024m to prevent OutOfMemoryError
   Cascade2 Class: Trains SMO 2 using Dataset C and runs the trained SMO 2 on Dataset D
   AnalyseSMO2 Class: Gets the SN (Sensitivity) and SP (Specificity) of SMO2 based on Dataset D

5) SMOA.java - Run with -Xmx1200m to prevent OutOfMemoryError
   Cascade Class: Trains SMO A using Dataset E and runs the trained SMO A on Dataset D
   AnalyseSMO1 Class: Gets the SN (Sensitivity) and SP (Specificity) of SMO1 based on Dataset D
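
The analysis classes listed above report SN and SP on Dataset D. As a rough illustration of what those two numbers mean, the following is a minimal sketch that assumes the common definitions SN = TP / (TP + FN) and SP = TN / (TN + FP). The paper should be consulted for the exact definitions used. The class SnSpSketch and all of its methods are hypothetical and are not part of the released source files.

```java
// Hypothetical helper for illustration only; not part of the released code.
// Assumes SN = TP / (TP + FN) and SP = TN / (TN + FP).
final class SnSpSketch {

    /** Sensitivity: fraction of actual positives that were predicted positive. */
    static double sensitivity(int truePos, int falseNeg) {
        return (double) truePos / (truePos + falseNeg);
    }

    /** Specificity: fraction of actual negatives that were predicted negative. */
    static double specificity(int trueNeg, int falsePos) {
        return (double) trueNeg / (trueNeg + falsePos);
    }

    /**
     * Tallies confusion counts from parallel label arrays
     * (true = polyadenylation site, false = non-site).
     * Returns {TP, FN, TN, FP}.
     */
    static int[] confusionCounts(boolean[] actual, boolean[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays must have equal length");
        }
        int tp = 0, fn = 0, tn = 0, fp = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i]) {
                if (predicted[i]) tp++; else fn++;
            } else {
                if (predicted[i]) fp++; else tn++;
            }
        }
        return new int[] {tp, fn, tn, fp};
    }

    public static void main(String[] args) {
        // Made-up labels, used only to exercise the methods.
        boolean[] actual    = {true, true, true,  false, false};
        boolean[] predicted = {true, true, false, false, true};
        int[] c = confusionCounts(actual, predicted);
        System.out.printf("SN = %.3f, SP = %.3f%n",
                sensitivity(c[0], c[1]), specificity(c[2], c[3]));
    }
}
```

If a dataset contains no positives (or no negatives), the corresponding ratio in this sketch evaluates to NaN rather than throwing an exception, so a caller may want to check for that case.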