Gene Ontology analysis of continuous measures based on Mann-Whitney U test with adaptive clustering of GO categories

Mikhail Matz

University of Texas at Austin, USA

Comprehensive and visually clear functional summaries of genome-wide information remain a challenge in genome biology. Gene Ontology (GO) annotations have been used for this purpose for more than 10 years, but there is still no consensus on how best to analyze the data and present the results. Here I describe the GOMWU method, which in my experience generates the most statistically powerful, informative, and visually understandable functional summaries based on GO annotations. The advantages of this method over a typical "GO enrichment" analysis (e.g., GeneMerge by Castillo-Davis and Hartl, 2003) are as follows. First, the experimenter does not have to impose an arbitrary cutoff to select initial candidate genes, so the whole dataset contributes information, and no preliminary statistical test is required prior to the analysis. The method is best suited to analyzing the distribution of continuous measures, such as dN/dS values, fold-changes of gene expression, or raw p-values (unadjusted for multiple comparisons). It works particularly well for kME values obtained by weighted gene coexpression network analysis (WGCNA). Second, the method pre-summarizes the GO hierarchy by clustering GO categories based on gene sharing within the dataset being analyzed. This generates a biologically meaningful grouping of GO categories tailored to the particular dataset and allows the analysis to be more specific (i.e., to involve lower levels of the GO hierarchy) than most other GO analysis methods. Third, the visual representation of the results is compact and intuitive.
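
To make the cutoff-free idea concrete, the following is a minimal sketch in Python of a Mann-Whitney U test that asks whether the genes in one GO category tend to rank higher or lower than the remaining genes for a continuous measure. It is an illustration only, not the GOMWU implementation: the function names (rank_with_ties, mwu_category_test), the input layout (a dict of gene to measure and a set of gene names), and the choice of a normal approximation with tie correction and no continuity correction are all assumptions made for this example.

```python
import math
from collections import Counter


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mwu_category_test(measures, category_genes):
    """Compare genes inside a GO category against all other genes.

    measures: dict mapping gene name -> continuous measure (hypothetical layout)
    category_genes: set of gene names annotated with the category
    Returns (U, z, two-sided p) using a normal approximation
    (an assumption of this sketch; no continuity correction is applied).
    """
    genes = list(measures)
    values = [measures[g] for g in genes]
    ranks = rank_with_ties(values)
    in_cat = [g in category_genes for g in genes]
    n1 = sum(in_cat)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("category must contain some, but not all, genes")

    # U statistic for the category: its rank sum minus the minimum possible.
    rank_sum = sum(r for r, c in zip(ranks, in_cat) if c)
    u = rank_sum - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    n = n1 + n2
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in Counter(values).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 0.0, 1.0  # every value is tied, so there is no signal

    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, z, p


# Toy data: ten genes with measures 1..10; the category holds the top three.
toy_measures = {f"gene{i}": float(i) for i in range(1, 11)}
toy_category = {"gene8", "gene9", "gene10"}
u_stat, z_score, p_value = mwu_category_test(toy_measures, toy_category)
print(u_stat, z_score, p_value)
```

In this sketch a positive z means the category's genes sit toward the high end of the measure and a negative z means the low end. Every gene in the dataset contributes its rank, so no significance threshold is needed to pick candidate genes beforehand.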