Drosophila melanogaster is one of the most important organisms for studying the genetics of development. Gene regulation during early Drosophila development involves clusters of multiple transcription factor binding sites (TFBS) known as cis-regulatory modules (CRMs). A number of TFBS and CRMs have been experimentally annotated in the Drosophila genome. Recently, a comprehensive collection of over 600 experimentally determined CRMs in Drosophila has been compiled in the REDfly database, and more than 1300 experimental TFBS annotations for over 80 different TFs have been compiled in the Drosophila DNase I Footprint Database. Computational CRM predictions in Drosophila have also been reported in several publications. The database presented here combines the information available from these resources into a single comprehensive collection of CRM and TFBS annotations. It currently contains 661 CRM sequences corresponding to 235 Drosophila genes. Of the 661 CRMs, only 155 are experimentally annotated, with a total of 778 TFBS. We have developed a computational method for annotating TFBS in the remaining 506 uncharacterized CRMs. Thus, a complete TFBS annotation of all 661 CRMs, experimental for 155 and computationally predicted for 506, is available in the database.
For 85 genes, experimental annotations of 1066 TFBS for 83 known transcription factors were collected from the FlyReg database. This is a subset of the FlyReg database that leaves out entries with unknown transcription factor or gene information. For 196 genes, a total of 619 experimentally annotated CRMs were obtained from the REDfly database. The FlyReg and REDfly databases had 52 genes in common, so both experimentally annotated TFBS and CRMs could be obtained for these genes. Interestingly, the annotated TFBS overlapped the annotated CRM regions for all of these genes except one. There were thus 778 known TFBS falling within 155 known CRMs across 51 genes. These 51 genes comprised the training set in this study, since extensive annotation was available for them. For the remaining 184 genes, only partial information was available, in the form of either TFBS or CRM annotations. The study of Schroeder et al. (2004) added 42 further CRMs (3 experimental and 39 predicted), bringing the total number of available CRMs to 661. However, none of these 42 CRMs overlapped any of the known TFBS from the FlyReg database.
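The selection of training genes described above amounts to an interval-overlap join between the TFBS and CRM annotations of each gene. The following is a minimal sketch of that step and not the procedure actually used in this study. The `Interval` record, the function names, and the use of closed genomic coordinates are all assumptions made for illustration, and "overlap" is taken here to mean that the two intervals share at least one base.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical annotation record; coordinates are closed (inclusive)."""
    gene: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def build_training_set(tfbs_list, crm_list):
    """Map gene -> CRM -> list of TFBS overlapping that CRM.

    Only genes with at least one TFBS overlapping one of their own CRMs
    appear in the result; genes with TFBS only, CRMs only, or no overlap
    are left out (assumed to correspond to "partial information").
    """
    crms_by_gene = defaultdict(list)
    for crm in crm_list:
        crms_by_gene[crm.gene].append(crm)

    training = {}
    for site in tfbs_list:
        for crm in crms_by_gene.get(site.gene, []):
            if overlaps(site, crm):
                training.setdefault(site.gene, {}).setdefault(crm, []).append(site)
    return training


def summarize(training):
    """Return (number of genes, number of CRMs, number of TFBS-CRM pairs)."""
    n_crms = sum(len(crms) for crms in training.values())
    n_pairs = sum(len(sites) for crms in training.values() for sites in crms.values())
    return len(training), n_crms, n_pairs
```

Applied to the FlyReg TFBS and REDfly CRM records, a join of this kind would be expected to yield counts comparable to the 51 genes, 155 CRMs, and 778 TFBS reported above, provided that the overlap criterion matches the one used in the study and that no TFBS is counted under more than one CRM.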