Participants: Vladimir Bajic, Rajesh Chowdhary, Chuan Hock Koh, Huiqing Liu, Limsoon Wong, Roland Yap, Fanfan Zeng


Correct prediction of transcription start sites, translation initiation sites, gene splice sites, poly-A sites, and other functional sites from DNA sequences is an important problem in genomic research. In this project, we investigate these prediction problems using the paradigm of ``feature generation, feature selection, and feature integration''. There are two reasons for our interest in this paradigm. First, standard toolboxes can be identified and used for each of the three components. For example, any statistical significance test can be used for feature selection. Similarly, any machine learning method can be used for feature integration. The main challenge is in developing a ``standard'' toolbox for feature generation suitable for DNA functional sites. Second, the features that are critical to the recognition of a specific DNA functional site are explicitly generated and selected in this paradigm. This explicitness is helpful in understanding the underlying biological mechanism of that DNA functional site.
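The three stages of the paradigm can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the project: k-mer counts stand in for feature generation, a Welch-style t statistic stands in for the significance test used in feature selection, and a nearest-centroid rule stands in for the machine learning method used in feature integration. The function names and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt


def generate_features(seq, k=2):
    """Feature generation (assumed form): count every DNA k-mer in seq."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # ignore windows with N or other symbols
            counts[window] += 1
    return counts


def t_score(pos, neg, eps=1e-9):
    """Absolute Welch-style t statistic for one feature across two classes."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / max(len(xs) - 1, 1)

    # eps avoids division by zero when both classes have zero variance
    denom = sqrt(var(pos) / len(pos) + var(neg) / len(neg)) + eps
    return abs(mean(pos) - mean(neg)) / denom


def select_features(pos_feats, neg_feats, top_n=3):
    """Feature selection: keep the top_n features ranked by t statistic."""
    scores = {
        name: t_score([f[name] for f in pos_feats],
                      [f[name] for f in neg_feats])
        for name in pos_feats[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def train(pos_feats, neg_feats, selected):
    """Feature integration: one centroid per class over selected features."""
    def centroid(feats):
        return [sum(f[name] for f in feats) / len(feats) for name in selected]

    return centroid(pos_feats), centroid(neg_feats)


def predict(seq, model, selected, k=2):
    """Return True if seq lies closer to the positive-class centroid."""
    pos_c, neg_c = model
    feats = generate_features(seq, k)
    x = [feats[name] for name in selected]
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return d_pos < d_neg


# Toy data (made up): "site" windows are TA-rich, background is GC-rich.
positives = ["TATATAAGC", "GTATATATC", "TATAAATAG"]
negatives = ["GCGCGGCCG", "CCGGCGCGC", "GGCCGCGGC"]

pos_feats = [generate_features(s) for s in positives]
neg_feats = [generate_features(s) for s in negatives]

selected = select_features(pos_feats, neg_feats)
model = train(pos_feats, neg_feats, selected)

print("selected features:", selected)
print("CTATATAAG is a site:", predict("CTATATAAG", model, selected))
print("GCGGCCGCG is a site:", predict("GCGGCCGCG", model, selected))
```

Because each stage is a separate function, any one of them can be swapped out without touching the others, and the list returned by `select_features` makes explicit which sequence features drive the prediction.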


This project is supported in part by the I2R-SOC Joint Lab on Knowledge Discovery from Clinical Data (7/03 - 6/07).

