SCHOOL OF COMPUTING, NUS
POSTGRADUATE SEMINAR BY MS CAO XIA

Approximate Matching in Genomic Sequence Data

Executive Classroom, SoC-1 Level 5
6 March 2006, 10.00am

Abstract:
Increasing interest in genetic research has led to the creation of huge genomic databases, and approximate sequence matching in these databases has become a basic operation in computational biology. In this thesis, we study three research problems, all related to approximate sequence matching in genomic databases: DNA sequence similarity search in sequence databases, approximate join of DNA sequences, and prediction of protein subcellular localization. Our experimental results show that 1) the proposed search model and index structure are very effective in organizing a large genomic sequence database; 2) the proposed novel filtering algorithms are very efficient in processing approximate sequence matching; and 3) the proposed q-gram based feature vectors extracted from protein sequences are helpful in predicting the subcellular localization of those proteins.
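
Illustration: the abstract does not say how the q-gram based feature vectors are built, so the following is only a generic sketch of the idea, not the method used in the thesis. A q-gram is a substring of length q; here a protein sequence is mapped to the relative frequency of every possible q-gram over the 20 standard amino acids. The function names, the choice of q = 2, and the normalization by the number of q-grams are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def qgram_counts(sequence, q):
    """Count every overlapping substring of length q in the sequence."""
    return Counter(sequence[i:i + q] for i in range(len(sequence) - q + 1))


def qgram_feature_vector(sequence, q=2, alphabet=AMINO_ACIDS):
    """Return relative q-gram frequencies in a fixed (lexicographic) order.

    Hypothetical helper: the vector has len(alphabet) ** q entries, one per
    possible q-gram. Sequences shorter than q give an all-zero vector.
    """
    counts = qgram_counts(sequence, q)
    total = max(len(sequence) - q + 1, 0)  # number of q-grams in the sequence
    keys = ["".join(p) for p in product(alphabet, repeat=q)]
    return [counts[k] / total if total else 0.0 for k in keys]


if __name__ == "__main__":
    vector = qgram_feature_vector("MKVLAAGIVALLLAAGCSSA", q=2)
    print(len(vector))                   # one entry per possible 2-gram
    print(sum(1 for v in vector if v))   # number of distinct 2-grams seen
```

A vector of this kind has a fixed length regardless of sequence length, which is what allows it to be passed to a standard classifier; which classifier and which q the thesis uses is not stated in the abstract.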