SCHOOL OF COMPUTING, NUS
POSTGRADUATE SEMINAR BY 

MS CAO XIA 

Approximate Matching in Genomic Sequence Data


Executive Classroom, SoC-1 level 5 

6 March 2006, 10.00am


Abstract: 

Increasing interest in genetic research has resulted in the
creation of huge genomic databases, and approximate sequence
matching in genomic sequence databases has become a basic 
operation in computational biology. In this thesis, we studied
three research problems: DNA sequence similarity search in
sequence databases, approximate join of DNA sequences, and protein
subcellular localization prediction, all of which are related to
approximate sequence matching in genomic databases. Our
experimental results showed that 1) the proposed search model and
index structure are very effective in organizing a large genomic
sequence database; 2) the proposed filtering algorithms are
very efficient in processing approximate sequence matching; and 3)
the proposed q-gram based feature vectors extracted from protein
sequences are helpful in predicting the subcellular localization of
proteins.
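
For readers unfamiliar with the term, a q-gram is a substring of
length q. The sketch below shows one common way to turn a protein
sequence into a q-gram frequency vector. It is an illustration only
and not necessarily the feature construction used in the thesis; the
function name qgram_vector, the 20-letter alphabet constant, and the
normalization by the number of q-grams are all assumptions made here.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino-acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def qgram_vector(sequence, q=2, alphabet=AMINO_ACIDS):
    """Return relative frequencies of every possible q-gram over `alphabet`.

    The vector has len(alphabet) ** q entries, ordered lexicographically
    by the order of letters in `alphabet`. q-grams that contain letters
    outside the alphabet are not represented in the vector.
    """
    n_grams = len(sequence) - q + 1
    # Count every overlapping window of length q.
    counts = Counter(sequence[i:i + q] for i in range(max(n_grams, 0)))
    total = max(n_grams, 1)  # avoid division by zero for short sequences
    return [counts["".join(g)] / total for g in product(alphabet, repeat=q)]


if __name__ == "__main__":
    vec = qgram_vector("ACAC", q=2)
    print(len(vec))   # number of possible 2-grams over the alphabet
    print(vec[1])     # relative frequency of the 2-gram "AC"
```

A fixed-length vector of this kind can be passed to a standard
classifier, which is why q-gram counts are a convenient sequence
representation for prediction tasks.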