Oligo microarray (DNA chip) technology has had a significant impact on genomic studies in recent years. Many fields, such as gene discovery, drug discovery, toxicological research and disease diagnosis, stand to benefit from its use. A microarray is an orderly arrangement of thousands of DNA fragments, where each fragment is a probe (or fingerprint) of a gene/cDNA. Each probe must associate uniquely with one particular gene/cDNA; otherwise, the performance of the microarray suffers.
Existing algorithms usually select probes using the criteria of homogeneity, sensitivity, and specificity. We propose one additional criterion, uniformity, which further improves the quality of the selected probes. For efficiency, existing algorithms reduce the time complexity by employing heuristics, but such approaches sacrifice accuracy.
Instead, we use filtering techniques that avoid redundant computation while maintaining accuracy. With the new algorithm, optimal short (20 bases) or long (50 or 70 bases) probes can be computed efficiently for large genomes.
Our algorithm selects good probes based on the criteria of homogeneity, sensitivity and specificity as proposed by Lockhart.
Homogeneity is the ability of a probe to hybridize at a given experimental temperature. For every probe, the melting temperature is the temperature at which 50% of the probe molecules are hybridized to their complementary strand. For a probe p to be a good probe for its intended target, the melting temperature of p should be close to the specified experimental temperature.
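As a rough illustration of a homogeneity check, the sketch below estimates the melting temperature with the Wallace rule and accepts a probe only if the estimate lies within a tolerance of the experimental temperature. The choice of the Wallace rule, the function names and the default tolerance of 5 degrees are all assumptions made for this example; the text above does not state which melting temperature model the program uses.

```python
def wallace_tm(probe: str) -> float:
    """Estimate the melting temperature (in degrees Celsius) with the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is a coarse approximation meant for
    short oligos; it is used here only as a stand-in (assumption).
    """
    probe = probe.upper()
    at_count = probe.count("A") + probe.count("T")
    gc_count = probe.count("G") + probe.count("C")
    return 2.0 * at_count + 4.0 * gc_count


def is_homogeneous(probe: str, experiment_temp: float, tolerance: float = 5.0) -> bool:
    """Return True if the estimated Tm is within `tolerance` of the experimental temperature.

    `tolerance` is a hypothetical parameter chosen for illustration.
    """
    return abs(wallace_tm(probe) - experiment_temp) <= tolerance
```

For a 20-mer with 10 A/T and 10 G/C bases, such as "ATGC" repeated five times, the estimate is 2 * 10 + 4 * 10 = 60, so the probe would pass at an experimental temperature of 60 with the default tolerance.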
Sensitivity is the ability of a probe to detect low-abundance mRNAs. This is a key performance feature of microarrays, and it can be jeopardized by probes that form significant secondary structures. It is therefore important to reject probes with high self-complementarity and to select probes with minimal secondary structure.
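One simple way to approximate self-complementarity is to measure the longest stretch of a probe that also occurs in its own reverse complement, since such a stretch could pair with another part of the same probe (or with a second copy of it). The sketch below is a crude sequence-based proxy under that assumption, not a thermodynamic folding model, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_self_complementary_stem(probe: str) -> int:
    """Length of the longest substring shared by the probe and its reverse complement.

    Computed with the standard longest-common-substring dynamic program,
    keeping only the previous row to save memory.
    """
    probe = probe.upper()
    rc = reverse_complement(probe)
    best = 0
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(probe) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if probe[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

A probe could then be rejected when this value exceeds a chosen threshold; the threshold itself would have to be tuned and is not given in the text above.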
Specificity measures the uniqueness of a probe to its corresponding gene in the genome. A probe that is unique to its corresponding gene in terms of sequence similarity minimizes the chance of cross-hybridization. This step is very computationally intensive and takes up most of the running time in probe design programs. However, by using the Pigeonhole Principle, we speed up the specificity filter greatly. Our algorithm finds and checks only the regions in the genome that could potentially cause cross-hybridization. Since these regions are small compared to the entire genome, we avoid redundant checks. Most importantly, our approach is not heuristic and is thus able to filter out all "bad" probes.
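The pigeonhole idea can be sketched as follows, assuming (for illustration only) that a potential cross-hybridization site is a genome window within at most k mismatches of the probe. If the probe is cut into k + 1 disjoint pieces, at most k of them can contain a mismatch, so at least one piece must match the window exactly. Only windows located by an exact piece match need to be verified, and no qualifying window is missed. The mismatch-based criterion, the function names and the use of `str.find` instead of a genome index are assumptions of this sketch, not a description of the actual program.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_bounds(length: int, parts: int) -> list:
    """Cut the range [0, length) into `parts` disjoint, near-equal (start, end) pieces."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def find_near_matches(probe: str, genome: str, max_mismatches: int) -> list:
    """Start positions of all genome windows within `max_mismatches` of the probe."""
    m = len(probe)
    if m <= max_mismatches:
        raise ValueError("probe must be longer than max_mismatches")
    candidates = set()
    for start, end in split_bounds(m, max_mismatches + 1):
        piece = probe[start:end]
        pos = genome.find(piece)
        while pos != -1:
            window_start = pos - start  # align the whole probe around this exact hit
            if 0 <= window_start <= len(genome) - m:
                candidates.add(window_start)
            pos = genome.find(piece, pos + 1)
    # Verify only the candidate windows instead of every window in the genome.
    return sorted(p for p in candidates
                  if hamming(probe, genome[p:p + m]) <= max_mismatches)
```

In a real setting the repeated `str.find` scans would be replaced by a precomputed index of the genome; the point here is only that the candidate set stays exact rather than heuristic.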
To improve accuracy, we propose to include the uniformity criterion to obtain probes with a highly uniform distribution of mismatches. This is important because the distribution of similar sequences within a probe affects its reliability. The uniformity filter eliminates probes that have many long substrings appearing in other genes. Because each remaining probe, as well as each of its substrings, is unlikely to hybridize with the wrong genes, the resulting probe set is more accurate.
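A minimal sketch of such a filter, assuming that "long substrings" means substrings of a fixed length k and that a probe is rejected when too many of them occur in non-target sequences. Both the fixed-length simplification and the names `count_shared_kmers`, `passes_uniformity` and `max_shared` are assumptions for illustration; the exact uniformity definition is not reproduced in the text above.

```python
def count_shared_kmers(probe: str, non_targets: list, k: int) -> int:
    """Count probe positions whose length-k substring occurs in any non-target sequence."""
    probe = probe.upper()
    seen = set()
    for seq in non_targets:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return sum(1 for i in range(len(probe) - k + 1) if probe[i:i + k] in seen)


def passes_uniformity(probe: str, non_targets: list, k: int, max_shared: int) -> bool:
    """Accept the probe only if at most `max_shared` of its k-mers appear in non-targets."""
    return count_shared_kmers(probe, non_targets, k) <= max_shared
```

For example, the probe "ACGTACGT" shares three of its five 4-mers (ACGT at two positions, and TACG) with the non-target sequence "TTACGTTT", so it would pass with `max_shared=3` but fail with `max_shared=2`.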
Uniformity measures the distribution of mismatches between a probe and all non-target sequences in the genome. A probe with a good mismatch distribution, termed a uniform probe, has two properties:
Table 1 - Benchmark results for short probe design
Benchmark results of our algorithm to design short probes (20-25 mers) for the 4 genomes.
Table 2 - Benchmark results for long probe design
Benchmark results of our algorithm to design long probes (50 mers) for the 4 genomes.
Download Program: FindProbe_v2.zip