Adaptive Control of Hybridization Noise in DNA Sequencing-by-Hybridization

F.P Preparata  Hon-Wai, Leong  Wing-Kin, Sung  Hugo Willy

 

We consider the problem of sequence reconstruction in sequencing-by-hybridization in the presence of spectrum errors. As suggested by intuition, and reported in the literature, false-negatives (i.e., missing spectrum probes) are by far the leading cause of reconstruction failures. In a recent paper we have described an algorithm, called ``threshold-", designed to recover from false negatives. This algorithm is based on overcompensating for missing extensions by allowing larger reconstruction subtrees. We demonstrated, both analytically and with simulations, the increasing effectiveness of the approach as the parameter grows, but also pointed out that for larger error rates the size of the extension trees translates into an unacceptable computational burden. To obviate this shortcoming, in this paper we propose an adaptive approach which is both effective and efficient. Effective, because for a fixed value of it performs as well as its single-threshold counterpart, efficient because it exhibits substantial speed-ups over it. The idea is that, for moderate error rates a small fraction of the target sequence can be involved in error recovery; thus, expectedly the remainder of the sequence is reconstructible by the standard noiseless algorithm, with the provision to switch to operation with increasingly higher thresholds after detecting failure. This policy generates interesting and complex interplays between fooling probes and false negatives. These phenomena are carefully analyzed for random sequences and the results are found to be in excellent agreement with the simulations. In addition, the experimental algorithmic speed-ups of the multithreshold approach are explained in terms of the interaction amongst the different threshold regimes.

 

References

H.W. Leong, F.P. Preparata, W.K. Sung and H. Willy. Adaptive control of hybridization noise in DNA Sequencing-by-Hybridization.

Journal of Bioinformatics and Computational Biology, 3(1):79-98, 2005

 

The Simulation program implemented in JAVA is available here.