A Faster and More Space-Efficient Algorithm for Inferring Arc-Annotations of RNA Sequences through Alignment

 

Jesper Jansson, See-Kiong Ng, Wing-Kin Sung and Hugo Willy

 

The nested arc-annotation of a sequence is an important model used to represent structural information for RNA and protein sequences. Given two sequences S1 and S2 and a nested arc-annotation P1 for S1, this paper considers the problem of inferring the nested arc-annotation P2 for S2 such that (S1,P1) and (S2,P2) have the largest common substructure. The problem has a direct application in predicting the secondary structure of an RNA sequence given a closely related sequence with known secondary structure. The currently most efficient algorithm for this problem requires O(nm3) time and O(nm2) space where n is the length of the sequence with known arc-annotation and m is the length of the sequence whose arc-annotation is to be inferred. By using sparsification on a new recursive dynamic programming algorithm and applying a Hirschberg-like traceback technique with compression, we obtain an improved algorithm that runs in min{O(nm2+n2m),O(nm2 log n), O(nm3)} time and min{O(m2 + mn), O(m2 log n+n)} space.

 

References

Jesper Jansson, See-Kiong Ng, Wing-Kin Sung and Hugo Willy. A Faster and More Space-Efficient Algorithm for Inferring Arc-Annotations of RNA Sequences through Alignment, Algorithmica, Volume 46, Number 2 / October, 2006, pg 223-245

 

The basic WLCS program implemented in JAVA is available here. (full implementation of the backtracking is still not available)