Home Up Feedback Site Map Search

       CSA and SAGA program  

 

                             

 

 

Home
Links
Projects
People
Lecture Notes
Publications

Compressed Suffix Array (CSA) is an important full-text index frequently used for indexing texts without word boundary, such as DNA or protein sequences. However, constructing CSA requires a lot of space which makes this data-structure impractical. For example, constructing CSA for human genome requires at least 24G RAM. Recently, three algorithms (COCOON, ISAAC, FOCS) are proposed to construct CSA from the text space-efficiently. This report implements the ISAAC and FOCS algorithms (and compare the results with that of the COCOON algorithm). The ISAAC algorithm uses O (n log n) time and O (n log |∑|) working space. The FOCS algorithm uses O (n log log |∑|) time and O (n log |∑|)$ working space. To make the two algorithms practical, a number of implementation improvements are employed. The experimental results show that even though the FOCS algorithm has a better theoretical time complexity, ISAAC algorithm performs better in practice when using the same working space.

 

 

Send mail to gunjanch@comp.nus.edu.sg with questions or comments about this web site.
Last modified: 01/29/04