|
|
|
|
Compressed Suffix Array (CSA) is an important full-text index frequently used for indexing texts without word boundary, such as DNA or protein sequences. However, constructing CSA requires a lot of space which makes this data-structure impractical. For example, constructing CSA for human genome requires at least 24G RAM. Recently, three algorithms (COCOON, ISAAC, FOCS) are proposed to construct CSA from the text space-efficiently. This report implements the ISAAC and FOCS algorithms (and compare the results with that of the COCOON algorithm). The ISAAC algorithm uses O (n log n) time and O (n log |∑|) working space. The FOCS algorithm uses O (n log log |∑|) time and O (n log |∑|)$ working space. To make the two algorithms practical, a number of implementation improvements are employed. The experimental results show that even though the FOCS algorithm has a better theoretical time complexity, ISAAC algorithm performs better in practice when using the same working space. |
|
Send mail to
gunjanch@comp.nus.edu.sg with
questions or comments about this web site.
|