11 Oct 2005
CS 5244 - Computational Document Analysis
Effect of granularity
- Divide the document into smaller chunks
    - document – no division
    - sentence
    - window of n words
- Large chunks
    - Lower probability of match, higher threshold
- Small chunks
    - Smaller number of unique chunks
    - Lower search complexity
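
The three granularities above can be illustrated with a minimal sketch. The function names, the naive sentence splitter, and the two sample texts are all assumptions made for illustration and are not part of the lecture material. The sketch chunks two near-identical texts at each granularity and reports what fraction of the first text's unique chunks also occur in the second.

```python
import re


def chunk_document(text):
    """Whole document as a single chunk (no division)."""
    return [" ".join(text.split())]


def chunk_sentences(text):
    """Sentence chunks; naively assumes a sentence ends at '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.split()) for p in parts if p]


def chunk_windows(text, n):
    """Overlapping windows of n consecutive whitespace-separated words."""
    words = text.split()
    if len(words) < n:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def shared_fraction(chunks_a, chunks_b):
    """Fraction of the unique chunks of A that also occur in B."""
    a, b = set(chunks_a), set(chunks_b)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical sample texts: the second changes a single word of the first.
original = "The cat sat on the mat. The dog barked at the cat."
suspect = "The cat sat on the mat. A dog barked at the cat."

for name, chunker in [
    ("document", chunk_document),
    ("sentence", chunk_sentences),
    ("3-word window", lambda t: chunk_windows(t, 3)),
]:
    score = shared_fraction(chunker(original), chunker(suspect))
    print(f"{name}: {score:.2f}")
```

On this pair the overlap is 0.00 for whole-document chunks, 0.50 for sentences, and 0.70 for 3-word windows. A single changed word destroys the match for every large chunk that contains it, which is one way to read the "lower probability of match" point for large chunks.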