11 Oct 2005
CS 5244 - Computational Document Analysis
Effect of granularity
• Divide the document into smaller chunks
    - document: no division
    - sentence
    - window of n words
• Large chunks
    - Lower probability of match, higher threshold
• Small chunks
    - Smaller number of unique chunks
    - Lower search complexity
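
The three granularities listed above can be compared with a minimal sketch. The function names, the naive sentence splitter, the overlap measure (fraction of one document's unique chunks found in the other), and the two sample texts are all assumptions made for illustration, not something taken from the lecture.

```python
import re

def chunk_document(text):
    """Whole document as a single chunk (no division)."""
    return [" ".join(text.lower().split())]

def chunk_sentences(text):
    """Naive sentence chunks: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p]

def chunk_windows(text, n):
    """Overlapping windows of n consecutive words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def overlap(chunks_a, chunks_b):
    """Fraction of the unique chunks of A that also occur in B (illustrative measure)."""
    a, b = set(chunks_a), set(chunks_b)
    return len(a & b) / len(a) if a else 0.0

# Hypothetical sample texts: the first sentence is copied, the second is not.
original = "The cat sat on the mat. The dog slept by the door."
suspect = "The cat sat on the mat. A bird sang in the tree."

doc_score = overlap(chunk_document(original), chunk_document(suspect))
sent_score = overlap(chunk_sentences(original), chunk_sentences(suspect))
win_score = overlap(chunk_windows(original, 3), chunk_windows(suspect, 3))

print(doc_score)   # 0.0: the single whole-document chunk does not match
print(sent_score)  # 0.5: one of two sentences matches
print(win_score)   # 0.4: four of ten 3-word windows match
```

In this toy case the whole-document chunk never matches even though half the text is copied, which illustrates the lower match probability of large chunks. Smaller chunks do register the partial copy.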