11 Oct 2005
CS 5244 - Computational Document Analysis
To think about…
• How can duplicate detection algorithms be freed from having to do pairwise comparisons?
• What chunk size would you use for signature-based methods on images, music, or video? For these other media types, would you also encode a structural dependency (ordering, compared using edit distance) or not (bag of chunks, compared using the VSM)?
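
As a starting point for the second question, here is a minimal sketch of the two comparison styles it contrasts. Everything in it is an assumption made for illustration, not part of the lecture: the fixed-size chunking, the truncated MD5 signature, and the helper names `chunk_signatures`, `edit_distance`, and `cosine_similarity` are all hypothetical. Reordering the chunks changes the edit distance but leaves the bag-of-chunks cosine similarity untouched.

```python
import hashlib
import math
from collections import Counter


def chunk_signatures(data, chunk_size):
    """Split bytes into fixed-size chunks and hash each chunk.

    Assumption: fixed-size chunks and a truncated MD5 digest stand in for
    whatever chunking and signature scheme a real system would use.
    """
    return [
        hashlib.md5(data[i:i + chunk_size]).hexdigest()[:8]
        for i in range(0, len(data), chunk_size)
    ]


def edit_distance(a, b):
    """Levenshtein distance over two signature sequences (order matters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity over bags of signatures (order ignored, VSM style)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Same four chunks, opposite order.
original = chunk_signatures(b"AAAABBBBCCCCDDDD", 4)
reordered = chunk_signatures(b"DDDDCCCCBBBBAAAA", 4)

print(edit_distance(original, reordered))      # sensitive to ordering
print(cosine_similarity(original, reordered))  # insensitive to ordering
```

Which behaviour is preferable probably depends on the medium: a reordered set of image tiles may still be a near-duplicate, while a reordered audio or video stream arguably is not.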