11 Oct 2005
CS 5244 - Computational Document Analysis
To think about…
• How can duplicate detection algorithms be freed from
needing to do pairwise comparisons?
• What chunk size would you use for signature-based
methods for images, music, and video? Would you also
encode a structural dependency (ordering, compared using
edit distance) or not (a bag of chunks, compared using
the vector space model, VSM) for these other media types?
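
To make the second question concrete, here is a minimal sketch contrasting the two options it names: an order-sensitive comparison (edit distance over chunk sequences) and an order-insensitive one (cosine similarity over bags of chunks, as in the VSM). The function names and the chunk labels `c1`..`c4` are hypothetical, chosen only for illustration. How to cut chunks from images, music, or video is deliberately left open, since that is what the question asks.

```python
from collections import Counter
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance between two chunk sequences (order matters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity between bags of chunks (order is ignored)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


# Hypothetical chunk signatures: the same chunks, in reversed order.
original = ["c1", "c2", "c3", "c4"]
reordered = ["c4", "c3", "c2", "c1"]

print(edit_distance(original, reordered))      # order-sensitive view
print(cosine_similarity(original, reordered))  # bag-of-chunks view
```

For these two sequences the edit distance is 4 while the cosine similarity is 1.0, so the bag-of-chunks view treats a reordered item as an exact duplicate and the ordered view does not. Which behaviour is preferable for a given medium is the point of the question.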