To think about…
- How can duplicate detection algorithms avoid having to make pairwise comparisons?
- What chunk size would you use for signature-based methods for images, music, and video? For these other media types, would you also encode structural dependency (ordering, compared using edit distance) or not (a bag of chunks, compared using the vector space model, VSM)?
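To make the two options in the second question concrete, here is a minimal sketch, assuming a medium has already been reduced to a sequence of small integers (for example, pitch values for music). It splits the sequence into fixed-size chunks, then compares two items either as an ordered chunk sequence (Levenshtein edit distance) or as a bag of chunks (cosine similarity of chunk counts, as in a VSM). The helper names `chunks`, `edit_distance`, `cosine` and `group_by_signature`, the chunk size of 2 and the toy data are all hypothetical choices for illustration. `group_by_signature` shows one possible way to sidestep pairwise comparison, hashing each item's bag of chunks into a bucket, but it only catches exact duplicates under the bag view.

```python
from collections import Counter
from math import sqrt


def chunks(seq, size):
    """Split a sequence into consecutive, non-overlapping chunks of `size` items.
    `size` is a hypothetical chunk-size parameter, not a recommended value."""
    return [tuple(seq[i:i + size]) for i in range(0, len(seq), size)]


def edit_distance(a, b):
    """Levenshtein distance between two chunk sequences: ordering matters."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def cosine(a, b):
    """Cosine similarity of bag-of-chunks count vectors: ordering is ignored."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[k] * vb[k] for k in va)
    norm = sqrt(sum(v * v for v in va.values()) * sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def group_by_signature(items, size):
    """Bucket items by their multiset of chunks, so exact (bag-view) duplicates
    are found in one pass instead of by comparing every pair."""
    buckets = {}
    for name, seq in items.items():
        key = frozenset(Counter(chunks(seq, size)).items())
        buckets.setdefault(key, []).append(name)
    return [names for names in buckets.values() if len(names) > 1]


if __name__ == "__main__":
    a = [60, 62, 64, 65, 67, 69]
    b = [67, 69, 64, 65, 60, 62]  # same chunks as `a`, in a different order
    c = [60, 62, 64, 65, 67, 70]
    ca, cb = chunks(a, 2), chunks(b, 2)
    print(edit_distance(ca, cb))  # 2: the ordered view sees the reordering
    print(cosine(ca, cb))         # 1.0: the bag view does not
    print(group_by_signature({"a": a, "b": b, "c": c}, 2))  # [['a', 'b']]
```

The example shows the trade-off the question raises: the ordered view distinguishes `a` from its reordering `b`, while the bag view treats them as identical and therefore lets them share a hash bucket. Edit distance, by contrast, still has to be computed pair by pair.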