Computational Analysis of Genre, Authorship and Duplication

To think about…


•	How can duplicate detection algorithms be
	freed from having to perform pairwise
	comparisons?

•	What chunk size would you use for
	signature-based methods for images,
	music and video? Would you also encode
	structural dependency (ordering, compared
	using edit distance) or not (a bag of chunks
	in a vector space model, VSM) for these
	other media types?
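One possible answer to the first question, offered here as an illustration and not as the lecture's own answer, is locality-sensitive hashing (LSH). Each document is reduced to a MinHash signature, the signature is split into bands, and each band is used as a hash-table key. Only documents that land in the same bucket become candidate pairs, so the algorithm never enumerates all pairs. The sketch below works on text. The function names, the 3-word shingles, 64 hash functions and 16 bands are all hypothetical choices. Candidate pairs would still need a direct similarity check afterwards.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word chunks) of text.
    Hypothetical chunking choice; images or audio would need their own chunker."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """Summarise a set by the minimum value of num_hashes seeded hash functions."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest(), "big")
            for x in items))
    return signature


def lsh_candidates(docs, num_hashes=64, bands=16):
    """Bucket documents by bands of their MinHash signatures.
    Only documents that share at least one bucket are returned as candidate
    pairs, so no all-pairs comparison over the collection is performed."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            # The band index is part of the key so that different bands never collide.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "completely unrelated text about music signatures and video frames",
}
print(lsh_candidates(docs))  # {('a', 'b')}
```

The band/row split sets the trade-off. With fewer rows per band, near-duplicates with lower overlap still collide, but more false candidates reach the verification step.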
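The second question contrasts two ways of comparing chunked media. The sketch below makes that contrast concrete. It assumes each item (image tiles, audio frames, video shots) has already been turned into a sequence of discrete chunk identifiers; the identifiers and function names are hypothetical. Edit distance is sensitive to the order of chunks, while cosine similarity over chunk counts (a bag of chunks in a VSM) ignores order entirely.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two chunk sequences (order matters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cosine_bag(a, b):
    """Cosine similarity of chunk-count vectors (order ignored, as in a VSM)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


# Hypothetical chunk IDs, e.g. fingerprints of four consecutive video shots.
original = ["c1", "c2", "c3", "c4"]
reordered = ["c4", "c3", "c2", "c1"]
print(edit_distance(original, reordered))  # 4
print(cosine_bag(original, reordered))     # 1.0
```

A re-edited video whose shots appear in reverse order looks identical as a bag of chunks but maximally different as a sequence. Which behaviour is desirable depends on whether reordering should count as a different work for the media type in question.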