Duplicate detection characteristics
• Plagiarism
  ◦ copies intentionally
  ◦ may obfuscate the copy
  ◦ has a target-source relation
• Self-plagiarism*
  ◦ copies from one's own work
  ◦ often to provide background for incremental research
• (Near) clone/duplicate
  ◦ same functionality in code / citation data
  ◦ but in different modules by different developers
• Fragment
  ◦ web page content generated by a content manager
  ◦ interferes with spiders' re-sampling rate
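
The slide does not name a detection method, so the following is only a minimal sketch of one common way to flag (near) duplicates: compare the sets of k-word shingles of two texts with Jaccard similarity. The function names, the shingle size `k=3`, and the `threshold=0.5` cutoff are illustrative assumptions, not values from the source.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(x, y, k=3, threshold=0.5):
    """Flag x and y as near duplicates if their shingle sets overlap enough.

    k and threshold are assumed values chosen for illustration only.
    """
    return jaccard(shingles(x, k), shingles(y, k)) >= threshold
```

Exact copies score 1.0 and texts with no shared shingle score 0.0. Obfuscated plagiarism (reworded or reordered text) lowers the score, so a sketch like this would likely miss it without further normalization.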