11 Oct 2005
CS 5244 - Computational Document Analysis
Duplicate detection characteristics
• Plagiarism
  - copies intentionally
  - may obfuscate
  - target and source relation
• Self-plagiarism*
  - copy from one’s own work
  - often to provide background for incremental research
• (near) Clone/duplicate
  - same functionality in code / citation data
  - but in different modules by different developers
• Fragment
  - web page content generated by content manager
  - interferes with spiders’ re-sampling rate
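A minimal sketch of how a near duplicate might be scored, assuming word-level shingles compared with Jaccard similarity. This is one common technique, not necessarily the one used in this course; the function names and the shingle size k=3 are illustrative assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def resemblance(doc1, doc2, k=3):
    """Score in [0, 1]; higher means the two texts share more shingles."""
    return jaccard(shingles(doc1, k), shingles(doc2, k))


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    edited = "the quick brown fox leaps over the lazy dog"
    print(resemblance(original, original))  # exact duplicate
    print(resemblance(original, edited))    # near duplicate, one word changed
```

A single changed word alters every shingle that contains it, so obfuscated plagiarism lowers the score quickly; the threshold that separates "near duplicate" from "different" is an application choice.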