11 Oct 2005
CS 5244 - Computational Document Analysis
Distance calculations
• Calculate distance between p1, p2
• VSM: L1 distance $\sum_f |P_{f1} - P_{f2}|$
• VSM: L2 (Euclidean) distance $\left(\sum_f |P_{f1} - P_{f2}|^2\right)^{1/2}$
• Weighted feature combinations
• For text features, can use edit distance
  ◦ Calculate using dynamic programming
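The distance measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code. It assumes that p1 and p2 are equal-length lists of numeric feature values, that "weighted feature combinations" means a per-feature weight inside the sum, and that edit distance means unit-cost Levenshtein distance. The function and parameter names are hypothetical.

```python
import math


def l1_distance(p1, p2, weights=None):
    """L1 distance: sum over features f of w_f * |p1[f] - p2[f]|."""
    if weights is None:
        weights = [1.0] * len(p1)  # assumption: unweighted by default
    return sum(w * abs(a - b) for w, a, b in zip(weights, p1, p2))


def l2_distance(p1, p2, weights=None):
    """L2 (Euclidean) distance: square root of the weighted squared differences."""
    if weights is None:
        weights = [1.0] * len(p1)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p1, p2)))


def edit_distance(s, t):
    """Levenshtein distance by dynamic programming, keeping two rows of the table."""
    prev = list(range(len(t) + 1))  # distance from the empty prefix of s to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(l1_distance([1, 2, 3], [2, 4, 6]))     # 6.0
print(edit_distance("kitten", "sitting"))    # 3
```

Keeping only two rows of the table reduces memory from O(|s|·|t|) to O(|t|), which matters when comparing long documents pairwise.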
• Detect and flag copies
• Assume top n% as possible plagiarisms
• Use a tuned similarity threshold
• Alternative: tune on a supervised (labeled) set, learning weights for features (Bilenko and Mooney)
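The flagging strategies above might look like the following sketch. It assumes each document pair already has a similarity score where higher means more alike (a distance can be negated or inverted first). All names are hypothetical. The tuning function is a deliberately simple stand-in that picks a single threshold from labeled pairs. It is not the feature-weight learning of Bilenko and Mooney.

```python
import math


def flag_top_percent(scored_pairs, n_percent):
    """Flag the n% most similar pairs. scored_pairs is a list of (pair, similarity)."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    k = max(1, math.ceil(len(ranked) * n_percent / 100))  # flag at least one pair
    return ranked[:k]


def flag_above_threshold(scored_pairs, threshold):
    """Flag every pair whose similarity reaches the threshold."""
    return [item for item in scored_pairs if item[1] >= threshold]


def tune_threshold(labeled_scores):
    """Pick the observed score that, used as a threshold, maximizes accuracy.

    labeled_scores is a list of (similarity, is_copy) from a supervised set.
    """
    best_t, best_acc = None, -1.0
    for t in sorted({s for s, _ in labeled_scores}):
        correct = sum((s >= t) == is_copy for s, is_copy in labeled_scores)
        acc = correct / len(labeled_scores)
        if acc > best_acc:  # ties keep the lowest threshold
            best_t, best_acc = t, acc
    return best_t


scores = [(("a", "b"), 0.91), (("a", "c"), 0.40), (("b", "c"), 0.75), (("c", "d"), 0.10)]
print(flag_top_percent(scores, 25))       # [(('a', 'b'), 0.91)]
print(flag_above_threshold(scores, 0.7))  # [(('a', 'b'), 0.91), (('b', 'c'), 0.75)]
```

The top-n% rule always flags something, even in a collection with no copies, while a fixed threshold can flag nothing or everything. That difference is one reason to tune on labeled data.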
What are some problems with these approaches?