Digital Libraries
Outline
The Federalist papers
Disputed papers of the Federalist
Wordprint and Stylistics
Feature Selection
Bayes' Theorem on function words
A Funeral Elegy and Primary Colors
Foster’s features
Typology of English texts
Features used (e.g., Dimension 1)
Discriminant analysis for text genres
Genre vs. Subject (Lee & Myaeng 02)
Putting the constraints together
In summary…
To think about…
Water Break
Copy detection
Duplicate detection characteristics
Signature method
Effect of granularity
Signature methods
Distance calculations
Subset problem
R-measure: amount repeated in other documents (Khmelev and Teahan)
R-measure example
Computer program plagiarism
Design-based methods
Recursive region coding
Fragments of a web page
Defining fragments
Conclusion
To think about…