Digital Libraries

Outline

The Federalist papers

Disputed papers of the Federalist

Wordprint and Stylistics

Feature Selection

Bayes' Theorem on function words

A Funeral Elegy and Primary Colors

Foster’s features

Typology of English texts

Features used (e.g., Dimension 1)

Discriminant analysis for text genres

Genre vs. Subject (Lee & Myaeng 2002)

Putting the constraints together

In summary…

To think about…

Water Break

Copy detection

Duplicate detection characteristics

Signature method

Effect of granularity

Signature methods

Distance calculations

Subset problem

R-measure: amount repeated in other documents (Khmelev and Teahan)

R-measure example

Computer program plagiarism

Design-based methods

Recursive region coding

Fragments of a web page

Defining fragments

Conclusion

To think about…