11 Oct 2005
CS 5244 - Computational Document Analysis
Putting the constraints together
Document Frequency Ratios
(coverage of term to genre or genre+subject)
Use these to define the weight
Where σ is a penalty (“deviation”) factor for terms that are spread widely over different subjects
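
A rough sketch of how these pieces might fit together, for illustration only. The exact formulas are not reproduced in this text, so everything below is an assumption: the two ratios are taken as the fraction of documents in a genre (or genre+subject) that contain the term, σ is read as the population standard deviation of the per-subject ratios within the genre, and the weight is assumed to be the genre ratio divided by (1 + σ). The function names and the toy corpus are hypothetical.

```python
from statistics import pstdev

# Toy corpus invented for illustration: (genre, subject, set of terms).
DOCS = [
    ("editorial", "politics", {"opinion", "election"}),
    ("editorial", "politics", {"opinion", "vote"}),
    ("editorial", "sports", {"opinion", "match"}),
    ("editorial", "sports", {"opinion", "goal"}),
    ("report", "politics", {"election", "vote"}),
    ("report", "sports", {"match", "goal"}),
]


def df_ratio(term, docs):
    """Fraction of the given documents that contain the term."""
    if not docs:
        return 0.0
    return sum(term in d[2] for d in docs) / len(docs)


def genre_ratio(term, genre, docs):
    """Assumed coverage of the term within one genre."""
    return df_ratio(term, [d for d in docs if d[0] == genre])


def genre_subject_ratio(term, genre, subject, docs):
    """Assumed coverage of the term within one genre+subject cell."""
    cell = [d for d in docs if d[0] == genre and d[1] == subject]
    return df_ratio(term, cell)


def sigma(term, genre, docs):
    """Assumed penalty: spread of the per-subject ratios within the genre."""
    subjects = sorted({d[1] for d in docs if d[0] == genre})
    ratios = [genre_subject_ratio(term, genre, s, docs) for s in subjects]
    return pstdev(ratios) if ratios else 0.0


def weight(term, genre, docs):
    """Assumed combination: genre coverage damped by the penalty."""
    return genre_ratio(term, genre, docs) / (1.0 + sigma(term, genre, docs))
```

On this toy corpus, "opinion" appears in every editorial regardless of subject and gets weight 1.0 for the editorial genre, while "election" appears only in political editorials and gets weight 0.2. Other ways of combining the ratios and the penalty are equally plausible under the slide's description.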
What are some negative aspects of this approach?