CS6210: Document clustering in DLs

K-means Clustering

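• Takes an input parameter k and partitions a set of n documents into k clusters.

• Intracluster similarity is high: documents in the same cluster are similar to one another.

• Intercluster similarity is low: documents in different clusters are dissimilar.

• Cluster similarity is measured with respect to the mean value of the documents in a cluster, known as the cluster's centroid.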
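
The points above can be illustrated with a minimal sketch of the standard iterative k-means procedure. The slide does not specify several details, so the following are assumptions made here: documents are already represented as equal-length numeric vectors (for example term weights), closeness is measured with Euclidean distance, initial centroids are k randomly sampled documents, and iteration stops when assignments no longer change. The names `kmeans`, `centroid`, `max_iters`, and `seed` are illustrative only.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(docs, k, max_iters=100, seed=0):
    """Partition docs (a list of numeric vectors) into k clusters.

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to docs[i].
    """
    rng = random.Random(seed)
    # Assumption: initialise centroids with k documents chosen at random.
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = None

    for _ in range(max_iters):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(doc, centroids[c]))
            for doc in docs
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels

        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [doc for doc, lab in zip(docs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)

    return labels, centroids


if __name__ == "__main__":
    # Four toy "documents" over a two-term vocabulary.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels, centroids = kmeans(docs, k=2)
    print(labels)
    print(centroids)
```

Document clustering often uses cosine similarity rather than Euclidean distance; swapping the distance function in the assignment step would be the only change needed in this sketch.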