
CS6210: Document clustering in DLs

Bisecting K-means Algorithm

Start with a single cluster containing all documents.

1. Pick a cluster to split.

2. Find 2 sub-clusters using the basic k-means algorithm (k = 2).

3. Repeat step 2 X times and keep the split that yields the highest overall similarity.

4. Repeat steps 1, 2 and 3 until the desired number of clusters is reached.
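The slide leaves "overall similarity" undefined. Below is one common choice, stated as an assumption rather than taken from the slide. Treat documents d as unit-length vectors (for example, normalized tf-idf), so that cosine similarity is a plain dot product. Score each sub-cluster S_j by its average pairwise similarity, which collapses to the squared norm of its centroid c_j:

```latex
% Assumption: every document d is a unit vector, so cos(d, d') = d \cdot d'.
\begin{aligned}
c_j &= \frac{1}{|S_j|} \sum_{d \in S_j} d \\
\frac{1}{|S_j|^2} \sum_{d \in S_j} \sum_{d' \in S_j} \cos(d, d')
  &= \Big( \frac{1}{|S_j|} \sum_{d \in S_j} d \Big) \cdot \Big( \frac{1}{|S_j|} \sum_{d' \in S_j} d' \Big)
   = \lVert c_j \rVert^2 \\
\mathrm{sim}(S_1, S_2) &= \sum_{j=1}^{2} |S_j| \, \lVert c_j \rVert^2
\end{aligned}
```

With this definition, each of the X candidate splits can be scored from its two centroids alone. The cost is linear in the number of documents rather than quadratic.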

Splitting criterion: split the largest cluster.
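The steps above, combined with the largest-cluster criterion, can be sketched in plain Python. This is a minimal illustration under several assumptions, not the course's reference implementation:

- Documents are given as numeric term-weight vectors and are normalized to unit length.
- Step 2 assigns documents by cosine similarity to normalized centroids, a spherical-k-means-style variant of basic k-means.
- "Overall similarity" is the size-weighted squared-centroid-norm score from the previous definition.
- The function names and the parameters `trials` (the slide's X) and `seed` are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centroid(vs):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vs) for col in zip(*vs)]


def two_means(vecs, idx, rng, iters=20):
    """Step 2: basic 2-means on documents `idx`, assigning by cosine similarity.

    Returns two non-empty lists of document indices, or None if the split
    collapses into a single group.
    """
    centers = [vecs[i] for i in rng.sample(idx, 2)]  # two random seed documents
    groups = None
    for _ in range(iters):
        groups = [[], []]
        for i in idx:
            # Ties go to the first center.
            j = 0 if dot(vecs[i], centers[0]) >= dot(vecs[i], centers[1]) else 1
            groups[j].append(i)
        if not groups[0] or not groups[1]:
            return None
        new_centers = [normalize(centroid([vecs[i] for i in g])) for g in groups]
        if new_centers == centers:  # assignments have stabilised
            break
        centers = new_centers
    return groups


def split_score(vecs, groups):
    """Assumed overall similarity: sum of |S_j| * ||c_j||^2 over both sub-clusters."""
    total = 0.0
    for g in groups:
        c = centroid([vecs[i] for i in g])
        total += len(g) * dot(c, c)
    return total


def bisecting_kmeans(docs, k, trials=5, seed=0):
    """Return up to k clusters as lists of indices into `docs`."""
    rng = random.Random(seed)
    vecs = [normalize(d) for d in docs]
    clusters = [list(range(len(docs)))]  # start with a single cluster
    while len(clusters) < k:
        # Step 1: pick the largest cluster (the slide's splitting criterion).
        t = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        if len(clusters[t]) < 2:
            break
        best, best_score = None, -math.inf
        for _ in range(trials):  # Step 3: try X splits, keep the best one
            groups = two_means(vecs, clusters[t], rng)
            if groups is not None:
                score = split_score(vecs, groups)
                if score > best_score:
                    best, best_score = groups, score
        if best is None:
            break  # e.g. the cluster holds only identical documents
        clusters[t:t + 1] = best  # Step 4: replace the parent with its two children
    return clusters
```

For example, `bisecting_kmeans(docs, k=2, trials=5)` on four short vectors where the first two and the last two point in similar directions should return those two pairs as clusters. Because the best of several random trials is kept, the result is less sensitive to an unlucky choice of seed documents than a single run of 2-means.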