Monday, May 30, 2005
K-means clustering is a compact, iterative algorithm for breaking a given data set into a prescribed number of "clusters". One very difficult problem is knowing how many clusters to look for in a given data set. After reading this summary I started to think about how I visually look for clusters in a scattered set of points. I realized that I mentally envision a boundary around the points. Then I look for roughly circular or elliptical sections that are connected to the rest of the group by narrow necks. The result is an intuitive number of groups. I wonder if there is an algorithm out there for finding peninsulas or narrow necks on an arbitrary closed-boundary shape.
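For reference, here is a minimal sketch of the usual k-means loop (often called Lloyd's algorithm) in plain Python. It assumes 2-D points given as (x, y) tuples; the function name `kmeans` and its parameters are my own choices for illustration, not taken from any particular library. Note that the number of clusters `k` has to be supplied up front, which is exactly the difficulty described above.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [None] * len(points)

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        new_labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )

    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(pts, k=2)
    print(centroids, labels)
```

With two well-separated blobs like the ones in the example, the loop settles after a few passes. The result in general depends on the starting centroids, so a common practice is to run it several times with different seeds and keep the tightest clustering.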