A novel and simple strategy for evolving prototype-based clustering

In this paper, we present a novel strategy for evolving prototype-based clusters that uses a weighting scheme to “progressively forget” old samples. The rate of forgetting is controlled by a single, intuitive memory parameter. The weighting scheme can be used to build efficient dynamic summaries of data streams, such as the mean or the covariance. Using it, we have developed evolving versions of the K-means and Gaussian mixture model algorithms. These algorithms analyze incoming data online and are specifically designed to handle concept drift caused by changes in the underlying data distribution. They were validated on a simulated dataset covering a wide variety of concept drift situations and on real data related to property sales, and in both cases they showed their capability to follow changes in the data.

Keywords: Cluster evolution, Concept drift, Data stream, Evolving clustering, Gaussian mixture models, K-means
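The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of the general idea of a forgetting-based summary, not the method proposed in the paper. It assumes exponential forgetting, where every previous sample's weight is multiplied by a constant factor each time a new sample arrives. The class name `ForgettingSummary`, the parameter name `memory`, and the mapping `decay = 1 - 1/memory` are all hypothetical choices made for illustration.

```python
class ForgettingSummary:
    """Running mean and covariance of a stream with exponential forgetting.

    Illustrative sketch only; the weighting is an assumption, not the
    scheme defined in the paper.
    """

    def __init__(self, dim, memory):
        # `memory` (hypothetical) is roughly the effective number of recent
        # samples that influence the summary. Assumed: decay = 1 - 1/memory.
        if memory <= 1:
            raise ValueError("memory must be greater than 1")
        self.decay = 1.0 - 1.0 / memory
        self.weight = 0.0                                  # total decayed weight
        self.mean = [0.0] * dim                            # weighted mean
        self.scatter = [[0.0] * dim for _ in range(dim)]   # weighted scatter matrix

    def update(self, x):
        """Absorb one sample, down-weighting everything seen before it."""
        # Old samples lose weight; the new sample enters with weight 1.
        self.weight = self.decay * self.weight + 1.0
        # Deviation from the mean before and after the update
        # (weighted form of Welford's incremental update).
        old_delta = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.weight
                     for mi, di in zip(self.mean, old_delta)]
        new_delta = [xi - mi for xi, mi in zip(x, self.mean)]
        for i, di in enumerate(old_delta):
            for j, dj in enumerate(new_delta):
                self.scatter[i][j] = self.decay * self.scatter[i][j] + di * dj

    def covariance(self):
        """Weighted (biased) covariance estimate of the recent stream."""
        if self.weight == 0.0:
            raise ValueError("no samples seen yet")
        return [[s / self.weight for s in row] for row in self.scatter]


# Usage: the stream drifts abruptly from (0, 0) to (10, 10).
summary = ForgettingSummary(dim=2, memory=20)
for _ in range(200):
    summary.update([0.0, 0.0])
for _ in range(200):
    summary.update([10.0, 10.0])
print(summary.mean)  # close to [10, 10]: the old regime has been forgotten
```

Under this assumption each update costs a fixed amount of work and storage regardless of stream length, and a smaller `memory` makes the summary track drift faster at the price of noisier estimates. An evolving K-means or Gaussian mixture model could keep one such summary per cluster, though how the paper assigns samples to clusters is not stated in the abstract.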