Efficient Computation of Fitness Function for Evolutionary Clustering

Sergey Muravyov; Denis Antipov; Arina Buzdalova; Andrey Filchenkov

doi:10.13164/mendel.2019.1.087

Sergey Muravyov Computer Technology Lab, ITMO University, Russia https://orcid.org/0000-0002-4251-1744
Denis Antipov Computer Technology Lab, ITMO University, Russia https://orcid.org/0000-0001-7906-096X
Arina Buzdalova Computer Technology Lab, ITMO University, Russia https://orcid.org/0000-0002-0438-0106
Andrey Filchenkov Computer Technology Lab, ITMO University, Russia

DOI: https://doi.org/10.13164/mendel.2019.1.087

Keywords: Clustering, evolutionary clustering, clustering validity indices, fitness function

Abstract

Evolutionary algorithms (EAs) are random search heuristics which can solve various optimization problems. There are plenty of papers describing different approaches developed to apply evolutionary algorithms to the clustering problem, although none of them addressed the problem of fitness function computation. In clustering, many clustering validity indices exist that are designed to evaluate quality of resulting points partition. It is hard to use them as a fitness function due to their computational complexity. In this paper, we propose an efficient method for iterative computation of clustering validity indices which makes application of the EAs to this problem much more appropriate than it was before.

References

Kaufman, L. and Rousseeuw, P. J. 2009. Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons, USA.

Jain, A. K. and Dubes, R. C. 1998. Algorithms for clustering data, vol. 6. Prentice Hall, Englewood Cliffs, NJ, USA.

Bigus, J. P. 1996. Data mining with neural networks: solving business problems from application development to decision support. McGraw-Hill, New York, USA.

Mecca, G., Raunich, S., and Pappalardo, A. 2007. A new algorithm for clustering search results. Data & Knowledge Engineering 62, 3, pp. 504-522.

Anderberg, M. R. 1973. Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks. Academic Press, USA.

Bonner, R. E. 1964. On some clustering techniques. IBM journal of research and development 8, 1, pp. 22-32.

Kleinberg, J. M. 2003. An impossibility theorem for clustering. In Advances in Neural Information Processing Systems 15 (NIPS 2002). MIT Press, pp. 463-470.

Arthur, D. and Vassilvitskii, S. 2007. k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms.ACM, New Orleans, Louisiana, pp. 1027-1035.

Ester, M. et al. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining KDD 1996. AAAI Press, Portland, Oregon, pp. 226-231.

Chakrabarti, D., Kumar, R., and Tomkins, A. 2006. Evolutionary clustering. In Proceedings of the 12th ACM international conference on Knowledge discovery and data mining - SIGKDD 2006. ACM, Philadelphia, PA, pp. 554-560.

Arbelaitz, O. et al. 2013. An extensive comparative study of cluster validity indices. Pattern Recognition 46, 1, pp. 243-256.

Ng, A. Y., Jordan, M. I., and Weiss, Y. 2002. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems 14, 2, pp. 849-856.

Doer, B. and Doer, C. 2015. Optimal Parameter Choices Through Self-Adjustment: Applying the 1/5th. In Proceedings of the Genetic and Evolutionary Computation Conference - GECCO 2015. ACM, pp. 1335-1342.

Buzdalov, M. and Buzdalova, A. 2013. Adaptive Selection of Helper-Objectives for Test Case Generation. In 2013 IEEE Congress on Evolutionary Computation. IEEE, pp. 2245-2250. DOI: 10.1109/CEC.2013.6557836

Hruschka E. R. et al. 2009. A survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2, 39, pp. 133-155.