Clustering by Gaussian Mixture Model and Light Gradient Boosting Machine

Date

2024

Authors

Yang, Feihan

Abstract

This project studies clustering by a Gaussian mixture model (GMM) and a Bayesian Gaussian mixture model (BGMM), each combined with a light gradient boosting machine (LightGBM). K-means, a common unsupervised clustering method, serves as the baseline for comparison. LightGBM is an ensemble supervised learning method that combines a number of weak learners into a strong learner; in this project it is combined with GMM and BGMM to improve clustering performance. A Kaggle competition dataset is used to test these learning algorithms. Performance is evaluated with the Rand index, which measures the similarity between the ground-truth clusters and the predicted clusters. In addition, intracluster and intercluster distances, which indicate how compact each cluster is and how well separated different clusters are, are computed to provide further performance metrics. In particular, an intercluster distance called the multi-cluster average centroid linkage distance is proposed to simplify the distance computation while retaining high precision. The evaluation results show that LightGBM with BGMM consistently outperforms the other methods, making it the preferred classification approach for this dataset.
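The following is a minimal sketch of the kind of pipeline the abstract describes: a BGMM produces cluster assignments, LightGBM is then trained as a supervised learner on those assignments, and the Rand index scores agreement with the ground-truth clusters. It is not the author's exact implementation; the synthetic dataset, the number of components, and the hyperparameters are placeholders standing in for the Kaggle competition data and the project's settings.

```python
# Sketch: BGMM clustering followed by LightGBM, evaluated with the Rand index.
# Dataset and hyperparameters are illustrative assumptions, not the project's.
import lightgbm as lgb
from sklearn.datasets import make_blobs
from sklearn.metrics import rand_score
from sklearn.mixture import BayesianGaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle competition dataset.
X, y_true = make_blobs(n_samples=2000, centers=4, n_features=8, random_state=0)

# Unsupervised step: the BGMM assigns each point to a mixture component.
bgmm = BayesianGaussianMixture(n_components=4, random_state=0)
cluster_labels = bgmm.fit_predict(X)

# Supervised step: LightGBM learns to reproduce the BGMM cluster assignments,
# acting as the strong learner built from many weak learners (boosted trees).
X_train, X_test, c_train, c_test, y_true_train, y_true_test = train_test_split(
    X, cluster_labels, y_true, test_size=0.3, random_state=0
)
clf = lgb.LGBMClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, c_train)
pred = clf.predict(X_test)

# Rand index: similarity between predicted clusters and ground-truth clusters.
print("Rand index (BGMM + LightGBM):", rand_score(y_true_test, pred))
```

Swapping `BayesianGaussianMixture` for `GaussianMixture` (or replacing the clustering step with K-means) gives the other configurations compared in the project under the same evaluation.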

Keywords

unsupervised learning, LightGBM