Model-based clustering of categorical data based on the Hamming distance (Joint work with L. Paci and E. Filippi-Mazzola)

Speaker: Raffaele Argiento - Università di Bergamo
Thursday, 24 October 2024, at 12:00

This talk describes a new model-based approach for clustering categorical data with no natural ordering. The proposed method exploits the Hamming distance to define a family of probability mass functions to model the data. The elements of this family are then considered as kernels of a finite mixture model with an unknown number of components.
Conjugate Bayesian inference is derived for the parameters of the Hamming distribution model. The mixture is framed in a Bayesian nonparametric setting, and a transdimensional blocked Gibbs sampler is developed to provide full Bayesian inference on the number of clusters, their structure, and the group-specific parameters. This sampler simplifies computation relative to customary reversible jump algorithms.
The proposed model encompasses a parsimonious latent class model as a special case when the number of components is fixed. Model performance is assessed via a simulation study and on reference datasets, showing improvements in clustering recovery over existing approaches.
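
As an illustration of the kind of kernel the abstract describes, the following minimal sketch builds a probability mass function from the Hamming distance. The functional form is an assumption made here for illustration, not taken from the talk: the mass is proportional to exp(-d_H(x, center) / sigma), with a single scale `sigma` and the same number of categories for every attribute. The names `hamming`, `hamming_pmf`, `center`, `sigma` and `n_categories` are hypothetical.

```python
import math
from itertools import product


def hamming(x, c):
    """Count the positions at which two equal-length sequences differ."""
    if len(x) != len(c):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, c))


def hamming_pmf(x, center, sigma, n_categories):
    """Probability of x under an assumed Hamming-distance kernel.

    Assumption (not from the source): p(x) is proportional to
    exp(-hamming(x, center) / sigma), with one shared scale sigma > 0
    and n_categories levels for every attribute.
    """
    n_attributes = len(center)
    # Each attribute contributes 1 (match) + (m - 1) * exp(-1/sigma) (mismatches),
    # so the normalizing constant factorizes over attributes.
    mismatch_weight = math.exp(-1.0 / sigma)
    normalizer = (1.0 + (n_categories - 1) * mismatch_weight) ** n_attributes
    return math.exp(-hamming(x, center) / sigma) / normalizer


# Usage: four attributes with three unordered categories each, coded 0, 1, 2.
center = (0, 2, 1, 1)
total = sum(hamming_pmf(x, center, sigma=0.5, n_categories=3)
            for x in product(range(3), repeat=4))
print(round(total, 6))
```

Under this assumed form the mass peaks at `center` and decays with the number of mismatching attributes, and smaller values of `sigma` concentrate the mass more tightly around the center.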


Contact
Catia Scricciolo

Publication date
20 July 2024