Embedded Topics in the Stochastic Block Model - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article, Statistics and Computing, Year: 2023

Embedded Topics in the Stochastic Block Model

Abstract

Communication networks such as emails or social networks are now ubiquitous and their analysis has become a strategic field. In many applications, the goal is to automatically extract relevant information by looking at the nodes and their connections. Unfortunately, most existing methods focus on analysing the presence or absence of edges, and textual data is often discarded. However, all communication networks actually come with textual data on the edges. To take this specificity into account, we consider in this paper networks in which two nodes are linked if and only if they share textual data. We introduce a deep latent variable model, called ETSBM, that handles embedded topics and simultaneously clusters the nodes while modelling the topics used between the different clusters. ETSBM extends both the stochastic block model (SBM) and the embedded topic model (ETM), which are core models for studying networks and corpora, respectively. Inference is performed with a variational-Bayes expectation-maximisation algorithm combined with stochastic gradient descent. The methodology is evaluated on synthetic data and on a real-world dataset.
Keywords: Graph clustering, topic modelling, variational inference, generative model, probabilistic model, embedded topic model, stochastic block model.
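To make the combination of SBM and ETM described in the abstract more concrete, the following is a minimal generative sketch, not the authors' exact specification. It assumes that edges follow a standard SBM, that each pair of clusters has its own topic proportions, and that topic-word distributions are obtained ETM-style as a softmax over inner products of word and topic embeddings. All names, sizes and probability values (`sample_network`, `pi`, `rho`, `alpha`, `delta`, the 0.4 and 0.05 connection probabilities) are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sample_network(n_nodes=30, n_clusters=3, n_topics=2, vocab_size=50,
                   embed_dim=8, words_per_edge=20, seed=0):
    """Sample a toy text-on-edges network (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # SBM part: cluster memberships and block connection probabilities.
    gamma = np.full(n_clusters, 1.0 / n_clusters)
    clusters = rng.choice(n_clusters, size=n_nodes, p=gamma)
    pi = np.full((n_clusters, n_clusters), 0.05)   # between-cluster (assumed)
    np.fill_diagonal(pi, 0.4)                      # within-cluster (assumed)

    # ETM part: topics live in the same space as word embeddings.
    rho = rng.normal(size=(vocab_size, embed_dim))   # word embeddings
    alpha = rng.normal(size=(n_topics, embed_dim))   # topic embeddings
    beta = softmax(alpha @ rho.T, axis=1)            # (topics, vocab)

    # One vector of topic proportions per ordered pair of clusters.
    delta = rng.normal(size=(n_clusters, n_clusters, n_topics))
    theta = softmax(delta, axis=-1)

    adjacency = np.zeros((n_nodes, n_nodes), dtype=int)
    docs = {}  # (i, j) -> bag-of-words count vector carried by the edge
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue
            q, r = clusters[i], clusters[j]
            if rng.random() < pi[q, r]:
                adjacency[i, j] = 1
                # Mixture of topics weighted by the cluster-pair proportions.
                word_probs = theta[q, r] @ beta
                docs[(i, j)] = rng.multinomial(words_per_edge, word_probs)
    return clusters, adjacency, docs
```

In this sketch an edge exists exactly when a document is attached to it, which mirrors the "linked if and only if they share textual data" setting; inference (the variational EM with stochastic gradient descent) is deliberately left out.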
Main file
ETSBM.pdf (3.21 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03782528, version 1 (21-09-2022)
hal-03782528, version 2 (25-07-2023)

Licence

Attribution

Identifiers

Cite

Rémi Boutin, Charles Bouveyron, Pierre Latouche. Embedded Topics in the Stochastic Block Model. Statistics and Computing, 2023, 33 (5), pp.95. ⟨10.1007/s11222-023-10265-9⟩. ⟨hal-03782528v2⟩
113 Views
59 Downloads
