Semi-supervised clustering in graphs

David Chatel 1
1 MAGNET - Machine Learning in Information Networks
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : Decision processes in many areas (marketing, biology, etc.) now require processing ever larger amounts of increasingly complex data, which has driven a growing interest in machine learning techniques. One such technique is clustering: the task of finding a partition of items such that items in the same cluster are more similar to each other than to items in different clusters. Clustering is data-driven, and data come from different sources and take different forms. One challenge is to design a system that can benefit from all of these sources, even when they come in different forms. An object can be described by a feature vector, that is, a list of attributes each taking a value. Objects can also be described by a graph that captures the relationships they have with each other. In addition, some constraints on the data may be known: that an object is of a certain type, or that two objects share the same type or are of different types. It may also be known that, on a global scale, the different types of objects appear with a known frequency. In this thesis, we focus on clustering with three types of constraints: label constraints, pairwise constraints and a power-law constraint. A label constraint specifies the cluster to which an object belongs. A pairwise constraint specifies that a pair of objects should or should not share the same cluster. Finally, the power-law constraint is a cluster-level constraint specifying that the distribution of cluster sizes follows a power law. We want to show that introducing semi-supervision into clustering algorithms can alter and improve the solutions returned by unsupervised clustering algorithms. We contribute to this question by proposing algorithms for each type of constraint.
Our experiments on UCI data sets and natural language processing data sets show that our algorithms perform well and suggest promising directions for future work.
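To make the three constraint types named in the abstract concrete, the sketch below checks a given cluster assignment against label constraints and pairwise (must-link / cannot-link) constraints, and lists the cluster sizes that a power-law constraint would be stated over. It is an illustration only, not the thesis's algorithms: the function names, the dictionary-based data layout and the toy data are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical helpers for illustration; they are not taken from the thesis.

def label_violations(assignment, labels):
    """Count objects placed in a cluster other than the one their label constraint names."""
    return sum(1 for obj, cluster in labels.items() if assignment[obj] != cluster)

def pairwise_violations(assignment, must_link, cannot_link):
    """Count must-link pairs that are split plus cannot-link pairs that are merged."""
    split = sum(1 for a, b in must_link if assignment[a] != assignment[b])
    merged = sum(1 for a, b in cannot_link if assignment[a] == assignment[b])
    return split + merged

def cluster_sizes(assignment):
    """Return cluster sizes in decreasing order (the quantity a size-distribution constraint concerns)."""
    return sorted(Counter(assignment.values()).values(), reverse=True)

# Toy data (assumed): six objects assigned to three clusters.
assignment = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 2}
labels = {"a": 0, "c": 2}               # label constraints: object -> required cluster
must_link = [("a", "b"), ("c", "f")]    # pairs that should share a cluster
cannot_link = [("a", "c"), ("d", "e")]  # pairs that should not share a cluster

n_label = label_violations(assignment, labels)
n_pair = pairwise_violations(assignment, must_link, cannot_link)
sizes = cluster_sizes(assignment)
```

A semi-supervised clustering method would typically try to keep such violation counts low while still grouping similar objects; how the constraints are actually enforced is described in the thesis itself.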
Document type :
Theses

https://hal.inria.fr/tel-01667429
Contributor : Team Magnet
Submitted on : Tuesday, December 19, 2017 - 1:12:47 PM
Last modification on : Friday, May 17, 2019 - 11:39:17 AM

File

Chatel.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01667429, version 1

Citation

David Chatel. Semi-supervised clustering in graphs. Artificial Intelligence [cs.AI]. Université de Lille, 2017. English. ⟨tel-01667429⟩
