Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Yangtao Wang; Xi Shen; Shell Hu; Yuan Yuan; James L. Crowley; Dominique Vaufreydaz

Conference paper, Year: 2022


Yangtao Wang

Role: Author

Multimodal Perception and Sociable Interaction

Xi Shen

Role: Author

Tencent AI Lab

Laboratoire d'Informatique Gaspard-Monge

Shell Hu

Role: Author

Samsung AI Center [Cambridge]

Yuan Yuan

Role: Author

MIT Computer Science & Artificial Intelligence Lab

James L. Crowley

Role: Author
PersonId: 5323
IdHAL: james-crowley
ORCID: 0000-0001-7730-8968
IdRef: 033907641

Multimodal Perception and Sociable Interaction

Dominique Vaufreydaz

Role: Author
PersonId: 8656
IdHAL: vaufreydaz
ORCID: 0000-0002-8825-0973
IdRef: 064812596

Multimodal Perception and Sociable Interaction

Abstract

Transformers trained with self-supervised learning using a self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the eigenvector associated with the second smallest eigenvalue provides a cutting solution, since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state-of-the-art method LOST by a margin of 6.9%, 8.1%, and 8.1% on VOC07, VOC12, and COCO20K, respectively. The performance can be further improved by adding a second-stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, and 12.9% on ECSSD, DUTS, and DUT-OMRON, respectively, compared to the previous state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet.
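The graph-cut step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `ncut_bipartition`, the cosine-similarity edge score, the threshold `tau`, the small weight `eps`, and the rule of splitting at the mean of the eigenvector are all assumptions made for illustration, and the input is a toy feature matrix rather than DINO tokens. The generalized problem (D - W) x = λ D x is solved through the equivalent symmetric form D^{-1/2} (D - W) D^{-1/2} y = λ y, with x = D^{-1/2} y.

```python
import numpy as np


def ncut_bipartition(features, tau=0.2, eps=1e-5):
    """Split tokens into two groups with a normalized cut (illustrative).

    features: (N, d) array of token features (hypothetical input).
    tau, eps: assumed edge-thresholding values, chosen for illustration.
    Returns (foreground_mask, second_eigenvector).
    """
    # Cosine similarity between every pair of tokens.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    # Weighted graph: similar tokens get weight 1, others a small eps
    # so that the graph stays fully connected.
    W = np.where(sim >= tau, 1.0, eps)
    d = W.sum(axis=1)

    # Symmetric normalized Laplacian, equivalent to the generalized
    # eigenproblem (D - W) x = lambda * D x.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
    second = d_inv_sqrt * eigvecs[:, 1]     # map back: x = D^{-1/2} y

    # Split at the mean, then call "foreground" the side that contains
    # the token with the largest absolute eigenvector value.
    mask = second > second.mean()
    seed = np.argmax(np.abs(second))
    if not mask[seed]:
        mask = ~mask
    return mask, second


# Toy example: six "background" tokens and three "foreground" tokens.
toy = np.array([
    [1.0, 0.00], [1.0, 0.05], [1.0, -0.05],
    [1.0, 0.02], [1.0, -0.02], [1.0, 0.01],
    [0.0, 1.00], [0.05, 1.0], [-0.05, 1.0],
])
fg_mask, vec = ncut_bipartition(toy)
print(fg_mask)
```

In this toy graph the smaller, tightly connected group receives the larger absolute eigenvector values, which is why the sketch uses the maximum absolute value to decide which side of the cut is treated as the object.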

Keywords

Object Discovery; Unsupervised Learning; Transformer

Domains

Computer Vision and Pattern Recognition [cs.CV]; Machine Learning [stat.ML]

Main file

TokenCut.pdf (7.46 MB)

Origin: Files produced by the author(s)

Contributor: Dominique Vaufreydaz

https://inria.hal.science/hal-03585410

Submitted on: Thursday, March 24, 2022, 08:16:54

Last modified on: Friday, April 5, 2024, 03:28:13

Dates and versions

hal-03585410, version 1 (23-02-2022)

hal-03585410, version 2 (24-03-2022)

Identifiers

HAL Id: hal-03585410, version 2
arXiv: 2202.11539

Cite

Yangtao Wang, Xi Shen, Shell Hu, Yuan Yuan, James L. Crowley, et al. Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut. CVPR 2022 - Conference on Computer Vision and Pattern Recognition, Jun 2022, New Orleans, United States. ⟨hal-03585410v2⟩

Collections

ENPC UGA CNRS LIG LIGM_A3SI PARISTECH LIGM MIAI ANR LIG_SIC_M-PSI LIG_SIDCH UNIV-EIFFEL JSE2024

280 views

192 downloads
