Temporal Localization of Actions with Actoms

Adrien Gaidon; Zaid Harchaoui; Cordelia Schmid

Rapport (Rapport De Recherche) Année : 2012

Temporal Localization of Actions with Actoms

(1, 2) , (1) , (1)

1
2

Adrien Gaidon

Fonction : Auteur correspondant
PersonId : 865483

Connectez-vous pour contacter l'auteur

Learning and recognition in vision

Microsoft Research - Inria Joint Centre

Zaid Harchaoui

Fonction : Auteur
PersonId : 895242

Learning and recognition in vision

Cordelia Schmid

Fonction : Auteur
PersonId : 831154

Learning and recognition in vision

Résumé

We address the problem of detecting actions, such as drinking or opening a door, in hours of challenging video data. We propose a model based on a sequence of atomic action units, termed "actoms", that are semantically meaningful and characteristic for the action. Our Actom Sequence Model (ASM) represents the temporal structure of actions as a sequence of histograms of actom-anchored visual features. Our representation, which can be seen as a temporally structured extension of the bag-of-features, is flexible, sparse, and discriminative. Training requires the annotation of actoms for action examples. At test time, actoms are detected automatically based on a non-parametric model of the distribution of actoms, which also acts as a prior on an action's temporal structure. We present experimental results on two recent benchmarks for temporal action detection: "Coffee and Cigarettes" and the "DLSB" dataset. We also adapt our approach to a classification by detection set-up and demonstrate its applicability on the challenging "Hollywood 2" dataset. We show that our ASM method outperforms the current state of the art in temporal action detection, as well as baselines that detect actions with a sliding window method combined with bag-of-features.

Cet article s'intéresse au problème de la détection temporelle d'actions, comme "ouvrir une porte", dans des bases de données contenant des heures de vidéo. Nous proposons un modèle basé sur des suites d'actions atomiques, appelées "actoms". Ces actoms sont des sous-événements interprétables qui caractérisent l'action à modéliser. Notre modèle, nommé "Actom Sequence Model" (ASM), décrit la structure temporelle d'une action par le biais d'une suite d'histogrammes de descripteurs locaux localisés temporellement. Cette représentation est une extension flexible, parcimonieuse, discriminative et structurée du populaire "sac de mots visuels". La période d'apprentissage nécessite l'annotation manuelle d'actoms, sans que cela ne soit requis à l'étape de détection. En effet, les actoms de nouvelles vidéos sont automatiquement détectés à l'aide d'un modèle non-paramétrique de la structure temporelle d'une action, estimé à partir des exemples d'apprentissage. Nous présentons des résultats expérimentaux sur deux bases de données récentes pour la détection temporelle d'actions: "Coffee and Cigarettes" et "DLSBP". De plus, nous adaptons notre approche au problème de classification par détection et démontrons ses performances sur la base "Hollywood 2". Nos résultats montrent que l'utilisation d'ASM améliore les performances par rapport à l'état de l'art et par rapport à l'approche par fenêtre glissante avec sac de mots, couramment utilisée en détection.

Mots clés

Action Recognition Video Analysis

Domaines

Vision par ordinateur et reconnaissance de formes [cs.CV]

Fichier principal

RR-7930.pdf (1.17 Mo)

sliding_central_frame.png (173.64 Ko)

Origine : Accord explicite pour ce dépôt

Format : Figure, Image

THOTH Team : Connectez-vous pour contacter le contributeur

https://inria.hal.science/hal-00687312

Soumis le : lundi 21 janvier 2013-11:38:23

Dernière modification le : jeudi 4 avril 2024-20:59:06

Archivage à long terme le : lundi 22 avril 2013-03:52:36

Dates et versions

hal-00687312 , version 1 (12-04-2012)

hal-00687312 , version 2 (21-01-2013)

Identifiants

HAL Id : hal-00687312 , version 2

Citer

Adrien Gaidon, Zaid Harchaoui, Cordelia Schmid. Temporal Localization of Actions with Actoms. [Research Report] RR-7930, INRIA. 2012. ⟨hal-00687312v2⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA INRIA-RRRT LJK LJK_GI LJK_GI_LEAR INRIA2 LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

737 Consultations

872 Téléchargements

Temporal Localization of Actions with Actoms

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager