Learnable factored image representations for visual discovery

Théophile Dalens 1
1 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique de l'École normale supérieure, Inria de Paris
Abstract : Large-scale quantitative temporal analysis of heritage text data has had a great impact on the understanding of social trends. Similar large-scale analysis of temporal image collections would enable new applications in medicine, science or history of art. However, temporal analysis of visual data is a notoriously difficult task. The key challenge is that images do not come with a given vocabulary of visual elements, analogous to words in text, that could be used for such analysis. In addition, objects depicted in images vary greatly in appearance due to camera viewpoint, illumination, or intra-class variation. The objective of this thesis is to develop tools to analyze temporal image collections in order to identify and highlight visual trends over time. This thesis proposes an approach for analyzing unpaired visual data annotated with time stamps by generating how images would have looked if they were from different times. To isolate and transfer time-dependent appearance variations, we introduce a new trainable bilinear factor separation module. We analyze its relation to classical factored representations and concatenation-based auto-encoders. We demonstrate that this new module has clear advantages over standard concatenation when used in a bottleneck encoder-decoder convolutional neural network architecture. We also show that it can be inserted in a recent adversarial image translation architecture, enabling image transformation to multiple different target time periods using a single network. We apply our model to a challenging collection of more than 13,000 cars manufactured between 1920 and 2000 and a dataset of high school yearbook portraits from 1930 to 2009. This allows us, for a given new input image, to generate a "history-lapse video" revealing changes over time by simply varying the target year. We show that by analyzing the generated history-lapse videos we can identify object deformations across time, extracting interesting changes in visual style over decades.
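
The contrast the abstract draws between a bilinear combination and plain concatenation can be illustrated with a small sketch. This is not the thesis's trainable module or network: it is a toy version of the classical bilinear factor model, written in pure Python under stated assumptions. The function names (`bilinear_combine`, `concat_combine`, `period_shift`), the dimensions, and the random weights are all hypothetical and chosen only for illustration. The sketch assumes a content vector (what the object is) and a one-hot code for the time period, and compares how each scheme reacts when only the period code changes.

```python
import random


def bilinear_combine(W, content, style):
    """Bilinear mixing: out[k] = sum_i sum_j W[k][i][j] * content[i] * style[j]."""
    return [
        sum(W[k][i][j] * content[i] * style[j]
            for i in range(len(content))
            for j in range(len(style)))
        for k in range(len(W))
    ]


def concat_combine(M, content, style):
    """Concatenation followed by one linear layer: out = M @ [content; style]."""
    joined = content + style  # list concatenation
    return [sum(M[k][n] * joined[n] for n in range(len(joined)))
            for k in range(len(M))]


def one_hot(index, size):
    """One-hot code for a time period (hypothetical encoding)."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def period_shift(combine, weights, content, p, q, n_periods):
    """Change in the output when the period code moves from p to q."""
    out_p = combine(weights, content, one_hot(p, n_periods))
    out_q = combine(weights, content, one_hot(q, n_periods))
    return [b - a for a, b in zip(out_p, out_q)]


def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


# Toy sizes and random weights: assumptions for illustration only.
D_CONTENT, N_PERIODS, D_OUT = 4, 3, 5
rng = random.Random(0)
W = [[[rng.gauss(0, 1) for _ in range(N_PERIODS)]
      for _ in range(D_CONTENT)]
     for _ in range(D_OUT)]
M = [[rng.gauss(0, 1) for _ in range(D_CONTENT + N_PERIODS)]
     for _ in range(D_OUT)]

content_a = [rng.gauss(0, 1) for _ in range(D_CONTENT)]
content_b = [rng.gauss(0, 1) for _ in range(D_CONTENT)]

# Effect of moving from period 0 to period 2, for two different contents.
shift_concat_a = period_shift(concat_combine, M, content_a, 0, 2, N_PERIODS)
shift_concat_b = period_shift(concat_combine, M, content_b, 0, 2, N_PERIODS)
shift_bilinear_a = period_shift(bilinear_combine, W, content_a, 0, 2, N_PERIODS)
shift_bilinear_b = period_shift(bilinear_combine, W, content_b, 0, 2, N_PERIODS)

print("concat shift differs across contents by:",
      max_abs_diff(shift_concat_a, shift_concat_b))
print("bilinear shift differs across contents by:",
      max_abs_diff(shift_bilinear_a, shift_bilinear_b))
```

With a single linear layer after concatenation, the period code contributes only an additive offset, so changing the period shifts every content vector by the same amount (the first printed value is zero up to rounding). In the bilinear form, a one-hot period code selects a period-specific linear map, so the same change of period affects different contents differently. A real encoder-decoder adds nonlinear layers after the combination, so this toy comparison only hints at the difference and does not reproduce the thesis's results.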

Cited literature : 71 references
Contributor : Théophile Dalens
Submitted on : Tuesday, September 24, 2019 - 7:57:19 PM
Last modification on : Thursday, July 1, 2021 - 5:58:09 PM
Long-term archiving on : Sunday, February 9, 2020 - 3:31:53 PM


  • HAL Id : tel-02296150, version 1



Théophile Dalens. Learnable factored image representations for visual discovery. Computer Vision and Pattern Recognition [cs.CV]. Ecole Normale Superieure de Paris - ENS Paris, 2019. English. ⟨tel-02296150⟩


