
Can recurrent neural networks warp time?

Corentin Tallec 1, Yann Ollivier 1, 2
1 TAU - TAckling the Underspecified
Inria Saclay - Ile de France, LRI - Laboratoire de Recherche en Informatique
Abstract: Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically, these models have been found to improve the learning of medium- to long-term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long-term dependencies, with minimal implementation effort.

Recurrent neural networks (e.g., Jaeger, 2002) are a standard machine learning tool to model and represent temporal data; mathematically, they amount to learning the parameters of a parameterized dynamical system so that its behavior optimizes some criterion, such as the prediction of the next data in a sequence.
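As a rough reconstruction of the invariance argument summarized above (a sketch, not quoted from the paper): a gated, leaky update of the form

h_t = g_t \odot \tilde{h}_t + (1 - g_t) \odot h_{t-1}

can be read as a Euler discretization of the differential equation

\frac{dh}{d\tau} = g(\tau)\,\bigl(\tilde{h}(\tau) - h(\tau)\bigr),

so a monotone time transformation \tau = c(t) of the input data amounts to rescaling the gate by dc/dt, which a learnable gate can compensate for; this is the sense in which gating yields quasi-invariance to time warpings.

For the initialization itself, the sketch below targets a PyTorch nn.LSTM and assumes the chrono scheme sets the forget-gate bias to log(u) with u drawn uniformly from [1, T_max - 1] and the input-gate bias to its negative (the precise formula is given in the paper, not in this record); the helper name chrono_init_ and the t_max value are illustrative.

import torch
from torch import nn

def chrono_init_(lstm: nn.LSTM, t_max: int) -> None:
    """Sketch of a chrono-style gate-bias initialization (assumed formula, see note above)."""
    h = lstm.hidden_size
    for layer in range(lstm.num_layers):
        bias_ih = getattr(lstm, f"bias_ih_l{layer}")
        bias_hh = getattr(lstm, f"bias_hh_l{layer}")
        with torch.no_grad():
            # PyTorch packs gate biases in the order (input, forget, cell, output);
            # zero the hidden-to-hidden bias so the effective bias is bias_ih alone.
            bias_hh.zero_()
            bias_ih.zero_()
            # Forget-gate bias: log of a uniform draw in [1, t_max - 1].
            u = torch.empty(h).uniform_(1.0, float(t_max) - 1.0)
            bias_ih[h:2 * h] = torch.log(u)
            # Input-gate bias: the negative of the forget-gate bias.
            bias_ih[0:h] = -bias_ih[h:2 * h]

# Example usage: a single-layer LSTM intended for dependencies up to roughly 500 steps.
lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=1)
chrono_init_(lstm, t_max=500)

Since sigmoid(log u) = u / (1 + u), each unit's forget gate starts open with an effective retention time of about 1 + u steps, so this spreads memory time scales roughly uniformly up to t_max instead of forgetting within a few steps.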
Document type: Conference papers

https://hal.inria.fr/hal-01812064
Contributor: Corentin Tallec
Submitted on: Monday, June 11, 2018 - 11:03:50 AM
Last modification on: Sunday, May 2, 2021 - 3:30:47 AM
Long-term archiving on: Wednesday, September 12, 2018 - 1:40:34 PM

File

iclr_chrono.pdf (files produced by the author(s))

Identifiers

  • HAL Id: hal-01812064, version 1

Citation

Corentin Tallec, Yann Ollivier. Can recurrent neural networks warp time? International Conference on Learning Representations (ICLR) 2018, Apr 2018, Vancouver, Canada. ⟨hal-01812064⟩

Metrics

Record views: 223
File downloads: 384