
Can recurrent neural networks warp time?

Corentin Tallec 1, Yann Ollivier 1, 2
1 TAU - TAckling the Underspecified
LRI - Laboratoire de Recherche en Informatique, Inria Saclay - Île-de-France
Abstract: Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically, these models have been found to improve the learning of medium- to long-term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long-term dependencies, with minimal implementation effort.

Recurrent neural networks (e.g. (Jaeger, 2002)) are a standard machine learning tool to model and represent temporal data; mathematically, they amount to learning the parameters of a parameterized dynamical system so that its behavior optimizes some criterion, such as the prediction of the next element in a sequence.
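The abstract mentions chrono initialization of LSTM gate biases but does not spell out the formula. A minimal sketch is given below, assuming the scheme from the full paper: the forget-gate bias is drawn as log(u) with u uniform on [1, T_max - 1], and the input-gate bias is set to its negation, so that the network's memory time constants at initialization span the range of dependency lengths up to T_max. The function name `chrono_init` and the use of NumPy are illustrative choices, not from the source.

```python
import numpy as np

def chrono_init(hidden_size, t_max, seed=None):
    """Sketch of chrono initialization for LSTM gate biases.

    Assumed scheme (from the full paper, not stated in the abstract):
      forget-gate bias b_f = log(u),  u ~ Uniform(1, t_max - 1)
      input-gate bias  b_i = -b_f
    so sigmoid(b_f) is close to 1 for large t_max, biasing the cell
    toward remembering over long horizons at the start of training.
    """
    rng = np.random.default_rng(seed)
    b_f = np.log(rng.uniform(1.0, t_max - 1.0, size=hidden_size))
    b_i = -b_f
    return b_f, b_i

# Example: biases for a hidden size of 8 and dependencies up to ~100 steps.
b_f, b_i = chrono_init(hidden_size=8, t_max=100, seed=0)
```

All other weights and biases are left at their usual initialization; only the forget- and input-gate biases are changed, which is what makes the scheme a "minimal implementation effort" modification.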
Document type :
Conference papers
Complete list of metadata
Contributor: Corentin Tallec
Submitted on: Monday, June 11, 2018 - 11:03:50 AM
Last modification on: Saturday, June 25, 2022 - 10:31:53 PM
Long-term archiving on: Wednesday, September 12, 2018 - 1:40:34 PM


Files produced by the author(s)


  • HAL Id: hal-01812064, version 1


Corentin Tallec, Yann Ollivier. Can recurrent neural networks warp time?. International Conference on Learning Representations (ICLR) 2018, Apr 2018, Vancouver, Canada. ⟨hal-01812064⟩