Sequence to Sequence Learning with Neural Networks

Using LSTMs for general sequence-to-sequence problems

Deep neural networks are powerful machine learning models that achieve excellent performance on difficult problems. If a parameter setting of a large DNN exists that achieves good results, supervised backpropagation will find it and solve the problem, provided the labeled training set contains enough information to specify the network's parameters.

The Problem

Many important problems are best expressed with sequences whose lengths are not known a priori. Speech recognition and machine translation, for example, are inherently sequential problems. Likewise, question answering can be seen as mapping a sequence of words representing the question to a sequence of words representing the answer.

The Approach

The core idea is to use one LSTM to read the input sequence, one timestep at a time, to obtain a large fixed-dimensional vector representation, and then use another LSTM to extract the output sequence from that vector.
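To make this concrete, here is a minimal PyTorch sketch of the two-LSTM setup. It is not the paper's implementation (which uses deep multi-layer LSTMs and reverses the source sentence); the class name Seq2Seq and the layer sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        # Vocabulary sizes and dimensions are illustrative, not the paper's settings.
        def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
            super().__init__()
            self.src_embed = nn.Embedding(src_vocab, emb_dim)
            self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
            # Encoder LSTM reads the input sequence one timestep at a time.
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            # Decoder LSTM generates the output sequence from the encoder's final state.
            self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, tgt_vocab)

        def forward(self, src, tgt_in):
            # Encode: the final (h, c) pair is the fixed-dimensional summary of the input.
            _, (h, c) = self.encoder(self.src_embed(src))
            # Decode: condition the second LSTM on that summary (teacher forcing at train time).
            dec_out, _ = self.decoder(self.tgt_embed(tgt_in), (h, c))
            return self.out(dec_out)  # logits over the target vocabulary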

The second LSTM is essentially a recurrent neural network language model, except that it is conditioned on the input sequence.
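Seen this way, generation amounts to running the decoder as a language model whose initial state comes from the encoder. The paper decodes with a left-to-right beam search; the sketch below, which reuses the hypothetical Seq2Seq model above, shows the simpler greedy variant, with bos_id, eos_id, and max_len as assumed conventions.

    # Greedy decoding: the decoder acts as a language model conditioned on the
    # encoder's summary of the source sentence (the paper itself uses beam search).
    def greedy_decode(model, src, bos_id, eos_id, max_len=50):
        _, state = model.encoder(model.src_embed(src))        # condition on the input
        token = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            dec_out, state = model.decoder(model.tgt_embed(token), state)
            token = model.out(dec_out).argmax(dim=-1)         # most probable next word
            outputs.append(token)
            if (token == eos_id).all():                       # stop once every sentence ends
                break
        return torch.cat(outputs, dim=1)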

Key Properties

  • The LSTM learns to map an input sequence of variable length into a fixed-dimensional vector representation (see the sketch after this list)
  • The encoder-decoder architecture separates input processing from output generation
  • Attention mechanisms (in later work) address limitations of fixed-size representations
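As a rough illustration of the first property, the snippet below (again using the hypothetical Seq2Seq model from above) pads and packs a batch of differently sized inputs; whatever the lengths, the encoder's final hidden state has the same fixed dimensionality.

    from torch.nn.utils.rnn import pack_padded_sequence

    model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)    # sketch from above, illustrative sizes
    lengths = torch.tensor([7, 4, 5])                  # true lengths of three source sentences
    src = torch.randint(0, 1000, (3, 7))               # padded token ids (batch, max_len)
    packed = pack_padded_sequence(model.src_embed(src), lengths,
                                  batch_first=True, enforce_sorted=False)
    _, (h, c) = model.encoder(packed)
    print(h.shape)                                     # (1, 3, hidden_dim): one fixed vector per sentence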

Related Work

  • Kalchbrenner and Blunsom, "Recurrent continuous translation models" (EMNLP 2013)
  • Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation" (EMNLP 2014)