
Research Review Notes

Summaries of academic research papers

Beam Search Strategies for Neural Machine Translation


Idea

In the standard beam search strategy for Neural Machine Translation (NMT), the decoder predicts the target sequence word by word, keeping a fixed number of candidate words at each step. The paper proposes a more flexible decoding strategy: pruning the search graph and reducing the number of candidates that share the same partial hypothesis (shared past).

Background

The drawback of vanilla beam search is that it is not adaptive, which follows from how it works:

Standard beam search builds a translation from left to right and, at each time step, keeps a fixed number (the beam size) of translation candidates with the highest log-probability. Whenever a hypothesis emits the end-of-sequence (EOS) token, it is added to the list of completed translations and the beam size is reduced by 1. When the beam size eventually reaches 0, the log-probability of each completed sequence is weighted by its length, and the translation with the highest score is picked.
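As a point of reference, here is a minimal sketch of that vanilla procedure. The `step_fn` decoder interface, the token IDs, and the exact length weighting are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def beam_search(step_fn, bos_id, eos_id, beam_size=5, max_len=100):
    """Vanilla left-to-right beam search (sketch).

    step_fn(prefix) is assumed to return a log-probability vector over the
    vocabulary for the next token given the current prefix; it stands in for
    the NMT decoder.
    """
    # Each hypothesis is (token list, cumulative log-probability).
    beams = [([bos_id], 0.0)]
    finished = []

    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            log_probs = step_fn(tokens)  # vocab-sized vector
            # Expand each hypothesis with its beam_size best continuations.
            for tok in np.argsort(log_probs)[-beam_size:]:
                candidates.append((tokens + [int(tok)], score + float(log_probs[tok])))

        # Keep the highest-scoring hypotheses overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == eos_id:
                # A finished hypothesis shrinks the active beam by one.
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
            if len(beams) + len(finished) == beam_size:
                break
        if not beams:  # beam size has effectively dropped to 0
            break

    finished = finished or beams
    # Length-weighted scoring: normalise by hypothesis length before picking.
    return max(finished, key=lambda c: c[1] / len(c[0]))[0]
```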

Method

Four separate beam search strategies are proposed. One or more of them can be applied on top of vanilla beam search.
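The notes do not list the individual strategies, so purely as an illustration, here is a minimal sketch of one plausible pruning rule of this kind: relative-threshold pruning, where hypotheses scoring far below the best one at the current step are discarded. The function name and the threshold value are assumptions, not taken from the paper.

```python
import math

def prune_relative(candidates, rel_threshold=0.6):
    """Relative-threshold pruning (illustrative sketch; threshold value assumed).

    `candidates` is a list of (tokens, log_prob) pairs produced at one decoding
    step. A candidate survives only if its probability is at least
    `rel_threshold` times that of the best candidate at this step, i.e.
    log p >= log(rel_threshold) + log p_best.
    """
    best = max(score for _, score in candidates)
    cutoff = best + math.log(rel_threshold)
    return [(toks, score) for toks, score in candidates if score >= cutoff]
```

In the beam search sketch above, such a filter would be applied to `candidates` just before the top-scoring hypotheses are selected, so that clearly inferior hypotheses are never expanded at the next step.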

English $\rightarrow$ German translation is done at the sub-word level (which reduces the computational complexity of the output softmax); English $\rightarrow$ Chinese translation is done at the word level.

The NMT implementation is similar to the neural attention paper. An embedding dimension of 620 is used, with a GRU-based RNN whose latent representation has 1000 units. SGD is the learning algorithm. The number of epochs is not discussed, but a batch size of 64 is used for training.
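For concreteness, the hyperparameters mentioned above can be gathered into a single configuration; values not given in the notes (learning rate, vocabulary sizes, number of epochs) are deliberately left out of this sketch.

```python
# Hyperparameters as reported in the notes above; anything not mentioned there
# (learning rate, vocabulary size, number of epochs) is intentionally omitted.
nmt_config = {
    "embedding_dim": 620,   # source/target word embedding size
    "hidden_units": 1000,   # GRU latent representation size
    "rnn_cell": "GRU",
    "optimizer": "SGD",
    "batch_size": 64,
}
```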

Experiments are done to choose threshold values that prune the search as much as possible without degrading translation accuracy.

Beam sizes from 5 to 14 are experimented with.

Observations
