diff --git a/_posts/2018-06-27-illustrated-transformer.md b/_posts/2018-06-27-illustrated-transformer.md index efe7eba146672..77d5e2eca0cea 100644 --- a/_posts/2018-06-27-illustrated-transformer.md +++ b/_posts/2018-06-27-illustrated-transformer.md @@ -61,7 +61,7 @@ The encoder's inputs first flow through a self-attention layer -- a layer that h The outputs of the self-attention layer are fed to a feed-forward neural network. The exact same feed-forward network is independently applied to each position. -The decoder has both those layers, but between them is an attention layer that helps the decoder focus on relevant parts of the input sentence (similar what attention does in [seq2seq models](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/)). +The decoder has both those layers, but between them is an attention layer that helps the decoder focus on relevant parts of the input sentence (similar to what attention does in [seq2seq models](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/)).