Attention Is All You Need

An illustration of the main components of the transformer model from the paper
Project type: Artificial intelligence research
Sponsors: Google
Objective: Provide a novel approach to train AI
Duration: 2017 –
Website: proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

"Attention Is All You Need" is a 2017 research paper in machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based on the attention mechanism proposed in 2014 by Bahdanau et al. The transformer has since become the main architecture behind a wide variety of AI systems, including large language models. At the time, the research focused on improving seq2seq techniques for machine translation, but the authors went further in the paper, foreseeing the technique's potential for other tasks such as question answering and what is now known as multimodal generative AI.

Early tasks on which the team tried the transformer architecture included English-to-German translation, generating Wikipedia articles on "The Transformer", and parsing. These experiments convinced the team that the transformer was a general-purpose language model, not just a good tool for translation.

As of 2025, the paper has been cited more than 173,000 times, placing it among the top ten most-cited papers of the 21st century. After the paper was published by Google, each of the authors left the company to join other companies or to found startups.