The Transformers Origin Story
In 2017, researchers at Google published a paper that proposed a novel neural network architecture for sequence modeling. Dubbed the Transformer, this architecture outperformed recurrent neural networks (RNNs) on machine translation tasks, both in terms of translation quality and training cost.
In parallel, an effective transfer learning method called ULMFiT showed that pretraining Long Short-Term Memory (LSTM) networks with a language modeling objective on a very large and diverse corpus, and then fine-tuning them on a target task, could produce robust text classifiers with little labeled data.
These advances were the catalysts for two of the most well-known transformers today: GPT and BERT. By combining the Transformer architecture with language model pretraining, these models removed the need to train task-specific architectures from scratch and broke almost every benchmark in NLP by a significant margin. Since the release of GPT and BERT, a veritable zoo of transformer models has emerged; a timeline of the most notable entries is shown in Figure 1-2.
