GPT-2: Language Models are Unsupervised Multitask Learners

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

GPT-1: Improving Language Understanding by Generative Pre-Training