XLNet: Generalized Autoregressive Pretraining for Language Understanding

GPT-2: Language Models are Unsupervised Multitask Learners

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding