Language Models are Unsupervised Multitask Learners
Attentive Language Models Beyond a Fixed-Length Context
Pre-training of Deep Bidirectional Transformers for Language Understanding
Improving Language Understanding by Generative Pre-Training
Attention Is All You Need