Finetuned Language Models Are Zero-Shot Learners
Evaluating Large Language Models Trained on Code
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Language Models are Few-Shot Learners
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators