BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin, Chang, Lee, Toutanova
Notes
Bidirectional pre-training via masked language modeling (plus a next-sentence prediction objective). Popularized the pre-train/fine-tune paradigm in NLP: one model pre-trained on unlabeled text, then fine-tuned per task with minimal task-specific architecture.
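A minimal sketch of the masking step, for my own reference. The 15% selection rate and the 80/10/10 split (mask / random token / unchanged) are from the paper; the function name, the -100 ignore-label convention, and the mask_id/vocab_size parameters are illustrative assumptions, not the authors' code.

    import random

    def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15):
        # Pick ~15% of positions as prediction targets (rate from the paper).
        inputs = list(token_ids)
        labels = [-100] * len(inputs)  # -100 = position ignored by the loss (assumed convention)
        for i in range(len(inputs)):
            if random.random() < mask_prob:
                labels[i] = inputs[i]  # the model must recover the original token here
                r = random.random()
                if r < 0.8:
                    inputs[i] = mask_id  # 80%: replace with [MASK]
                elif r < 0.9:
                    inputs[i] = random.randrange(vocab_size)  # 10%: random token
                # remaining 10%: leave the token unchanged
        return inputs, labels

Keeping 10% of targets unchanged and corrupting 10% with random tokens means the model cannot rely on [MASK] appearing at every prediction site, which reduces the pre-train/fine-tune mismatch (no [MASK] tokens appear at fine-tuning time).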