
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, Chang, Lee, Toutanova
paper completed ai-ml

Notes

Introduced bidirectional pre-training via masked language modeling: a fraction of input tokens is corrupted and the model predicts the originals from both left and right context. Defined the pre-train/fine-tune paradigm.
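
A minimal sketch of the paper's masking scheme (15% of positions selected; of those, 80% become [MASK], 10% a random token, 10% left unchanged). The token ids and seed here are made up for illustration; the mask id and vocab size match the original BERT WordPiece vocab, and a real implementation would operate on batched tensors.

```python
import random

MASK_ID = 103       # [MASK] token id in the original BERT vocab
VOCAB_SIZE = 30522  # BERT's WordPiece vocabulary size

def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    """Apply BERT-style masking and return (corrupted inputs, labels).

    Labels are -100 at positions the model should not predict,
    and the original token id at masked positions.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(inputs)  # -100 = ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok  # model must recover the original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID            # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # else: 10% keep the original token unchanged
    return inputs, labels

ids = [2023, 2003, 1037, 7099, 6251]  # arbitrary example ids
print(mask_tokens(ids, seed=0))
```

Keeping 10% of selected tokens unchanged forces the model to maintain a useful representation for every position, since it cannot tell which tokens were corrupted.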