April 24, 2026
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Notes
Bidirectional pre-training via masked language modeling. Defined the pre-train/fine-tune paradigm.
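The masking procedure from the paper can be sketched as follows. This is a minimal illustration, not the reference implementation: BERT selects roughly 15% of token positions, and of those replaces 80% with `[MASK]`, 10% with a random vocabulary token, and leaves 10% unchanged; the loss is computed only at the selected positions. The function name, toy vocabulary, and tokens here are illustrative.

```python
import random

MASK = "[MASK]"
TOY_VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]  # illustrative vocabulary

def mask_for_mlm(tokens, mask_prob=0.15, rng=None):
    """Apply BERT-style MLM corruption to a token list.

    Returns (corrupted, labels): labels[i] is the original token at
    positions selected for prediction, and None elsewhere (positions
    with None are excluded from the training loss).
    """
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:          # ~15% of positions selected
            labels.append(tok)                # model must recover the original
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)        # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(TOY_VOCAB))  # 10%: random token
            else:
                corrupted.append(tok)         # 10%: keep unchanged
        else:
            labels.append(None)               # not selected: no loss here
            corrupted.append(tok)
    return corrupted, labels
```

Keeping 10% of selected tokens unchanged matters because `[MASK]` never appears at fine-tuning time; the model cannot rely on the mask token alone to know which positions to predict.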