Bengio's Language Model: the Markov Assumption Made Architectural
The simplest neural language model: embed the last N tokens, concatenate, predict the next. What you give up with a fixed window, and what you gain by dropping recurrence.
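The recipe in that one-liner (embed the last N tokens, concatenate, predict the next) can be sketched in a few lines of plain Python. This is a minimal illustration, not Bengio's full model: it assumes a single tanh hidden layer, omits bias terms and the optional direct input-to-output connections, and uses untrained random weights. The names `init_params`, `affine`, `next_token_probs`, and `predict` are hypothetical helpers chosen for this sketch.

```python
import math
import random


def init_params(vocab_size, context_size, embed_dim, hidden_dim, seed=0):
    """Random (untrained) weights for a fixed-window language model."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "E": matrix(vocab_size, embed_dim),                  # embedding table
        "W1": matrix(context_size * embed_dim, hidden_dim),  # concat -> hidden
        "W2": matrix(hidden_dim, vocab_size),                # hidden -> logits
    }


def affine(x, W):
    """Multiply a length-n vector x by an n-by-m matrix W (biases omitted)."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]


def next_token_probs(params, context):
    """Distribution over the next token given exactly N context token ids."""
    # Embed each context token and concatenate the vectors in order.
    x = [v for tok in context for v in params["E"][tok]]
    h = [math.tanh(a) for a in affine(x, params["W1"])]
    logits = affine(h, params["W2"])
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(params, history, context_size):
    """The Markov assumption as architecture: only the last N tokens are read."""
    return next_token_probs(params, history[-context_size:])
```

Because `predict` slices the history to the last `context_size` tokens before anything else happens, two histories that agree on that window get identical predictions no matter what came earlier. That is the cost of the fixed window. The gain is that there is no hidden state carried between positions, so every position in a training sequence can be scored independently.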
The canonical NLP book, updated for the LLM era.
Seminal blog post demonstrating the power of character-level RNNs. Shows Shakespeare generation, Wikipedia generation, LaTeX generation, and Linux kernel code generation. The visualizations of LSTM cells are particularly illuminating.
A corpus-based language model using suffix arrays for O(m log n) pattern matching. The corpus is the model.
A mathematical framework that treats language models as algebraic objects with compositional structure.
The evolution of neural sequence prediction, and how it connects to classical methods
The bias-data trade-off in sequential prediction: when to use CTW, n-grams, or neural language models.
The classical approach to sequence prediction: counting and smoothing