# Infinigram

Efficient variable-length n-gram language models using suffix arrays for O(m log n) pattern matching. Byte-level (UTF-8), works with any text.
## Overview
Infinigram is a corpus-based language model where the corpus IS the model — no gradient descent, no training loop. It uses suffix arrays to find variable-length patterns and predict continuations.
Key properties:
- Instant training: Build a model by pointing at text
- Variable-length patterns: Automatically uses longest matching context
- Exact matching: Every prediction traces to actual corpus occurrences
- O(m log n) queries: Binary search on suffix arrays
- Byte-level: Fixed 256-token vocabulary (UTF-8 bytes 0-255)
- Mmap-backed: Memory-mapped files for corpora from 1KB to 100GB+
## Quick Start

### Installation

```bash
pip install py-infinigram
```

Or install from source:

```bash
git clone https://github.com/queelius/infinigram.git
cd infinigram
pip install -e .
```
### Basic Usage

```python
from infinigram import Infinigram

# Create a model from bytes (byte-level; strings are UTF-8 encoded automatically)
model = Infinigram(b"the cat sat on the mat the cat ran")

# Predict next-byte probabilities
probs = model.predict(b"the cat", top_k=5)
# {32: 1.0}: the space byte (32) is the only observed continuation

# Find the longest matching suffix
position, length = model.longest_suffix(b"the cat")
print(f"Matched {length} bytes at position {position}")

# String input works too (auto-encoded to UTF-8)
probs = model.predict("the cat", top_k=5)
```
### Persistent Models

```python
# Build and save a model
model = Infinigram.build("The quick brown fox jumps over the lazy dog.", "my_model")

# Load it later (instant: uses mmap, not RAM)
model = Infinigram.load("my_model")
probs = model.predict("The quick")
```
### Interactive REPL

```bash
infinigram-repl
```

Quick tour:

```text
infinigram> ds demo               # Create dataset
infinigram> add the cat sat       # Add document
infinigram> predict the cat       # Get predictions
infinigram> complete the --max 10 # Generate text
infinigram> config                # Show settings
infinigram> help                  # All commands
```

See REPL_GUIDE.md for full documentation.
### REST API Server

```bash
infinigram-serve --port 8000
```

Serves an OpenAI-compatible `/v1/completions` endpoint. See REST_API.md.
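Assuming the server accepts a standard OpenAI-style completions body (the exact accepted fields are not documented here and are assumptions), a request from Python might look like:

```python
import json
import urllib.request

# Hypothetical request body in OpenAI completions format.
payload = {
    "model": "infinigram",
    "prompt": "The quick brown",
    "max_tokens": 10,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the server running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```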
## Key Features

### Variable-Length Pattern Matching

Infinigram automatically finds the longest matching suffix in the corpus:

```python
model = Infinigram(b"abcdef abcdef abcxyz")

pos, length = model.longest_suffix(b"abcd")  # length=4
pos, length = model.longest_suffix(b"abc")   # length=3
pos, length = model.longest_suffix(b"zzz")   # length=0 (no match)
```
### Multiple Prediction Strategies

```python
# Standard: longest suffix match
probs = model.predict(b"the cat", top_k=10)

# Weighted: combine predictions from multiple suffix lengths
probs = model.predict_weighted(b"the cat", weight_fn=lambda k: k**2)

# Backoff: Stupid Backoff smoothing (Brants et al., 2007)
probs = model.predict_backoff(b"the cat", backoff_factor=0.4)
```
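For intuition, Stupid Backoff can be sketched directly over byte counts: use the longest suffix of the context that occurs in the corpus, and multiply the score by a fixed factor for every byte of context that had to be dropped. This is an illustrative reimplementation (with a hypothetical `stupid_backoff` name), not the library's actual code, and it backs off per byte rather than per word as in the original paper.

```python
def stupid_backoff(corpus: bytes, context: bytes, alpha: float = 0.4) -> dict[int, float]:
    """Score next bytes with Stupid Backoff (Brants et al., 2007), per byte."""
    for k in range(len(context), 0, -1):
        suffix = context[-k:]            # try the longest context suffix first
        counts: dict[int, int] = {}
        start = corpus.find(suffix)
        while start != -1:               # tally every byte that follows the suffix
            nxt = start + len(suffix)
            if nxt < len(corpus):
                counts[corpus[nxt]] = counts.get(corpus[nxt], 0) + 1
            start = corpus.find(suffix, start + 1)
        if counts:
            total = sum(counts.values())
            penalty = alpha ** (len(context) - k)  # fixed factor per backoff step
            return {b: penalty * c / total for b, c in counts.items()}
    return {}

scores = stupid_backoff(b"the cat sat on the mat", b"xx cat")
# backed off from "xx cat" to " cat": score for the space byte is 0.4**2 * 1.0
```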
### LLM Integration via TokenProbabilityAdapter

```python
from infinigram import Infinigram, TokenProbabilityAdapter
import tiktoken

model = Infinigram(b"The quick brown fox jumps over the lazy dog.")
tokenizer = tiktoken.encoding_for_model("gpt-4")
adapter = TokenProbabilityAdapter(model, tokenizer)

# Compute a token's probability via the byte-level chain rule:
# P(token|ctx) = prod_i P(byte_i | ctx, byte_1, ..., byte_{i-1})
token_id = tokenizer.encode(" fox")[0]
prob = adapter.token_probability("The quick brown", token_id)

# Mix with LLM probabilities: P_final = alpha * P_LLM + (1 - alpha) * P_corpus
llm_probs = {token_id: 0.3}
mixed = adapter.mix_probabilities("The quick brown", llm_probs, alpha=0.7)
```
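The mixing step is plain linear interpolation. A standalone sketch of the arithmetic (a hypothetical `mix_probs` helper, not the adapter's actual internals):

```python
def mix_probs(llm_probs: dict[int, float], corpus_probs: dict[int, float],
              alpha: float) -> dict[int, float]:
    # Linear interpolation over the union of both distributions' tokens:
    # P_final(t) = alpha * P_LLM(t) + (1 - alpha) * P_corpus(t)
    tokens = set(llm_probs) | set(corpus_probs)
    return {t: alpha * llm_probs.get(t, 0.0) + (1 - alpha) * corpus_probs.get(t, 0.0)
            for t in tokens}

mixed = mix_probs({42: 0.3}, {42: 0.8, 7: 0.2}, alpha=0.7)
# token 42: 0.7 * 0.3 + 0.3 * 0.8 = 0.45; token 7: 0.3 * 0.2 = 0.06
```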
### Corpus Augmentation

```python
from infinigram import Infinigram
from infinigram.corpus_utils import build_corpus_with_augmentation

docs = ["Hello World", "Goodbye World"]
corpus = build_corpus_with_augmentation(docs, augmentations=[str.lower, str.upper])
model = Infinigram(corpus)
```
### Dynamic Updates

```python
model = Infinigram(b"hello")
model.update(b" world")  # Appends and rebuilds the suffix array
print(model.n)           # 11
```
## API Reference

### Infinigram Class

```python
Infinigram(
    corpus_or_path,          # bytes, List[int], str, or Path
    max_length: int = None,  # Max suffix length (None = unlimited)
    min_count: int = 1,      # Min frequency threshold
)
```
### Methods

| Method | Description |
|---|---|
| `predict(context, top_k=50, smoothing=0.0)` | Next-byte probability distribution |
| `predict_weighted(context, weight_fn, top_k=50)` | Multi-length weighted prediction |
| `predict_backoff(context, backoff_factor=0.4, top_k=50)` | Stupid Backoff smoothing |
| `predict_logprobs(context, top_k=50)` | Log-probabilities |
| `continuations(context, limit=10000)` | Raw continuation counts (`limit=None` for all) |
| `longest_suffix(context)` | `(position, length)` of longest match |
| `find_all_suffix_matches(context)` | All matches at all suffix lengths |
| `confidence(context)` | 0-1 score based on match quality |
| `search(pattern, limit=None)` | All occurrence positions |
| `count(pattern)` | Number of occurrences |
| `update(new_tokens)` | Append tokens and rebuild index |
| `clear_cache()` | Clear prediction caches |
| `get_context(position, window=50)` | Corpus text around a position |
### Class Methods

| Method | Description |
|---|---|
| `Infinigram.build(corpus, path, ...)` | Build persistent model |
| `Infinigram.load(path)` | Load existing model (mmap, instant) |
| `Infinigram.list_models()` | List models in `~/.infinigram/models/` |
## Performance
Byte-level model with mmap-backed suffix arrays:
| Corpus Size | Construction | Query Latency |
|---|---|---|
| 1 KB | <1 ms | <0.1 ms |
| 1 MB | ~100 ms | <1 ms |
| 1 GB | ~30 s | <10 ms |
Memory: ~9 bytes per corpus byte (corpus + suffix array on disk, mmap’d).
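The ~9 bytes figure follows from storing one suffix-array entry per corpus byte, assuming 64-bit entries (the entry width is an assumption, not stated above):

```python
corpus_bytes = 1_000_000_000                # a 1 GB corpus on disk
sa_entry_bytes = 8                          # one 64-bit suffix position per byte (assumed width)
index_bytes = corpus_bytes * sa_entry_bytes
per_byte = (corpus_bytes + index_bytes) / corpus_bytes
print(per_byte)  # 9.0; both files are memory-mapped rather than resident in RAM
```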
Performance optimizations (v0.5.0):

- Vectorized continuation counting with `np.bincount` (3-10x speedup)
- LRU cache for repeated `continuations()` queries (5-20x for hot paths)
- Direct bytes comparison in binary search (2-5x vs Python loop)
- Batch SQLite inserts for dataset index rebuild
## Architecture

- **CLI & REPL**: `infinigram-repl`, `infinigram-serve`
- **REST API**: FastAPI, OpenAI-compatible `/v1/completions`
- **Python API**: `Infinigram` class (`predict`, `search`, `continuations`)
- **Suffix Array Engine**: `MmapSuffixArray`, `ChunkedMmapSuffixArray` (pydivsufsort)

See ARCHITECTURE.md for details.
## Testing

508 tests, 79% coverage:

```bash
pytest tests/                                    # All tests
pytest tests/ -v                                 # Verbose
pytest tests/ --cov=infinigram --cov-report=html # Coverage report
pytest tests/test_infinigram.py -k "cache"       # Keyword filter
```
## Thread Safety

Infinigram instances are not thread-safe. Each thread should use its own model instance, or callers must provide external synchronization.
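One common pattern is a thread-local model handle; since loading is mmap-backed, per-thread instances are inexpensive. A sketch, with a placeholder factory standing in for the real `Infinigram.load` call:

```python
import threading

_local = threading.local()  # one model handle per thread

def make_model():
    # Placeholder factory; in practice something like Infinigram.load("my_model"),
    # which is cheap because loading only mmaps the on-disk files.
    return object()

def get_model():
    if not hasattr(_local, "model"):
        _local.model = make_model()
    return _local.model

models = []  # list.append is atomic under the GIL
threads = [threading.Thread(target=lambda: models.append(get_model()))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set(models)))  # 4: each thread built its own instance
```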
## License
MIT License - see LICENSE for details.