# Infinigram

Efficient variable-length n-gram language models using suffix arrays for O(m log n) pattern matching. Byte-level (UTF-8), works with any text.
## Overview
Infinigram is a corpus-based language model where the corpus IS the model — no gradient descent, no training loop. It uses suffix arrays to find variable-length patterns and predict continuations.
Key properties:
- Instant training: Build a model by pointing at text
- Variable-length patterns: Automatically uses longest matching context
- Exact matching: Every prediction traces to actual corpus occurrences
- O(m log n) queries: Binary search on suffix arrays
- Byte-level: Fixed 256-token vocabulary (UTF-8 bytes 0-255)
- Mmap-backed: Memory-mapped files for corpora from 1KB to 100GB+
## Quick Start

### Installation

```bash
pip install py-infinigram
```

Or install from source:

```bash
git clone https://github.com/queelius/infinigram.git
cd infinigram
pip install -e .
```
### Basic Usage

```python
from infinigram import Infinigram

# Create a model from bytes (byte-level; strings are UTF-8 encoded automatically)
model = Infinigram(b"the cat sat on the mat the cat ran")

# Predict next-byte probabilities
probs = model.predict(b"the cat", top_k=5)
# {32: 1.0}: the space byte (32) is the only observed continuation

# Find the longest matching suffix
position, length = model.longest_suffix(b"the cat")
print(f"Matched {length} bytes at position {position}")

# String input works too (auto-encoded to UTF-8)
probs = model.predict("the cat", top_k=5)
```
### Persistent Models

```python
# Build and save a model
model = Infinigram.build("The quick brown fox jumps over the lazy dog.", "my_model")

# Load it later (instant: uses mmap, not RAM)
model = Infinigram.load("my_model")
probs = model.predict("The quick")
```
### Interactive REPL

```bash
infinigram-repl
```

Quick tour:

```text
infinigram> ds demo               # Create dataset
infinigram> add the cat sat       # Add document
infinigram> predict the cat       # Get predictions
infinigram> complete the --max 10 # Generate text
infinigram> config                # Show settings
infinigram> help                  # All commands
```

See REPL_GUIDE.md for full documentation.
### REST API Server

```bash
infinigram-serve --port 8000
```

Serves an OpenAI-compatible `/v1/completions` endpoint. See REST_API.md.
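Assuming the server accepts a standard OpenAI-style completions body (the exact accepted fields are not documented here and are assumptions), a request from Python might look like:

```python
import json
import urllib.request

# Hypothetical request body in OpenAI completions format.
payload = {
    "model": "infinigram",
    "prompt": "The quick brown",
    "max_tokens": 10,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the server running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```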
## Key Features

### Variable-Length Pattern Matching

Infinigram automatically finds the longest matching suffix in the corpus:

```python
model = Infinigram(b"abcdef abcdef abcxyz")

pos, length = model.longest_suffix(b"abcd")  # length=4
pos, length = model.longest_suffix(b"abc")   # length=3
pos, length = model.longest_suffix(b"zzz")   # length=0 (no match)
```
### Multiple Prediction Strategies

```python
# Standard: longest suffix match
probs = model.predict(b"the cat", top_k=10)

# Weighted: combine predictions from multiple suffix lengths
probs = model.predict_weighted(b"the cat", weight_fn=lambda k: k**2)

# Backoff: Stupid Backoff smoothing (Brants et al., 2007)
probs = model.predict_backoff(b"the cat", backoff_factor=0.4)
```
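For intuition, Stupid Backoff can be sketched directly over byte counts: use the longest suffix of the context that occurs in the corpus, and multiply the score by a fixed factor for every byte of context that had to be dropped. This is an illustrative reimplementation (with a hypothetical `stupid_backoff` name), not the library's actual code, and it backs off per byte rather than per word as in the original paper.

```python
def stupid_backoff(corpus: bytes, context: bytes, alpha: float = 0.4) -> dict[int, float]:
    """Score next bytes with Stupid Backoff (Brants et al., 2007), per byte."""
    for k in range(len(context), 0, -1):
        suffix = context[-k:]            # try the longest context suffix first
        counts: dict[int, int] = {}
        start = corpus.find(suffix)
        while start != -1:               # tally every byte that follows the suffix
            nxt = start + len(suffix)
            if nxt < len(corpus):
                counts[corpus[nxt]] = counts.get(corpus[nxt], 0) + 1
            start = corpus.find(suffix, start + 1)
        if counts:
            total = sum(counts.values())
            penalty = alpha ** (len(context) - k)  # fixed factor per backoff step
            return {b: penalty * c / total for b, c in counts.items()}
    return {}

scores = stupid_backoff(b"the cat sat on the mat", b"xx cat")
# backed off from "xx cat" to " cat": score for the space byte is 0.4**2 * 1.0
```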
### LLM Integration via TokenProbabilityAdapter

```python
from infinigram import Infinigram, TokenProbabilityAdapter
import tiktoken

model = Infinigram(b"The quick brown fox jumps over the lazy dog.")
tokenizer = tiktoken.encoding_for_model("gpt-4")
adapter = TokenProbabilityAdapter(model, tokenizer)

# Compute a token's probability via the byte-level chain rule:
# P(token|ctx) = prod_i P(byte_i | ctx, byte_1, ..., byte_{i-1})
token_id = tokenizer.encode(" fox")[0]
prob = adapter.token_probability("The quick brown", token_id)

# Mix with LLM probabilities: P_final = alpha * P_LLM + (1 - alpha) * P_corpus
llm_probs = {token_id: 0.3}
mixed = adapter.mix_probabilities("The quick brown", llm_probs, alpha=0.7)
```
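The mixing step is plain linear interpolation. A standalone sketch of the arithmetic (a hypothetical `mix_probs` helper, not the adapter's actual internals):

```python
def mix_probs(llm_probs: dict[int, float], corpus_probs: dict[int, float],
              alpha: float) -> dict[int, float]:
    # Linear interpolation over the union of both distributions' tokens:
    # P_final(t) = alpha * P_LLM(t) + (1 - alpha) * P_corpus(t)
    tokens = set(llm_probs) | set(corpus_probs)
    return {t: alpha * llm_probs.get(t, 0.0) + (1 - alpha) * corpus_probs.get(t, 0.0)
            for t in tokens}

mixed = mix_probs({42: 0.3}, {42: 0.8, 7: 0.2}, alpha=0.7)
# token 42: 0.7 * 0.3 + 0.3 * 0.8 = 0.45; token 7: 0.3 * 0.2 = 0.06
```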
### Corpus Augmentation

```python
from infinigram import Infinigram
from infinigram.corpus_utils import build_corpus_with_augmentation

docs = ["Hello World", "Goodbye World"]
corpus = build_corpus_with_augmentation(docs, augmentations=[str.lower, str.upper])
model = Infinigram(corpus)
```
### Dynamic Updates

```python
model = Infinigram(b"hello")
model.update(b" world")  # Appends and rebuilds the suffix array
print(model.n)           # 11
```
## API Reference

### Infinigram Class

```python
Infinigram(
    corpus_or_path,          # bytes, List[int], str, or Path
    max_length: int = None,  # Max suffix length (None = unlimited)
    min_count: int = 1,      # Min frequency threshold
)
```
### Methods

| Method | Description |
|---|---|
| `predict(context, top_k=50, smoothing=0.0)` | Next-byte probability distribution |
| `predict_weighted(context, weight_fn, top_k=50)` | Multi-length weighted prediction |
| `predict_backoff(context, backoff_factor=0.4, top_k=50)` | Stupid Backoff smoothing |
| `predict_logprobs(context, top_k=50)` | Log-probabilities |
| `continuations(context, limit=10000)` | Raw continuation counts (`limit=None` for all) |
| `longest_suffix(context)` | `(position, length)` of longest match |
| `find_all_suffix_matches(context)` | All matches at all suffix lengths |
| `confidence(context)` | 0-1 score based on match quality |
| `search(pattern, limit=None)` | All occurrence positions |
| `count(pattern)` | Number of occurrences |
| `update(new_tokens)` | Append tokens and rebuild index |
| `clear_cache()` | Clear prediction caches |
| `get_context(position, window=50)` | Corpus text around a position |
### Class Methods

| Method | Description |
|---|---|
| `Infinigram.build(corpus, path, ...)` | Build persistent model |
| `Infinigram.load(path)` | Load existing model (mmap, instant) |
| `Infinigram.list_models()` | List models in `~/.infinigram/models/` |
## Performance
Byte-level model with mmap-backed suffix arrays:
| Corpus Size | Construction | Query Latency |
|---|---|---|
| 1 KB | <1 ms | <0.1 ms |
| 1 MB | ~100 ms | <1 ms |
| 1 GB | ~30 s | <10 ms |
Memory: ~9 bytes per corpus byte (corpus + suffix array on disk, mmap’d).
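The ~9 bytes figure follows from storing one suffix-array entry per corpus byte, assuming 64-bit entries (the entry width is an assumption, not stated above):

```python
corpus_bytes = 1_000_000_000                # a 1 GB corpus on disk
sa_entry_bytes = 8                          # one 64-bit suffix position per byte (assumed width)
index_bytes = corpus_bytes * sa_entry_bytes
per_byte = (corpus_bytes + index_bytes) / corpus_bytes
print(per_byte)  # 9.0; both files are memory-mapped rather than resident in RAM
```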
Performance optimizations (v0.5.0):

- Vectorized continuation counting with `np.bincount` (3-10x speedup)
- LRU cache for repeated `continuations()` queries (5-20x for hot paths)
- Direct bytes comparison in binary search (2-5x vs Python loop)
- Batch SQLite inserts for dataset index rebuild
## Architecture

- **CLI & REPL**: `infinigram-repl`, `infinigram-serve`
- **REST API**: FastAPI, OpenAI-compatible `/v1/completions`
- **Python API**: `Infinigram` class (`predict`, `search`, `continuations`)
- **Suffix Array Engine**: `MmapSuffixArray`, `ChunkedMmapSuffixArray` (pydivsufsort)

See ARCHITECTURE.md for details.
## Testing

508 tests, 79% coverage:

```bash
pytest tests/                                    # All tests
pytest tests/ -v                                 # Verbose
pytest tests/ --cov=infinigram --cov-report=html # Coverage report
pytest tests/test_infinigram.py -k "cache"       # Keyword filter
```
## Thread Safety

Infinigram instances are not thread-safe. Each thread should use its own model instance, or callers must provide external synchronization.
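One common pattern is a thread-local model handle; since loading is mmap-backed, per-thread instances are inexpensive. A sketch, with a placeholder factory standing in for the real `Infinigram.load` call:

```python
import threading

_local = threading.local()  # one model handle per thread

def make_model():
    # Placeholder factory; in practice something like Infinigram.load("my_model"),
    # which is cheap because loading only mmaps the on-disk files.
    return object()

def get_model():
    if not hasattr(_local, "model"):
        _local.model = make_model()
    return _local.model

models = []  # list.append is atomic under the GIL
threads = [threading.Thread(target=lambda: models.append(get_model()))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set(models)))  # 4: each thread built its own instance
```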
## License
MIT License - see LICENSE for details.