
Infinigram: Variable-Length N-gram Language Models

Requires Python 3.8+.

Efficient variable-length n-gram language models using suffix arrays for O(m log n) pattern matching. Byte-level (UTF-8), works with any text.

Overview

Infinigram is a corpus-based language model where the corpus IS the model — no gradient descent, no training loop. It uses suffix arrays to find variable-length patterns and predict continuations.

Key properties:

  • Instant training: Build a model by pointing at text
  • Variable-length patterns: Automatically uses longest matching context
  • Exact matching: Every prediction traces to actual corpus occurrences
  • O(m log n) queries: Binary search on suffix arrays
  • Byte-level: Fixed 256-token vocabulary (UTF-8 bytes 0-255)
  • Mmap-backed: Memory-mapped files for corpora from 1KB to 100GB+

Quick Start

Installation

pip install py-infinigram

Or install from source:

git clone https://github.com/queelius/infinigram.git
cd infinigram
pip install -e .

Basic Usage

from infinigram import Infinigram

# Create model from text (byte-level, automatic UTF-8 encoding)
model = Infinigram(b"the cat sat on the mat the cat ran")

# Predict next byte probabilities
probs = model.predict(b"the cat", top_k=5)
# {32: 1.0}: a space (byte 32) always follows "the cat" in this corpus

# Find longest matching suffix
position, length = model.longest_suffix(b"the cat")
print(f"Matched {length} bytes at position {position}")

# String input works too (auto-encoded to UTF-8)
probs = model.predict("the cat", top_k=5)

Persistent Models

# Build and save a model
model = Infinigram.build("The quick brown fox jumps over the lazy dog.", "my_model")

# Load it later (instant — uses mmap, not RAM)
model = Infinigram.load("my_model")
probs = model.predict("The quick")

Interactive REPL

infinigram-repl

# Quick tour:
infinigram> ds demo              # Create dataset
infinigram> add the cat sat      # Add document
infinigram> predict the cat      # Get predictions
infinigram> complete the --max 10  # Generate text
infinigram> config               # Show settings
infinigram> help                 # All commands

See REPL_GUIDE.md for full documentation.

REST API Server

infinigram-serve --port 8000

OpenAI-compatible /v1/completions endpoint. See REST_API.md.

Key Features

Variable-Length Pattern Matching

Infinigram automatically finds the longest matching suffix in the corpus:

model = Infinigram(b"abcdef abcdef abcxyz")

pos, length = model.longest_suffix(b"abcd")   # length=4
pos, length = model.longest_suffix(b"abc")    # length=3
pos, length = model.longest_suffix(b"zzz")    # length=0 (no match)

Multiple Prediction Strategies

# Standard: longest suffix match
probs = model.predict(b"the cat", top_k=10)

# Weighted: combine predictions from multiple suffix lengths
probs = model.predict_weighted(b"the cat", weight_fn=lambda k: k**2)

# Backoff: Stupid Backoff smoothing (Brants et al., 2007)
probs = model.predict_backoff(b"the cat", backoff_factor=0.4)
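Stupid Backoff (Brants et al., 2007) shortens the context until something matches, multiplying scores by the backoff factor at each step. A toy sketch of that recursion, independent of the library's implementation:

```python
def make_counts_at(corpus: bytes):
    # counts_at(ctx) -> {next_byte: count} over every occurrence of ctx.
    def counts_at(ctx: bytes):
        counts = {}
        for i in range(len(corpus) - len(ctx)):
            if corpus[i:i + len(ctx)] == ctx:
                nxt = corpus[i + len(ctx)]
                counts[nxt] = counts.get(nxt, 0) + 1
        return counts
    return counts_at

def stupid_backoff(counts_at, context: bytes, backoff_factor: float = 0.4):
    discount = 1.0
    while True:
        counts = counts_at(context)
        if counts or not context:
            total = sum(counts.values()) or 1
            return {b: discount * c / total for b, c in counts.items()}
        context = context[1:]        # drop the oldest byte
        discount *= backoff_factor   # penalize each backoff step

counts_at = make_counts_at(b"the cat sat on the mat")
print(stupid_backoff(counts_at, b"the cat"))  # exact match, no discount
print(stupid_backoff(counts_at, b"our cat"))  # three backoffs: 0.4 ** 3
```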

LLM Integration via TokenProbabilityAdapter

from infinigram import Infinigram, TokenProbabilityAdapter
import tiktoken

model = Infinigram(b"The quick brown fox jumps over the lazy dog.")
tokenizer = tiktoken.encoding_for_model("gpt-4")
adapter = TokenProbabilityAdapter(model, tokenizer)

# Compute token probability via byte-level chain rule:
# P(token|ctx) = product of P(byte_i | ctx, byte_1, ..., byte_{i-1})
prob = adapter.token_probability("The quick brown", tokenizer.encode(" fox")[0])

# Mix with LLM probabilities: P_final = alpha * P_LLM + (1-alpha) * P_corpus
llm_probs = {token_id: 0.3}
mixed = adapter.mix_probabilities("The quick brown", llm_probs, alpha=0.7)
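Under the hood, the adapter's chain rule is just repeated next-byte prediction. A sketch of the same computation using only the `predict` API shown above (illustrative; `UniformToy` is a hypothetical stand-in so the snippet runs without a corpus):

```python
def token_probability(model, context: bytes, token_bytes: bytes) -> float:
    # P(token | ctx) = prod_i P(byte_i | ctx, byte_1, ..., byte_{i-1})
    prob, ctx = 1.0, context
    for byte in token_bytes:
        dist = model.predict(ctx, top_k=256)  # full next-byte distribution
        prob *= dist.get(byte, 0.0)           # unseen byte -> probability 0
        ctx += bytes([byte])                  # extend context by one byte
    return prob

class UniformToy:
    # Stand-in with the same predict() shape as Infinigram.
    def predict(self, ctx, top_k=256):
        return {ord("a"): 0.5, ord("b"): 0.5}

print(token_probability(UniformToy(), b"", b"ab"))  # 0.5 * 0.5 = 0.25
```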

Corpus Augmentation

from infinigram import Infinigram
from infinigram.corpus_utils import build_corpus_with_augmentation

docs = ["Hello World", "Goodbye World"]
corpus = build_corpus_with_augmentation(docs, augmentations=[str.lower, str.upper])
model = Infinigram(corpus)

Dynamic Updates

model = Infinigram(b"hello")
model.update(b" world")  # Appends and rebuilds suffix array
print(model.n)  # 11

API Reference

Infinigram Class

Infinigram(
    corpus_or_path,           # bytes, List[int], str, or Path
    max_length: int = None,   # Max suffix length (None = unlimited)
    min_count: int = 1,       # Min frequency threshold
)

Methods

Method                                                    Description
predict(context, top_k=50, smoothing=0.0)                 Next-byte probability distribution
predict_weighted(context, weight_fn, top_k=50)            Multi-length weighted prediction
predict_backoff(context, backoff_factor=0.4, top_k=50)    Stupid Backoff smoothing
predict_logprobs(context, top_k=50)                       Log-probabilities
continuations(context, limit=10000)                       Raw continuation counts (limit=None for all)
longest_suffix(context)                                   (position, length) of longest match
find_all_suffix_matches(context)                          All matches at all suffix lengths
confidence(context)                                       0-1 score based on match quality
search(pattern, limit=None)                               All occurrence positions
count(pattern)                                            Number of occurrences
update(new_tokens)                                        Append tokens and rebuild index
clear_cache()                                             Clear prediction caches
get_context(position, window=50)                          Corpus text around a position

Class Methods

Method                                 Description
Infinigram.build(corpus, path, ...)    Build persistent model
Infinigram.load(path)                  Load existing model (mmap, instant)
Infinigram.list_models()               List models in ~/.infinigram/models/

Performance

Byte-level model with mmap-backed suffix arrays:

Corpus Size    Construction    Query Latency
1 KB           <1 ms           <0.1 ms
1 MB           ~100 ms         <1 ms
1 GB           ~30 s           <10 ms

Memory: ~9 bytes per corpus byte (corpus + suffix array on disk, mmap’d).
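The ~9x figure is consistent with one byte of corpus text plus one suffix-array entry per byte position, assuming 64-bit indices (an inference from the ratio, not a documented layout):

```python
def disk_footprint(corpus_bytes: int, index_width: int = 8) -> int:
    # corpus file + one index entry per byte position
    return corpus_bytes * (1 + index_width)

print(disk_footprint(1_000_000_000))  # 9_000_000_000: ~9 GB for 1 GB of text
```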

Performance optimizations (v0.5.0):

  • Vectorized continuation counting with np.bincount (3-10x speedup)
  • LRU cache for repeated continuations() queries (5-20x for hot paths)
  • Direct bytes comparison in binary search (2-5x vs Python loop)
  • Batch SQLite inserts for dataset index rebuild

Architecture

CLI & REPL              infinigram-repl, infinigram-serve
REST API                FastAPI, OpenAI-compatible /v1/completions
Python API              Infinigram class (predict, search, continuations)
Suffix Array Engine     MmapSuffixArray, ChunkedMmapSuffixArray (pydivsufsort)

See ARCHITECTURE.md for details.

Testing

508 tests, 79% coverage:

pytest tests/                                    # All tests
pytest tests/ -v                                 # Verbose
pytest tests/ --cov=infinigram --cov-report=html # Coverage
pytest tests/test_infinigram.py -k "cache"       # Keyword filter

Thread Safety

Infinigram instances are not thread-safe: give each thread its own model instance, or guard a shared instance with external synchronization.
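A shared instance can be guarded with a lock at the cost of serializing queries. A sketch of that pattern (`ToyModel` is a hypothetical stand-in so the snippet runs without a corpus):

```python
import threading

class LockedModel:
    # Wrap a shared, non-thread-safe model so calls are serialized.
    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, context, **kwargs):
        with self._lock:
            return self._model.predict(context, **kwargs)

class ToyModel:
    # Stand-in with the same predict() shape as Infinigram.
    def predict(self, context, **kwargs):
        return {32: 1.0}

shared = LockedModel(ToyModel())
results = []
threads = [
    threading.Thread(target=lambda: results.append(shared.predict(b"the cat")))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # four identical {32: 1.0} dicts
```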

License

MIT License - see LICENSE for details.
