Maph | metafunctor

maph: Maps Based on Perfect Hashing for Sub-Microsecond Key-Value Storage

June 10, 2024

maph is a key-value store that gets sub-100 nanosecond median lookup latency. The basic idea: memory-map the entire database file, use perfect hashing to locate keys in a single probe, and do everything lock-free with atomic operations. No kernel transitions on the read path. No copying. No locking.

The problem

Key-value stores hit three walls when you need microsecond-level latency:

Kernel overhead. System calls cost 100-500ns per operation just for the context switch.
Memory copying. Traditional stores copy data from kernel buffers to user space, between internal structures, for serialization. Each copy costs time.
Synchronization. Lock-based concurrency creates contention and unpredictable tail latency.

For most applications, these costs are noise. For things like feature stores in ML inference pipelines, or anything where you’re doing thousands of lookups per request within a tight latency budget, they dominate.

Three techniques

1. Zero-copy via mmap

Memory-map the database file and let the CPU’s MMU handle address translation:

// Traditional approach: Multiple copies
std::vector<uint8_t> data = read_from_disk(key);  // Kernel -> user copy
Value v = deserialize(data);                      // Another copy
process(v);                                        // Yet another copy

// maph approach: Direct memory access
auto store = maph::Maph::open("mystore.maph");
auto value = store->get(key);  // Zero copies. Direct pointer into mmap region

The kernel page cache handles persistence automatically. You get the illusion of in-memory access with durability.

2. Hybrid hash architecture

maph uses a hybrid hasher that combines perfect hashing with standard hashing. When you optimize (via the /optimize REST endpoint or optimize() in C++), it constructs a minimal perfect hash function for all current keys:

\[ \forall k_i, k_j \in K_{\text{known}}, i \neq j : h_p(k_i) \neq h_p(k_j) \]

Known keys get O(1) worst-case lookup with exactly one memory access. No probing. Keys inserted after optimization fall back to FNV-1a with linear probing:

\[ \text{slot}_i = (h_s(k) + i) \bmod n, \quad i \in [0, \text{MAX\_PROBES} - 1] \]

The maximum probe distance (default 10) bounds worst-case latency. Both hash paths use the same slot array, so there’s no static partitioning. The hybrid hasher checks whether a key was in the original optimized set and dispatches accordingly.

3. Lock-free atomic operations

Every operation uses compare-and-swap and atomic versioning. Each slot has a 64-bit atomic value: key hash in the upper 32 bits, version counter in the lower 32.