Alga: Algebraic Text Processing with Fuzzy Matching

November 30, 2025

Alga is a C++20 header-only library that treats text manipulation as algebra instead of imperative string hacking. It is built on monoids, functors, and extended operators, and it gives you compositional parsing with built-in fuzzy matching.

The Core Idea

Instead of treating strings as mutable buffers, Alga treats text as elements of algebraic structures:

#include "parsers/lc_alpha.hpp"
#include "parsers/porter2stemmer.hpp"
#include "parsers/algebraic_operators.hpp"

using namespace alga;

auto word1 = make_lc_alpha("hello");
auto word2 = make_lc_alpha("world");

if (word1 && word2) {
    // Monoid concatenation
    auto combined = *word1 * *word2;  // "helloworld"

    // Repetition
    auto emphasis = *word1 ^ 3;       // "hellohellohello"

    // Sequential composition (produces vector)
    auto sequence = *word1 >> *word2; // vector["hello", "world"]

    // Porter2 stemming with algebraic composition
    auto stem = make_porter2_stem("running");
    if (stem) {
        auto repeated = *stem ^ 2;    // "runrun"
    }
}

The operators are not arbitrary overloads. They follow actual algebraic laws (associativity, identity, etc.), which means you can reason about compositions the same way you reason about mathematical expressions.

Algebraic Operators

OperatorMeaningExample
*Monoid concatenation*word1 * *word2
|Choice (first valid)word1 | word2
^Repetition (n times)*word ^ 3
>>Sequential (to vector)*word1 >> *word2

List Combinators

Parse separated lists and sequences:

#include "parsers/list_combinators.hpp"

// CSV parsing
auto csv = sepBy(int_parser(), char_parser(','));
auto [pos, nums] = csv.parse("1,2,3");  // vector<int>{1, 2, 3}

// One or more items (fails on empty)
auto csv1 = sepBy1(word_parser(), char_parser(','));

// Optional trailing separator
auto items = sepEndBy(word_parser(), char_parser(';'));

If you have used Haskell’s parsec or Megaparsec, the combinator style will feel familiar. The difference is that Alga’s combinators carry algebraic guarantees through the type system.

Fuzzy Matching

Parse noisy, imperfect input with built-in fuzzy matching:

#include "parsers/fuzzy_parsers.hpp"
#include "parsers/similarity.hpp"

using namespace alga::fuzzy;
using namespace alga::similarity;

// Accept "hello" with up to 2 typos
auto greeting = fuzzy_match("hello", 2);
greeting.parse("helo");    // Matches (1 edit)
greeting.parse("heello");  // Matches (1 edit)
greeting.parse("world");   // Fails (too different)

// Sound-alike name matching
auto name_parser = phonetic_match("Smith");
name_parser.parse("Smyth");  // Matches (same Soundex)

// Combined fuzzy: case + phonetic + edit distance
auto flexible = combined_fuzzy("Python", 2);
flexible.parse("python");  // Case-insensitive
flexible.parse("Pyton");   // Fuzzy match (1 typo)

// String similarity metrics
auto dist = levenshtein_distance("kitten", "sitting");  // 3
auto sim = jaro_winkler_similarity("Martha", "Marhta"); // 0.96

This is the part I find most useful in practice. Real-world text is messy, and having fuzzy matching baked into the parser combinator framework means you do not have to bolt it on as an afterthought.

Phonetic Algorithms

Sound-alike word matching:

#include "parsers/phonetic.hpp"

auto code1 = soundex("Smith");   // "S530"
auto code2 = soundex("Smyth");   // "S530" (same!)

bool alike = sounds_like_soundex("Robert", "Rupert");  // true

Unicode Support

Full UTF-8 with multi-script alphabetic parsing:

Read More