Below you will find pages that utilize the taxonomy term “Alga”
Alga: Algebraic Text Processing with Fuzzy Matching
November 30, 2025
Alga is a C++20 header-only library that treats text manipulation as algebra instead of imperative string hacking. It is built on monoids, functors, and extended operators, and it gives you compositional parsing with built-in fuzzy matching.
The Core Idea
Instead of treating strings as mutable buffers, Alga treats text as elements of algebraic structures:
#include "parsers/lc_alpha.hpp"
#include "parsers/porter2stemmer.hpp"
#include "parsers/algebraic_operators.hpp"
using namespace alga;
auto word1 = make_lc_alpha("hello");
auto word2 = make_lc_alpha("world");
if (word1 && word2) {
// Monoid concatenation
auto combined = *word1 * *word2; // "helloworld"
// Repetition
auto emphasis = *word1 ^ 3; // "hellohellohello"
// Sequential composition (produces vector)
auto sequence = *word1 >> *word2; // vector["hello", "world"]
// Porter2 stemming with algebraic composition
auto stem = make_porter2_stem("running");
if (stem) {
auto repeated = *stem ^ 2; // "runrun"
}
}
The operators are not arbitrary overloads. They follow actual algebraic laws (associativity, identity, etc.), which means you can reason about compositions the same way you reason about mathematical expressions.
Algebraic Operators
| Operator | Meaning | Example |
|---|---|---|
* | Monoid concatenation | *word1 * *word2 |
| | Choice (first valid) | word1 | word2 |
^ | Repetition (n times) | *word ^ 3 |
>> | Sequential (to vector) | *word1 >> *word2 |
List Combinators
Parse separated lists and sequences:
#include "parsers/list_combinators.hpp"
// CSV parsing
auto csv = sepBy(int_parser(), char_parser(','));
auto [pos, nums] = csv.parse("1,2,3"); // vector<int>{1, 2, 3}
// One or more items (fails on empty)
auto csv1 = sepBy1(word_parser(), char_parser(','));
// Optional trailing separator
auto items = sepEndBy(word_parser(), char_parser(';'));
If you have used Haskell’s parsec or Megaparsec, the combinator style will feel familiar. The difference is that Alga’s combinators carry algebraic guarantees through the type system.
Fuzzy Matching
Parse noisy, imperfect input with built-in fuzzy matching:
#include "parsers/fuzzy_parsers.hpp"
#include "parsers/similarity.hpp"
using namespace alga::fuzzy;
using namespace alga::similarity;
// Accept "hello" with up to 2 typos
auto greeting = fuzzy_match("hello", 2);
greeting.parse("helo"); // Matches (1 edit)
greeting.parse("heello"); // Matches (1 edit)
greeting.parse("world"); // Fails (too different)
// Sound-alike name matching
auto name_parser = phonetic_match("Smith");
name_parser.parse("Smyth"); // Matches (same Soundex)
// Combined fuzzy: case + phonetic + edit distance
auto flexible = combined_fuzzy("Python", 2);
flexible.parse("python"); // Case-insensitive
flexible.parse("Pyton"); // Fuzzy match (1 typo)
// String similarity metrics
auto dist = levenshtein_distance("kitten", "sitting"); // 3
auto sim = jaro_winkler_similarity("Martha", "Marhta"); // 0.96
This is the part I find most useful in practice. Real-world text is messy, and having fuzzy matching baked into the parser combinator framework means you do not have to bolt it on as an afterthought.
Phonetic Algorithms
Sound-alike word matching:
#include "parsers/phonetic.hpp"
auto code1 = soundex("Smith"); // "S530"
auto code2 = soundex("Smyth"); // "S530" (same!)
bool alike = sounds_like_soundex("Robert", "Rupert"); // true
Unicode Support
Full UTF-8 with multi-script alphabetic parsing: