alga
Resources & Distribution
Alga
A mathematically elegant C++ library for algebraic text processing and compositional parsing. Built on rigorous algebraic foundations with monoids, functors, and extended operators, Alga transforms text manipulation from imperative string processing into declarative mathematical expressions.
Key Features
- Mathematical Foundation: Built on rigorous algebraic structures (monoids, functors) with verified laws and properties
- High Performance: Header-only template library with zero-cost abstractions and C++17/20 support
- Type Safety: Compile-time guarantees through concepts, strong typing, and mathematical invariants
- Algebraic Operators: Extended operator set (
*,|,^,%,>>) with C++20 concepts for mathematical composition - Compositional Design: Simple parsers combine naturally into sophisticated text analysis pipelines
- Comprehensive Testing: 90%+ coverage with property-based testing of algebraic laws
- Production Ready: Includes industry-standard Porter2 stemmer and real-world examples
Quick Start
Prerequisites
- C++17 compatible compiler (GCC 7+, Clang 5+, MSVC 2017+)
- CMake 3.14+
- Optional: Google Test for running tests
Installation
git clone https://github.com/yourusername/alga.git
cd alga
mkdir build && cd build
cmake ..
make -j$(nproc)
# Run tests to verify installation
make test
Basic Example
#include "parsers/lc_alpha.hpp"
#include "parsers/porter2stemmer.hpp"
#include "parsers/algebraic_operators.hpp"
using namespace alga;
int main() {
// Algebraic text processing with mathematical elegance
auto word1 = make_lc_alpha("hello");
auto word2 = make_lc_alpha("world");
if (word1 && word2) {
// Mathematical composition operators
auto combined = *word1 * *word2; // Monoid concatenation: "helloworld"
auto choice = word1 | word2; // Choice operator: first valid
auto emphasis = *word1 ^ 3; // Repetition: "hellohellohello"
auto sequence = *word1 >> *word2; // Sequential: vector["hello", "world"]
// Function application with error handling
auto to_upper = [](auto w) {
std::string s = w.str();
std::transform(s.begin(), s.end(), s.begin(), ::toupper);
return s;
};
auto uppercase = word1 % to_upper; // Functional style: "HELLO"
// Porter2 stemming with algebraic composition
auto stem = make_porter2_stem("running");
if (stem) {
auto repeated = *stem ^ 2; // "runrun"
std::cout << "Stem repetition: " << std::string{repeated} << std::endl;
}
// Complex algebraic expressions
auto complex = ((word1 ^ 2) | (word2 ^ 3)) % [](auto w) { return w.size(); };
std::cout << "Complex expression result: " << *complex << std::endl;
}
return 0;
}
Mathematical Foundations
Algebraic Structures
The library implements several key mathematical concepts:
Free Monoids: The set of lowercase alphabetic strings forms a free monoid under concatenation
- Identity: Empty string
- Associativity: (a · b) · c = a · (b · c)
- Closure: Concatenation always produces valid strings
Functors: Structure-preserving transformations between categories
// fmap preserves composition: fmap(g ∘ f) = fmap(g) ∘ fmap(f) auto parser = make_word_parser(); auto length_parser = fmap(parser, [](const std::string& s) { return s.length(); });Monadic Composition: Error-handling composition with automatic failure propagation
auto result = make_lc_alpha("running") >>= [](const lc_alpha& word) { return stemmer(word); } >>= [](const porter2_stem& stem) { return make_unigram(stem); };
Verified Properties
All algebraic structures satisfy their mathematical laws:
// Monoid laws (automatically tested)
ASSERT_EQ((a * b) * c, a * (b * c)); // Associativity
ASSERT_EQ(empty * a, a); // Left identity
ASSERT_EQ(a * empty, a); // Right identity
// Functor laws (automatically tested)
ASSERT_EQ(fmap(id, parser), parser); // Identity preservation
ASSERT_EQ(fmap(compose(g,f), parser), fmap(g, fmap(f, parser))); // Composition
Core Components
Parser Types
lc_alpha: Type-safe lowercase alphabetic strings with monoid structureporter2_stemmer: Production-ready Porter2 stemming algorithm with algebraic interfacecombinatorial_parser: Composable parser combinators for building complex grammarsfsm_string_rewriter: Finite state machine-based string transformationsngram_stemmer: N-gram generation and composition with algebraic operations
Composition Examples
#include "examples/composition_examples.hpp"
// Multi-strategy processing with fallbacks
MultiStrategyStemmer stemmer;
auto result = stemmer(word); // Tries Porter2, falls back to alternatives
// Document processing pipeline
DocumentProcessor processor;
auto analysis = processor.process_document(text);
// Returns: word count, unique stems, n-gram statistics
// Parallel processing with value semantics
std::vector<std::future<std::optional<porter2_stem>>> futures;
for (const auto& word : words) {
futures.emplace_back(std::async(std::launch::async, [=]() {
return make_porter2_stem(word); // Safe to copy - no shared state
}));
}
Advanced Usage
Custom Parser Development
// Define algebraic parser with proper interface
template<typename Predicate, typename Transform>
class AlgebraicParser {
Predicate predicate;
Transform transformer;
public:
using input_type = std::string;
using output_type = std::invoke_result_t<Transform, std::string>;
auto operator()(const input_type& input) const -> std::optional<output_type> {
if (predicate(input)) {
return transformer(input);
}
return std::nullopt;
}
// Monadic composition
template<typename NextParser>
auto then(NextParser&& next) const {
return [*this, next = std::forward<NextParser>(next)](const input_type& input) {
auto result = operator()(input);
if (!result) return std::nullopt;
return next(*result);
};
}
};
// Usage
auto vowel_extractor = AlgebraicParser{
[](const std::string& s) { return !s.empty() && is_vowel(s[0]); },
[](const std::string& s) { return std::toupper(s[0]); }
};
Parser Combinators
#include "parsers/combinatorial_parser_fixed.hpp"
// Sequential composition
auto word_number = sequence(word_parser(), digit_parser());
// Choice composition (sum types)
auto word_or_number = alternative(word_parser(), digit_parser());
// Repetition (Kleene star)
auto many_words = many(word_parser());
// Transform results while preserving structure
auto word_lengths = transform(word_parser(), [](const std::string& s) {
return s.length();
});
Testing and Quality Assurance
Comprehensive Test Suite
The library includes extensive testing at multiple levels:
# Run all tests
make test
# Test coverage analysis
make coverage # Requires CMAKE_BUILD_TYPE=Coverage
# Performance benchmarks
make benchmark
Property-Based Testing
Mathematical properties are automatically verified:
// Test algebraic laws for all monoid types
TEST_F(AlgebraicProperties, MonoidLaws) {
// Tests run on randomly generated data
for (int i = 0; i < 1000; ++i) {
auto a = generate_random_lc_alpha();
auto b = generate_random_lc_alpha();
auto c = generate_random_lc_alpha();
ASSERT_EQ((a * b) * c, a * (b * c)); // Associativity
}
}
Real-World Integration Tests
TEST(Integration, ComplexDocumentProcessing) {
// Process multi-paragraph document
std::string document = load_test_document("lorem_ipsum.txt");
DocumentProcessor processor;
auto result = processor.process_document(document);
ASSERT_GT(result.word_count, 100);
ASSERT_GT(result.unique_stems.size(), 50);
ASSERT_FALSE(result.bigrams.empty());
}
Performance
- Zero-Cost Abstractions: Template metaprogramming eliminates runtime overhead
- Memory Efficient: RAII and move semantics, no unnecessary copying
- Parallelizable: Value semantics enable safe concurrent processing
- Optimized Algorithms: Porter2 implementation tuned for performance
Benchmarks show comparable performance to hand-optimized C code while providing mathematical guarantees and composability.
Architecture
include/parsers/ # Core algebraic parser types
├── lc_alpha.hpp # Free monoid of lowercase strings
├── porter2stemmer.hpp # Porter2 algorithm with algebraic interface
├── combinatorial_parser_fixed.hpp # Parser combinators
├── ngram_stemmer.hpp # N-gram composition
├── fsm_string_rewriter.hpp # Finite state transformations
└── word_parser.hpp # Word tokenization
examples/ # Real-world composition patterns
└── composition_examples.hpp # Document processing pipelines
test/ # Comprehensive test suite
├── composition_test.cpp # Compositional property tests
├── comprehensive_test_suite.cpp # Integration tests
└── performance_benchmark.cpp # Performance validation
Contributing
We welcome contributions that maintain the mathematical rigor and performance characteristics:
- Mathematical Correctness: All algebraic structures must satisfy their laws
- Type Safety: Use concepts and static assertions to prevent invalid usage
- Performance: Maintain zero-cost abstraction principles
- Testing: Include property-based tests for algebraic laws
- Documentation: Provide clear mathematical explanations
See CONTRIBUTING.md for detailed guidelines.
License
This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.
Acknowledgments
- Inspired by Haskell’s parser combinator libraries and category theory
- Built using modern C++ best practices and mathematical foundations
- Implements classical algorithms (Porter2) within a principled algebraic framework
Mathematical parsing meets practical performance. This library demonstrates that rigorous mathematical foundations enhance rather than compromise real-world usability and performance.