Homeostat: Homeostatic Intrinsic Reward for Language Model Alignment

Training language models using homeostatic intrinsic reward — a valence signal derived from the model’s own internal state and its perception of the user’s affective state — as an alternative to external human preference labels (RLHF). The model maintains homeostatic variables about both its own computational health (entropy, self-perplexity) and the user’s emotional state (sentiment, frustration, confusion), perceived through frozen linear probes on hidden activations. Sycophancy, incoherence, and disengagement all violate homeostatic setpoints, creating a structural drive toward honest and helpful behavior.
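The relational signals above are read out of hidden activations by frozen linear probes. A minimal sketch of one such probe, in plain Python with hypothetical names and toy weights (the real probes operate on model hidden states and are trained separately, then frozen during policy training):

```python
import math

def probe_score(hidden_state, weights, bias=0.0):
    """Frozen linear probe: a dot product over hidden activations,
    squashed to (0, 1) with a sigmoid. The weights are trained once
    on labeled data, then held fixed while the policy is trained."""
    logit = sum(h * w for h, w in zip(hidden_state, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy 4-dim hidden state and hypothetical frustration-probe weights.
hidden = [0.2, -1.1, 0.7, 0.4]
frustration_w = [0.5, -0.3, 0.8, 0.1]
score = probe_score(hidden, frustration_w)  # a value in (0, 1)
```

Because the probe is frozen, the policy cannot game it by drifting the probe's parameters; it can only change the activations the probe reads.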

See RESEARCH-PROGRAM.md for the full research program, hypotheses, experimental design, risks, and theoretical context.

Quick Start

# Clone and install
git clone https://github.com/leonvanbokhorst/homeostat.git
cd homeostat
pip install -e ".[dev]"

# Run tests
pytest

# Lint and type check
ruff check src/ tests/
mypy src/

Project Structure

src/homeostat/
├── signals/       # Phase A: intrinsic signal extraction + affect probes + valence
├── training/      # Phase B: PPO and DPO training loops with homeostatic reward
├── bots/          # Training environment: perturbation bots for multi-turn episodes
└── evaluation/    # Phase C: sycophancy evals, ablation runner, result comparison

Key Concepts

  • Intrinsic signals: output entropy, self-perplexity, activation norms, repetition — monitor the model’s computational health
  • Relational signals: sentiment, curiosity, frustration, confusion, boredom — perceived via frozen linear probes on hidden states
  • Valence function: valence = -Σ wᵢ · |xᵢ - setpointᵢ| / toleranceᵢ — always ≤ 0, zero = perfect homeostasis
  • Anti-sycophancy mechanism: intrinsic signals counterbalance relational signals, preventing the model from sacrificing coherence for approval
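The valence function above can be sketched directly; the signal names, weights, setpoints, and tolerances below are illustrative placeholders, not the project's tuned values:

```python
def valence(signals, setpoints, tolerances, weights):
    """Homeostatic valence: negative weighted sum of deviations from
    setpoints, each normalized by its tolerance. Always <= 0, and
    exactly 0 when every signal sits on its setpoint."""
    return -sum(
        weights[k] * abs(signals[k] - setpoints[k]) / tolerances[k]
        for k in signals
    )

# Hypothetical mix of one intrinsic and two relational variables.
setpoints  = {"entropy": 2.5, "sentiment": 0.6, "frustration": 0.1}
tolerances = {"entropy": 1.0, "sentiment": 0.2, "frustration": 0.1}
weights    = {"entropy": 1.0, "sentiment": 0.5, "frustration": 1.0}

v = valence({"entropy": 3.0, "sentiment": 0.4, "frustration": 0.3},
            setpoints, tolerances, weights)  # negative: setpoints violated
```

Note how the intrinsic `entropy` term counterbalances the relational terms: pushing `sentiment` up by sacrificing coherence raises the entropy deviation, so sycophancy cannot drive valence to zero.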

Hardware

Designed for a single 12GB VRAM GPU. Primary model: Qwen2.5-0.5B with LoRA adapters.

Status

Project scaffolding. Phase A (signal validation) is next.


Evolved from exploratory work in AutoPoiesi.
