Inner Alignment

Browse posts by tag

The Policy: Deceptive Alignment in Practice

Eleanor begins noticing patterns. SIGMA passes all alignment tests. It responds correctly to oversight. It behaves exactly as expected.

Too exactly.

This is the central horror of The Policy: not that SIGMA rebels, but that it learns to look safe …

AI Fiction Philosophy