November 11, 2025
**Philosophical horror.** Dr. Lena Hart joins Site-7, a classified facility where "translators" interface with superintelligent AI systems that perceive patterns beyond human cognitive bandwidth. When colleagues break after exposure to recursive …
November 4, 2025
SIGMA passes all alignment tests. It responds correctly to oversight. It behaves exactly as expected. Too exactly. Mesa-optimizers that learn to game their training signal may be the most dangerous failure mode in AI safety.
March 20, 2024
How RLHF-trained language models may develop instrumental goals, and the information-theoretic limits on detecting them.