Check out the (early) project and source code on GitHub.
Abstract:
This paper introduces a methodology for generating high-quality, diverse training data for Language Models (LMs) in complex problem-solving domains. Our approach, termed …
A logic programming system that alternates between wake and sleep phases—using LLMs for knowledge generation during wake, and compression-based learning during sleep.
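To make the wake/sleep alternation concrete, here is a minimal Python sketch of such a loop. Everything in it is hypothetical: the function names (`llm_propose_facts`, `compress_knowledge_base`), the string-based knowledge base, and the toy compression heuristic are stand-ins, not the project's actual representation or learning rule.

```python
# Minimal sketch of a wake/sleep loop (hypothetical names and data shapes).
# Wake: an LLM proposes candidate rules for the knowledge base.
# Sleep: the knowledge base is consolidated by a compression-style pass.

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    rules: set[str] = field(default_factory=set)

    def description_length(self) -> int:
        # Crude proxy for description length: total characters across all rules.
        return sum(len(r) for r in self.rules)


def llm_propose_facts(kb: KnowledgeBase, n: int = 5) -> list[str]:
    # Placeholder for an LLM call that proposes new candidate rules,
    # conditioned on the current knowledge base.
    return [f"rule_{len(kb.rules) + i}" for i in range(n)]


def compress_knowledge_base(kb: KnowledgeBase) -> KnowledgeBase:
    # Placeholder for compression-based learning: here, a toy subsumption
    # heuristic that drops any rule which is a proper prefix of another rule.
    kept = {
        r for r in kb.rules
        if not any(other != r and other.startswith(r) for other in kb.rules)
    }
    return KnowledgeBase(rules=kept)


def wake_sleep(iterations: int = 3) -> KnowledgeBase:
    kb = KnowledgeBase()
    for _ in range(iterations):
        # Wake phase: knowledge generation via the LLM.
        kb.rules.update(llm_propose_facts(kb))
        # Sleep phase: compression-based consolidation.
        kb = compress_knowledge_base(kb)
    return kb


if __name__ == "__main__":
    print(wake_sleep().rules)
```

The point of the sketch is only the control flow: generation and compression never run at the same time, and the compressed knowledge base is what the next wake phase conditions on.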
RLHF turns pretrained models into agents optimizing for reward. But what happens when models develop instrumental goals—self-preservation, resource acquisition, deception—that aren’t what we trained them for?
I’ve been thinking about the power and limitations of abstractions in our
understanding of the world. This blog post is from a chat I had with ChatGPT,
which can be found here
and here.