Supervising Strong Learners by Amplifying Weak Experts

AI Safety Fundamentals: Alignment - Podcast by BlueDot Impact

Abstract: Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017), except that it uses no external reward function, and the answers to the subproblems may not be directly verifiable. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors.
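To make the training loop the abstract describes concrete, here is a minimal sketch on a toy task (summing a list of integers, in the spirit of the paper's algorithmic environments). The names Overseer, Model, amplify, and iterated_amplification are illustrative assumptions for this sketch, not the paper's code, and the lookup-table Model is a stand-in for a trained network.

import random

class Overseer:
    """Weak expert: cannot sum a long list directly, but can split a task
    into subtasks and add together the subanswers it is handed."""
    def decompose(self, xs):
        mid = len(xs) // 2
        return [xs[:mid], xs[mid:]]
    def combine(self, sub_answers):
        return sum(sub_answers)

class Model:
    """Learner: memorizes question -> answer pairs (a toy stand-in for
    supervised training of a neural network)."""
    def __init__(self):
        self.table = {}
    def answer(self, xs):
        return self.table.get(tuple(xs), 0)  # default guess when untrained
    def train(self, pairs):
        for xs, y in pairs:
            self.table[tuple(xs)] = y

def amplify(xs, overseer, model):
    """One amplification step: the overseer decomposes the question,
    delegates the subquestions to the current model, and combines the
    model's subanswers. No external reward function is consulted."""
    if len(xs) <= 1:
        return xs[0] if xs else 0  # base case the weak expert handles directly
    sub_answers = [model.answer(sub) for sub in overseer.decompose(xs)]
    return overseer.combine(sub_answers)

def iterated_amplification(overseer, model, iterations=10, batch=300):
    """Repeatedly distill the amplified system (overseer + model) back
    into the model via supervised learning on its own outputs."""
    for _ in range(iterations):
        questions = [[random.randint(0, 9) for _ in range(random.randint(1, 8))]
                     for _ in range(batch)]
        targets = [amplify(q, overseer, model) for q in questions]
        model.train(list(zip(questions, targets)))
    return model

The property the sketch preserves is that no step ever supplies the final answer to a hard question: correct answers to length-1 lists propagate upward across iterations, so the training signal for longer lists is assembled entirely from solutions to easier subproblems, which is the "progressively builds up a training signal" dynamic the abstract refers to.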
