Best AI papers explained

Podcast by Enoch H. Kang

523 Episodes

Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Published: 27.05.2025
Test-Time Reinforcement Learning (TTRL)
Published: 27.05.2025
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Published: 26.05.2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Published: 26.05.2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Published: 26.05.2025
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Published: 26.05.2025
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
Published: 26.05.2025
UFT: Unifying Supervised and Reinforcement Fine-Tuning
Published: 26.05.2025
Understanding High-Dimensional Bayesian Optimization
Published: 26.05.2025
Inference time alignment in continuous space
Published: 25.05.2025
Efficient Test-Time Scaling via Self-Calibration
Published: 25.05.2025
Conformal Prediction via Bayesian Quadrature
Published: 25.05.2025
Predicting from Strings: Language Model Embeddings for Bayesian Optimization
Published: 25.05.2025
Self-Evolving Curriculum for LLM Reasoning
Published: 25.05.2025
Online Decision-Focused Learning in Dynamic Environments
Published: 25.05.2025
FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain
Published: 25.05.2025
Reward Shaping from Confounded Offline Data
Published: 25.05.2025
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Published: 25.05.2025
Understanding Best-of-N Language Model Alignment
Published: 25.05.2025
Maximizing Acquisition Functions for Bayesian Optimization - and its relation to Gradient Descent
Published: 24.05.2025

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
