526 Episodes

  1. Data Selection for Empirical Risk Minimization

    Published: 26.04.2025
  2. LoRe: Low-Rank Reward Modeling for Personalized LLMs

    Published: 26.04.2025
  3. ParaPO: Reducing Language Model Verbatim Reproduction

    Published: 26.04.2025
  4. Test-Time RL: Self-Evolving LLMs via Majority Voting Rewards

    Published: 25.04.2025
  5. Tina: Tiny LoRA Reasoning Models

    Published: 25.04.2025
  6. Evaluating large language models in theory of mind tasks

    Published: 25.04.2025
  7. QUEST: Quality Sampling for Machine Translation

    Published: 24.04.2025
  8. Offline Preference Learning via Simulated Trajectory Feedback

    Published: 24.04.2025
  9. Reasoning Elicitation in Language Models via Counterfactual Feedback

    Published: 24.04.2025
  10. Eliciting Human Preferences with Language Models

    Published: 24.04.2025
  11. Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

    Published: 24.04.2025
  12. γ-Bench: Evaluating LLMs in Multi-Agent Games

    Published: 24.04.2025
  13. DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement

    Published: 24.04.2025
  14. Optimal Prediction Sets for Enhanced Human-AI Accuracy

    Published: 24.04.2025
  15. Self-Correction via Reinforcement Learning for Language Models

    Published: 24.04.2025
  16. Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

    Published: 24.04.2025
  17. Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

    Published: 24.04.2025
  18. Iterative Nash Policy Optimization for Language Model Alignment

    Published: 24.04.2025
  19. SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine

    Published: 23.04.2025
  20. Stack AI: Democratizing Enterprise AI Development

    Published: 22.04.2025

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.