🕒 PO’clock continues: meet IRPO! We rethink RLHF for retrieval—an NDCG-weighted DPO objective that teaches LLMs to use long doc lists faithfully & efficiently. Dive in 🚀 arxiv.org/abs/2504.15477
23.04.2025 16:39 — 👍 3 🔁 0 💬 0 📌 0
@jennyshen056.bsky.social
1st year CS PhD student @UCSD
Introducing TALES - Text Adventure Learning Environment Suite
A benchmark of a few hundred text envs, from science experiments and embodied cooking to solving murder mysteries. We test over 30 of the best LLM agents and pinpoint failure modes + how to improve
👨‍💻 pip install tale-suite