nice....
15.03.2025 12:47
I was lucky enough to be invited to give a talk on our new paper on the value of RL in fine-tuning at Cornell last week! Because of my poor time management skills, the talk isn't as polished as I'd like, but I think the "vibes" are accurate enough to share: youtu.be/E4b3cSirpsg.
06.03.2025 18:19
1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to 🤿:
04.03.2025 20:59
can you present other people's results :-)
04.03.2025 14:18
that makes sense to me.... i should go to bed....
06.02.2025 00:51
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback. Our approach is minimalist in that it does not require training a reward model nor unst...
@gswamy.bsky.social et al. propose SPO, which builds a game from preferences and solves for the minimax winner. It handles non-Markovian, intransitive, and stochastic preferences. Nice empirical eval ranging from small demonstrative domains to a huge RL domain (MuJoCo).
arxiv.org/abs/2401.04056
2/3.
21.11.2024 12:30
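To make "solving for the minimax winner" concrete, here is a toy sketch. It is not the SPO algorithm from the paper: it assumes a small tabular preference matrix (the hypothetical `pref[i][j]`, the probability that option i is preferred to option j) and runs a multiplicative-weights (Hedge) learner in self-play, where each option's reward is its win rate against the current policy. The function name, step count, and learning rate are all illustrative choices.

```python
import math


def minimax_winner(pref, steps=2000, lr=0.1):
    """Approximate the minimax winner of a pairwise preference matrix.

    pref[i][j] is the probability that option i is preferred to option j
    (an assumed tabular setup, used only for illustration).
    Returns the average policy over all self-play iterations.
    """
    n = len(pref)
    logits = [0.0] * n
    avg = [0.0] * n
    for _ in range(steps):
        # Softmax over accumulated scores (shifted by the max for stability).
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]
        total = sum(weights)
        policy = [w / total for w in weights]
        for i in range(n):
            avg[i] += policy[i] / steps
        # Self-play reward: each option's win rate against the current policy.
        win = [sum(policy[j] * pref[i][j] for j in range(n)) for i in range(n)]
        logits = [logits[i] + lr * win[i] for i in range(n)]
    return avg


if __name__ == "__main__":
    # Option 0 beats both others, so most of the mass should end up on it.
    print(minimax_winner([[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]]))
    # Intransitive (rock-paper-scissors-like) preferences have no single best option.
    print(minimax_winner([[0.5, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]))
```

The average policy is returned rather than the last iterate because, in zero-sum games, it is the averaged play of a no-regret learner that approaches the minimax solution.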
I have become a fan of the game-theoretic approaches to RLHF, so here are two more papers in that category! (with one more tomorrow 😅)
1. Self-Play Preference Optimization (SPO).
2. Direct Nash Optimization (DNO).
🧵 1/3.
21.11.2024 12:30
1....
21.11.2024 00:40
MIT postdoc, incoming UIUC CS prof
katedonahue.me
CyLab is @cmu.edu's Security & Privacy Institute. Our 300+ researchers are passionate about creating a world in which technology can be trusted. Follow our latest research at https://www.cylab.cmu.edu/.
Machine Learning Professor
https://cims.nyu.edu/~andrewgw
Workforce Economist in Residence at Guild; Senior Fellow at the Burning Glass Institute. I tweet a lot about labor markets, macro, and (sorry) music! Tweets represent my own views.
Exploring the intersection of global health metrics, epidemiology, and data science. Bridging the gap between methods and practice to better measure and improve population health worldwide.
Professor of Marketing at NYU Stern School of Business, serial entrepreneur, and host of the Prof G and Pivot Podcasts.
PhD student @ CMU HCII
I build systems that empower people to shape AI through collaborative, deliberative, and democratic processes.
https://tskuo.github.io
San Diego Dec 2-7, 25 and Mexico City Nov 30-Dec 5, 25. Comments to this account are not monitored. Please send feedback to townhall@neurips.cc.
Research in generative AI for **human** creativity in music + more.
Assistant professor at CMU CSD, leading the 🎼 G-CLef lab. Part time research scientist at Google DeepMind on the Magenta team (views my own)
I work on human-centered {security|privacy|computing}. Associate Professor (w/o tenure) at @hcii.cmu.edu. Director of the SPUD (Security, Privacy, Usability, and Design) Lab. Non-Resident Fellow @cendemtech.bsky.social
The School of Computer Science at Carnegie Mellon University is one of the world's premier institutions for CS and robotics research and education. We build useful stuff that works!
Professor at NYU; Chief AI Scientist at Meta.
Researcher in AI, Machine Learning, Robotics, etc.
ACM Turing Award Laureate.
http://yann.lecun.com
The world's leading venue for collaborative research in theoretical computer science. Follow us at http://YouTube.com/SimonsInstitute.
AI Policy Fellow @ Princeton | PhD Carnegie Mellon | privacy, accountability, & algorithmic systems
Aligning incentives for better science - quality over status
Signer of DORA: https://sfdora.org/
Co-director of ENDOW project: https://endowproject.github.io/
Interdisciplinary socio-ecological scientist advocating for congruence of theory, data, and stats
AI & Transportation | MIT Associate Professor
Interests: AI for good, sociotechnical systems, machine learning, optimization, reinforcement learning, public policy, gov tech, open science.
Science is messy and beautiful.
http://www.wucathy.com
Blog: https://argmin.substack.com/
Webpage: https://people.eecs.berkeley.edu/~brecht/
PhD Student at Carnegie Mellon University.
HCI + Mental Health.
President of Signal, Chief Advisor to AI Now Institute
Human–computer interaction researcher. PhD from University of Minnesota. Tacoma, WA. Mastodon: zwlevonian@hci.social