New in @science.org: 20+ AI scholars, incl. @alondra.bsky.social, @randomwalker.bsky.social, @sanmikoyejo.bsky.social, et al., lay out a playbook for evidence-based AI governance.
Without solid data, we risk both hype and harm. Thread 🧵
Grateful to win Best Paper at ACL for our work on Fairness through Difference Awareness with my amazing collaborators!! Check out the paper for why we think fairness has both gone too far and, at the same time, not far enough: aclanthology.org/2025.acl-lon...
30.07.2025 15:34

Instead, we should permit differentiation based on context. For example, synagogues in America are legally allowed to discriminate by religion when hiring rabbis. Work with Michelle Phan, Daniel E. Ho, and @sanmikoyejo.bsky.social: arxiv.org/abs/2502.01926
02.06.2025 16:38

Most major LLMs are trained primarily on English data, making them ineffective for the approximately 5 billion people who don't speak English. Here, HAI Faculty Affiliate @sanmikoyejo.bsky.social discusses the risks of this digital divide and how to close it. hai.stanford.edu/news/closing...
20.05.2025 17:43

Our latest white paper maps the landscape of large language model development for low-resource languages, highlighting challenges, trade-offs, and strategies. Read more here: hai.stanford.edu/policy/mind-...
23.04.2025 16:38

Collaboration with a bunch of lovely people I am thankful to be able to work with: @hannawallach.bsky.social, @angelinawang.bsky.social, Olawale Salaudeen, Rishi Bommasani, and @sanmikoyejo.bsky.social.
16.04.2025 16:45

Happy our article on Creating a Generative AI Evaluation Science, led by @weidingerlaura.bsky.social & @rajiinio.bsky.social, is now published by the National Academy of Engineering. =) www.nae.edu/338231/Towar...
It describes how to mature evaluation so that systems can be worthy of trust and safely deployed.
We're thrilled to announce the first workshop on Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver! Join us (& this stellar list of speakers) in bringing together technical & policy experts to shape the future of AI governance! www.taig-icml.com
01.04.2025 12:23

AI systems present an opportunity to address society's biases. "However, realizing this potential requires careful attention to both technical and social considerations," says HAI Faculty Affiliate @sanmikoyejo.bsky.social in his latest op-ed via @theguardian.com: www.theguardian.com/commentisfre...
26.03.2025 15:51

Very excited we were able to get this collaboration working -- congrats and big thanks to the co-authors! @rajiinio.bsky.social @hannawallach.bsky.social @mmitchell.bsky.social @angelinawang.bsky.social Olawale Salaudeen, Rishi Bommasani @sanmikoyejo.bsky.social @williamis.bsky.social
20.03.2025 13:28

3) Institutions and norms are necessary for a long-lasting, rigorous, and trusted evaluation regime. In the long run, nobody trusts actors grading their own homework. Establishing an ecosystem that accounts for expertise and balances incentives is a key marker of robust evaluation in other fields.
20.03.2025 13:28

which challenged concepts of what temperature is and in turn motivated the development of new thermometers. A similar virtuous cycle is needed to refine AI evaluation concepts and measurement methods.
20.03.2025 13:28

2) Metrics and evaluation methods need to be refined over time. This iteration is key to any science. Take the example of measuring temperature: it went through many iterations of building new measurement approaches,
20.03.2025 13:28

Just like the "crashworthiness" of a car indicates aspects of safety in case of an accident, AI evaluation metrics need to link to real-world outcomes.
20.03.2025 13:28

We identify three key lessons in particular.
1) Meaningful metrics: evaluation metrics must connect to AI system behaviour or impact that is relevant in the real world. They can be abstract or simplified -- but they need to correspond to real-world performance or outcomes in a meaningful way.
We pull out key lessons from other fields, such as aerospace, food security, and pharmaceuticals, that have matured from being research disciplines to becoming industries with widely used and trusted products. AI research is going through a similar maturation -- but AI evaluation needs to catch up.
20.03.2025 13:28

Excited about this work framing responsible governance for generative AI!
20.03.2025 21:21

@koloskova.bsky.social is awesome, and you should apply to work with her if you can!
Also, Congrats!
Thanks @gdemelo.bsky.social for the kind words. I blame the students.
19.02.2025 02:35

Can Symbolic Scaffolding and DPO Enhance Solution Quality and Accuracy in Mathematical Problem Solving with LLMs by Shree Reddy and Shubhra Mishra fine-tunes the Qwen-2.5-7B-Instruct model with symbolic-enhanced traces, achieving improved performance on the GSM8K and MathCAMPS benchmarks.
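The post doesn't detail the training setup; as a rough, pure-Python sketch of the DPO objective such a fine-tune optimizes (the log-probability values below are illustrative, not from the project):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities of
    the chosen/rejected responses under the policy (pi_*) and the frozen
    reference model (ref_*)."""
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # numerically stable -log(sigmoid(logits))
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# When the policy already favours the chosen response relative to the
# reference, the loss is small; the reverse ordering is penalised.
good = dpo_loss(pi_chosen=-5.0, pi_rejected=-9.0, ref_chosen=-6.0, ref_rejected=-6.0)
bad = dpo_loss(pi_chosen=-9.0, pi_rejected=-5.0, ref_chosen=-6.0, ref_rejected=-6.0)
```

Here `beta` controls how far the fine-tuned policy is allowed to drift from the reference model.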
21.01.2025 04:13

Heterogeneity of Preference Datasets for Pluralistic AI Alignment by Emily Bunnapradist, Niveditha Iyer, Megan Li, and Nikil Selvam proposes a way to objectively quantify a dataset's diversity and evaluates preference datasets for pluralistic alignment.
n/n
HP-GS: Human-Preference Next Best View Selection for 3D Gaussian Splatting by Matt Strong and Aditya Dutt presents a simple method for human-preference-guided next-best-view selection in 3D Gaussian Splatting.
github.com/peasant98/ac...
Video link: youtube.com/watch?v=t3gC....
6/n
Information-Theoretic Measures for LLM Output Evaluation by Zachary Robertson, Suhana Bedi, and Hansol Lee explores using total variation mutual information to evaluate LLM-based preference learning.
github.com/zrobertson46...
5/n
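The project's exact estimator isn't given in the post; a minimal plug-in sketch of total-variation mutual information for discrete paired labels (the toy data is made up), i.e. 0.5 * sum over (x, y) of |p(x, y) - p(x)p(y)|:

```python
from collections import Counter

def tv_mutual_information(pairs):
    """Plug-in total-variation mutual information between two discrete
    variables, estimated from paired samples. Zero iff the empirical joint
    distribution factorizes into its marginals; larger values mean one
    variable carries more information about the other."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    tv = 0.0
    for x in px:
        for y in py:
            tv += abs(joint.get((x, y), 0) / n - (px[x] / n) * (py[y] / n))
    return 0.5 * tv

# Perfectly aligned labels vs. statistically unrelated labels.
aligned = tv_mutual_information([(0, 0), (1, 1)] * 50)
independent = tv_mutual_information([(0, 0), (0, 1), (1, 0), (1, 1)] * 25)
```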
PrefLearn: How Do Advanced Replay Buffers and Online DPO Affect the Performance of RL Tetris with DQNs by Andy Liang, Abhinav Sinha, Jeremy Tian, and Kenny Dao proposes PrefLearn, which achieves superior performance and faster convergence.
tinyurl.com/preflearn
4/n
Cost and Reward Infused Metric Elicitation by Chethan Bhateja, Joseph O'Brien, Afnaan Hashmi, and Eva Prakash extends metric elicitation to consider additional factors like monetary cost and latency.
arxiv.org/abs/2501.00696
3/n
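The paper's formulation isn't reproduced in the post; as a hedged sketch, one simple way to elicit a linear trade-off over factors like accuracy and cost from pairwise preferences is a Bradley-Terry fit (the feature tuples and comparisons below are invented for illustration):

```python
import math

def fit_preference_weights(comparisons, dim, lr=0.5, steps=2000):
    """Fit linear trade-off weights from pairwise comparisons under a
    Bradley-Terry model: P(a beats b) = sigmoid(w . (a - b)).
    Each comparison is (winner_features, loser_features); a feature
    vector might be (accuracy, -cost) for a candidate classifier."""
    w = [0.0] * dim
    for _ in range(steps):
        for win, lose in comparisons:
            d = [a - b for a, b in zip(win, lose)]
            score = sum(wi * di for wi, di in zip(w, d))
            p = 1.0 / (1.0 + math.exp(-score))
            # gradient ascent on the log-likelihood of the observed choice
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, d)]
    return w

# Made-up comparisons: the "user" accepts a small cost increase in
# exchange for higher accuracy.
comparisons = [((0.9, -0.2), (0.7, -0.1)),
               ((0.8, -0.5), (0.6, -0.1))]
w = fit_preference_weights(comparisons, dim=2)
```

After fitting, the learned `w` ranks every observed winner above its paired loser, so it can score new (accuracy, -cost) candidates consistently with the elicited preferences.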
Improving Suggestions For Student Feedback Using Direct Preference Optimization by @juliettewoodrow.bsky.social presents a method for producing LLM-generated feedback suggestions that align with human preferences through an iterative fine-tuning process.
juliettewoodrow.github.io/paper-hostin...
2/n
Incredible student projects from the 2024 Fall quarter's Machine Learning from Human Preferences course web.stanford.edu/class/cs329h/. Our students tackled some fascinating challenges at the intersection of AI alignment and human values. Selected project details follow...
1/n
As one of the vice chairs of the EU GPAI Code of Practice process, I co-wrote the second draft, which just went online. Feedback is open until mid-January; please let me know your thoughts, especially on the internal governance section!
digital-strategy.ec.europa.eu/en/library/s...
What happens when "If at first you don't succeed, try again" meets modern ML/AI insights about scaling up?
You jailbreak every model on the market.
Fire work led by @jplhughes.bsky.social, Sara Price, @aengusl.bsky.social, Mrinank Sharma, and Ethan Perez.
arxiv.org/abs/2412.03556
New paper on why machine "unlearning" is much harder than it seems is now up on arXiv: arxiv.org/abs/2412.06966 This was a huuuuuge cross-disciplinary effort led by @msftresearch.bsky.social FATE postdoc @grumpy-frog.bsky.social!!!
14.12.2024 00:55