
@sanmikoyejo.bsky.social

65 Followers  |  27 Following  |  12 Posts  |  Joined: 10.12.2024

Latest posts by sanmikoyejo.bsky.social on Bluesky


New in @science.org: 20+ AI scholars, incl. @alondra.bsky.social, @randomwalker.bsky.social, @sanmikoyejo.bsky.social, et al., lay out a playbook for evidence-based AI governance.

Without solid data, we risk both hype and harm. Thread 👇

05.08.2025 16:48 – 👍 22    🔁 18    💬 2    📌 1

Grateful to win Best Paper at ACL for our work on Fairness through Difference Awareness with my amazing collaborators!! Check out the paper for why we think fairness has both gone too far and, at the same time, not far enough: aclanthology.org/2025.acl-lon...

30.07.2025 15:34 – 👍 26    🔁 4    💬 0    📌 0

Instead, we should permit differentiating based on the context. Ex: synagogues in America are legally allowed to discriminate by religion when hiring rabbis. Work with Michelle Phan, Daniel E. Ho, @sanmikoyejo.bsky.social arxiv.org/abs/2502.01926

02.06.2025 16:38 – 👍 1    🔁 1    💬 1    📌 0
Closing the Digital Divide in AI | Stanford HAI
Large language models aren't effective for many languages. Scholars explain what's at stake for the approximately 5 billion people who don't speak English.

Most major LLMs are trained using English data, making them ineffective for the approximately 5 billion people who don't speak English. Here, HAI Faculty Affiliate @sanmikoyejo.bsky.social discusses the risks of this digital divide and how to close it. hai.stanford.edu/news/closing...

20.05.2025 17:43 – 👍 4    🔁 1    💬 0    📌 0
Mind the (Language) Gap: Mapping the Challenges of LLM Development in Low-Resource Language Contexts | Stanford HAI
In collaboration with The Asia Foundation and the University of Pretoria, this white paper maps the LLM development landscape for low-resource languages, highlighting challenges, trade-offs, and strat...

Our latest white paper maps the landscape of large language model development for low-resource languages, highlighting challenges, trade-offs, and strategies. Read more here: hai.stanford.edu/policy/mind-...

23.04.2025 16:38 – 👍 6    🔁 1    💬 0    📌 1

Collaboration with a bunch of lovely people I am thankful to be able to work with: @hannawallach.bsky.social, @angelinawang.bsky.social, Olawale Salaudeen, Rishi Bommasani, and @sanmikoyejo.bsky.social. 🤗

16.04.2025 16:45 – 👍 3    🔁 1    💬 0    📌 0
Toward an Evaluation Science for Generative AI Systems
There is an urgent need for a more robust and comprehensive approach to AI evaluation. There is an increasing imperative to anticipate and understand ...

πŸ§‘β€πŸ”¬ Happy our article on Creating a Generative AI Evaluation Science, led by @weidingerlaura.bsky.social & @rajiinio.bsky.social, is now published by the National Academy of Engineering. =) www.nae.edu/338231/Towar...
Describes how to mature eval so systems can be worthy of trust and safely deployed.

16.04.2025 16:43 – 👍 41    🔁 7    💬 1    📌 0

📣 We're thrilled to announce the first workshop on Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver! Join us (& this stellar list of speakers) in bringing together technical & policy experts to shape the future of AI governance! www.taig-icml.com

01.04.2025 12:23 – 👍 14    🔁 4    💬 1    📌 4
Could AI help us build a more racially just society? | Sanmi Koyejo
We have an opportunity to build systems that don't just replicate our current inequities. Will we take it?

AI systems present an opportunity to correct, rather than replicate, society's biases. "However, realizing this potential requires careful attention to both technical and social considerations," says HAI Faculty Affiliate @sanmikoyejo.bsky.social in his latest op-ed via @theguardian.com: www.theguardian.com/commentisfre...

26.03.2025 15:51 – 👍 9    🔁 5    💬 0    📌 0

Very excited we were able to get this collaboration working -- congrats and big thanks to the co-authors! @rajiinio.bsky.social @hannawallach.bsky.social @mmitchell.bsky.social @angelinawang.bsky.social Olawale Salaudeen, Rishi Bommasani @sanmikoyejo.bsky.social @williamis.bsky.social

20.03.2025 13:28 – 👍 5    🔁 1    💬 0    📌 0

3) Institutions and norms are necessary for a long-lasting, rigorous and trusted evaluation regime. In the long run, nobody trusts actors grading their own homework. Establishing an ecosystem that accounts for expertise and balances incentives is a key marker of robust evaluation in other fields.

20.03.2025 13:28 – 👍 3    🔁 2    💬 1    📌 0

which challenged concepts of what temperature is and in turn motivated the development of new thermometers. A similar virtuous cycle is needed to refine AI evaluation concepts and measurement methods.

20.03.2025 13:28 – 👍 2    🔁 1    💬 1    📌 0

2) Metrics and evaluation methods need to be refined over time. This iteration is key to any science. Take the example of measuring temperature: it went through many iterations of building new measurement approaches,

20.03.2025 13:28 – 👍 2    🔁 1    💬 1    📌 0

Just like the β€œcrashworthiness” of a car indicates aspects of safety in case of an accident, AI evaluation metrics need to link to real-world outcomes.

20.03.2025 13:28 – 👍 2    🔁 1    💬 1    📌 0

We identify three key lessons in particular.

1) Meaningful metrics: evaluation metrics must connect to AI system behaviour or impact that is relevant in the real world. They can be abstract or simplified -- but they need to correspond to real-world performance or outcomes in a meaningful way.

20.03.2025 13:28 – 👍 2    🔁 1    💬 1    📌 0

We pull out key lessons from other fields, such as aerospace, food safety, and pharmaceuticals, that have matured from being research disciplines to becoming industries with widely used and trusted products. AI research is going through a similar maturation -- but AI evaluation needs to catch up.

20.03.2025 13:28 – 👍 2    🔁 1    💬 1    📌 0

Excited about this work framing responsible governance for generative AI!

20.03.2025 21:21 – 👍 0    🔁 0    💬 0    📌 0

@koloskova.bsky.social is awesome, and you should apply to work with her if you can!

Also, Congrats!

07.03.2025 03:11 – 👍 2    🔁 0    💬 0    📌 0

Thanks @gdemelo.bsky.social for the kind words. I blame the students 😃

19.02.2025 02:35 – 👍 2    🔁 0    💬 0    📌 0

Can Symbolic Scaffolding and DPO Enhance Solution Quality and Accuracy in Mathematical Problem Solving with LLMs by Shree Reddy and Shubhra Mishra fine-tunes the Qwen-2.5-7B-Instruct model with symbolic-enhanced traces, achieving improved performance on the GSM8K and MathCAMPS benchmarks.

21.01.2025 04:13 – 👍 0    🔁 0    💬 0    📌 0
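DPO in the title above is Direct Preference Optimization (Rafailov et al., 2023). For readers who want the objective spelled out, here is a minimal sketch of the standard DPO loss; the tensor names and shapes are illustrative assumptions, not this project's code.

```python
# Minimal sketch of the standard DPO loss (Rafailov et al., 2023).
# Each argument is a 1-D batch of summed per-response log-probabilities;
# names are illustrative assumptions, not the project's implementation.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit rewards: scaled log-ratio of policy vs. frozen reference.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```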

Heterogeneity of Preference Datasets for Pluralistic AI Alignment by Emily Bunnapradist, Niveditha Iyer, Megan Li, and Nikil Selvam propose objectively quantifying a dataset's diversity and evaluating preference datasets for pluralistic alignment.
n/n

12.01.2025 04:55 – 👍 3    🔁 0    💬 1    📌 0
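The post does not say which diversity measure the authors propose, so the following is a toy illustration only: one simple heterogeneity statistic is the mean per-prompt entropy of annotators' binary preference votes. The function name and data layout here are hypothetical.

```python
# Toy heterogeneity statistic (illustrative; not the paper's measure):
# average per-prompt entropy of binary annotator votes,
# where 0 = prefers response A and 1 = prefers response B.
import math

def mean_preference_entropy(votes_per_prompt: list[list[int]]) -> float:
    def entropy(votes: list[int]) -> float:
        p = sum(votes) / len(votes)
        if p in (0.0, 1.0):
            return 0.0  # unanimous annotators contribute no entropy
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return sum(entropy(v) for v in votes_per_prompt) / len(votes_per_prompt)

# 0.0 = annotators always agree; 1.0 = maximally split on every prompt.
print(mean_preference_entropy([[0, 0, 1], [1, 1, 1], [0, 1, 0, 1]]))
```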

HP-GS: Human-Preference Next Best View Selection for 3D Gaussian Splatting by Matt Strong and Aditya Dutt present a simple method for guiding the next best view in 3D Gaussian Splatting.
github.com/peasant98/ac...
Video link: youtube.com/watch?v=t3gC....
6/n

12.01.2025 04:55 – 👍 1    🔁 0    💬 1    📌 0
GitHub - zrobertson466920/CS329_Project

Information-Theoretic Measures for LLM Output Evaluation by Zachary Robertson, Suhana Bedi, and Hansol Lee explore using total variation mutual information to evaluate LLM-based preference learning.
github.com/zrobertson46...
5/n

12.01.2025 04:55 – 👍 1    🔁 0    💬 1    📌 0
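Total variation mutual information, as named in the post above, can be read as the total variation distance between the joint distribution of two variables and the product of their marginals. A plug-in estimator for discrete observations might look like the sketch below; this is a simplified reading of the measure, not the authors' code.

```python
# Plug-in estimate of TV mutual information for discrete observations:
# 0.5 * sum_{x,y} |P(x,y) - P(x)P(y)|. Simplified sketch, not the authors' code.
from collections import Counter

def tv_mutual_information(pairs: list[tuple]) -> float:
    n = len(pairs)
    joint = Counter(pairs)             # empirical P(X, Y)
    px = Counter(x for x, _ in pairs)  # empirical P(X)
    py = Counter(y for _, y in pairs)  # empirical P(Y)
    # Sum over the full product support of X and Y.
    return 0.5 * sum(
        abs(joint.get((x, y), 0) / n - (px[x] / n) * (py[y] / n))
        for x in px for y in py
    )

# 0.0 for independent variables; here X and Y are perfectly coupled.
print(tv_mutual_information([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 0.5
```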
CS329H_Final_Paper.pdf

PrefLearn: How Do Advanced Replay Buffers and Online DPO Affect the Performance of RL Tetris with DQNs by Andy Liang, Abhinav Sinha, Jeremy Tian, and Kenny Dao propose PrefLearn, which achieves superior performance and faster convergence.
tinyurl.com/preflearn
4/n

12.01.2025 04:55 – 👍 0    🔁 0    💬 1    📌 0
Cost and Reward Infused Metric Elicitation
In machine learning, metric elicitation refers to the selection of performance metrics that best reflect an individual's implicit preferences for a given application. Currently, metric elicitation met...

Cost and Reward Infused Metric Elicitation by Chethan Bhateja, Joseph O'Brien, Afnaan Hashmi, and Eva Prakash extend metric elicitation to consider additional factors like monetary cost and latency.
arxiv.org/abs/2501.00696
3/n

12.01.2025 04:55 – 👍 0    🔁 0    💬 1    📌 0
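Metric elicitation, as the preview above notes, selects the performance metric that best reflects a user's implicit preferences. The sketch below shows only the form a cost- and latency-infused metric might take once its trade-off weights have been elicited; the weights, names, and numbers are hypothetical, and the elicitation procedure itself is omitted.

```python
# Hypothetical cost- and latency-infused metric (illustrative form only).
# w_acc, w_cost, w_lat stand in for trade-off weights recovered by an
# elicitation procedure; they are assumptions, not the paper's values.
def infused_metric(accuracy: float, cost_usd: float, latency_s: float,
                   w_acc: float = 1.0, w_cost: float = 0.2,
                   w_lat: float = 0.1) -> float:
    # Higher is better: raw performance minus cost and latency penalties.
    return w_acc * accuracy - w_cost * cost_usd - w_lat * latency_s

# A cheaper, faster system can win despite lower raw accuracy.
print(infused_metric(0.92, cost_usd=1.5, latency_s=2.0))  # ~0.42
print(infused_metric(0.88, cost_usd=0.2, latency_s=0.5))  # ~0.79
```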

Improving Suggestions For Student Feedback Using Direct Preference Optimization by @juliettewoodrow.bsky.social presents a method for producing LLM-generated feedback suggestions that align with human preferences through an iterative fine-tuning process.

juliettewoodrow.github.io/paper-hostin...
2/n

12.01.2025 04:55 – 👍 1    🔁 1    💬 1    📌 0
CS329H: Machine Learning from Human Preferences

📚 Incredible student projects from the 2024 Fall quarter's Machine Learning from Human Preferences course web.stanford.edu/class/cs329h/. Our students tackled some fascinating challenges at the intersection of AI alignment and human values. Selected project details follow...

1/n

12.01.2025 04:55 – 👍 4    🔁 1    💬 1    📌 0
Second Draft of the General-Purpose AI Code of Practice published, written by independent experts
Independent experts present the second draft of the General-Purpose AI Code of Practice, based on the feedback received on the first draft, published on 14 November 2024.

As one of the vice chairs of the EU GPAI Code of Practice process, I co-wrote the second draft which just went online – feedback is open until mid-January, please let me know your thoughts, especially on the internal governance section!

digital-strategy.ec.europa.eu/en/library/s...

19.12.2024 16:59 – 👍 14    🔁 5    💬 0    📌 1

What happens when "If at first you don't succeed, try again" meets modern ML/AI insights about scaling up?

You jailbreak every model on the market 😱😱😱

Fire work led by @jplhughes.bsky.social, Sara Price, @aengusl.bsky.social, Mrinank Sharma, and Ethan Perez.

arxiv.org/abs/2412.03556

13.12.2024 16:51 – 👍 3    🔁 2    💬 0    📌 0
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. ...

New paper on why machine "unlearning" is much harder than it seems is now up on arXiv: arxiv.org/abs/2412.06966 This was a huuuuuge cross-disciplinary effort led by @msftresearch.bsky.social FATE postdoc @grumpy-frog.bsky.social!!!

14.12.2024 00:55 – 👍 73    🔁 24    💬 2    📌 0
