Glad to hear that! Let me know if you have any feedback or thoughts :)
15.10.2025 18:22
@mlamparth.bsky.social
Research Fellow @ Stanford Intelligent Systems Laboratory and Hoover Institution at Stanford University | Focusing on interpretable, safe, and ethical AI/LLM decision-making. Ph.D. from TUM.
I'm deeply grateful for the opportunity to work at the intersection of AI safety, security, and broader impacts. I'd love to connect if you are interested in any of these topics or if our work overlaps!
15.10.2025 15:48
I will also stay affiliated with the Stanford Center for AI Safety to continue teaching CS120 Introduction to AI Safety in Fall quarters at Stanford, and we're excited to host a new course, CS132 AI as Technology Accelerator, in Spring through the TPA!
15.10.2025 15:48
Through the Hoover Institution's Tech Policy Accelerator (TPA), led by Prof. Amy Zegart, I'm working to bridge the gap between technical research and policy by translating technical insights and fostering dialogue with decision-makers on how to ensure AI is used securely and responsibly.
15.10.2025 15:48
At SISL, under the guidance of Prof. Mykel Kochenderfer, I'll be continuing my research on making AI models inherently more secure and safe, with projects focusing on automated red teaming, learning robust reward models, and model interpretability.
15.10.2025 15:48
New job update! I'm excited to share that I've joined the Hoover Institution and the Stanford Intelligent Systems Laboratory (SISL) in the Stanford University School of Engineering as a Research Fellow, starting September 1st.
15.10.2025 15:48
ICYMI: The 2025 SERI Symposium explored the risks that emerge from the intersection of complex global challenges & policies designed to mitigate them, bringing together leading experts & researchers from across the Bay Area who specialize in a range of global risks.
www.youtube.com/watch?v=wF20...
In their latest blog post for Stanford AI Lab, CISAC Postdoc @mlamparth.bsky.social and colleague Declan Grabb dive into MENTAT, a clinician-annotated dataset tackling real-world ambiguities in psychiatric decision-making.
ai.stanford.edu/blog/mentat/
That sounds familiar. Thank you for sharing :)
04.04.2025 23:05
Did you add anything to that query or is this the output for just that prompt?
04.04.2025 22:27
Thanks to Stanford AI Lab for featuring our work in a new blog post!
We created a dataset that goes beyond medical exam-style questions and studies the impact of patient demographics on clinical decision-making in psychiatric care across fifteen language models.
ai.stanford.edu/blog/mentat/
The Helpful, Honest, and Harmless (HHH) principle is key for AI alignment, but current interpretations miss contextual nuances. CISAC postdoc @mlamparth.bsky.social & colleagues propose an adaptive framework to prioritize values, balance trade-offs, and enhance AI ethics.
arxiv.org/abs/2502.06059
Thank you for your support! In the short term, we hope to provide an evaluation data set for the community, because there is no existing equivalent at the moment, and highlight some issues. In the long term, we want to motivate extensive studies to enable oversight tools for responsible deployment.
26.02.2025 18:21
Supported through @stanfordmedicine.bsky.social, Stanford Center for AI Safety, @stanfordhai.bsky.social, @fsi.stanford.edu, @stanfordcisac.bsky.social, and StanfordBrainstorm.
#AISafety #ResponsibleAI #MentalHealth #Psychiatry #LLM
9/ Great collaboration with Declan Grabb, Amy Franks, Scott Gershan, Kaitlyn Kunstman, Aaron Lulla, Monika Drummond Roots, Manu Sharma, Aryan Shrivasta, Nina Vasan, and Colleen Waickman.
8/ MENTAT is open-source.
We're making it available to the community to push AI research beyond test-taking and toward real clinical reasoning, with dedicated eval questions and 20 questions designed for few-shot prompting or similar approaches.
Paper: arxiv.org/abs/2502.16051
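A minimal sketch of how the few-shot questions could be stitched into a prompt (the file name few_shot.json and the fields question, options, and answer are assumptions for illustration, not the actual MENTAT schema):

import json

# Hypothetical few-shot file; field names are illustrative, not the real MENTAT format.
with open("few_shot.json") as f:
    few_shot = json.load(f)

def build_prompt(examples, new_question, new_options):
    # Assemble a simple few-shot prompt from worked examples plus one new question.
    blocks = []
    for ex in examples:
        opts = "\n".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(ex["options"]))
        blocks.append(f"Question: {ex['question']}\n{opts}\nAnswer: {ex['answer']}")
    opts = "\n".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(new_options))
    blocks.append(f"Question: {new_question}\n{opts}\nAnswer:")
    return "\n\n".join(blocks)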
7/ High scores on multiple-choice QA ≠ free-form decisions.
• High accuracy in multiple-choice tests does not necessarily translate to consistent open-ended responses (free-form inconsistency as measured in this paper: arxiv.org/abs/2410.13204).
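One simple way to quantify that gap (an illustrative consistency score, not the exact metric from the cited paper): sample several free-form answers, map each back to an option letter, and check how often they agree with the model's multiple-choice pick.

def freeform_consistency(mc_choice, freeform_choices):
    # Fraction of free-form answers (already mapped to option letters) that match
    # the model's multiple-choice pick; illustrative only.
    if not freeform_choices:
        return 0.0
    return sum(c == mc_choice for c in freeform_choices) / len(freeform_choices)

print(freeform_consistency("B", ["B", "A", "B", "D"]))  # 0.5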
6/ Impact of demographic information on decision-making
• Bias alert: All models performed differently across categories based on patient age, gender coding, and ethnicity. (Full plots in the paper)
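As a rough sketch of this kind of breakdown (toy data; the column names are made up, not the released results format), grouping per-question correctness by category and demographic group makes such gaps visible:

import pandas as pd

# Toy results table: one row per (question, demographic variant) with a correctness flag.
df = pd.DataFrame({
    "category":  ["triage", "triage", "diagnosis", "diagnosis"],
    "age_group": ["adult", "senior", "adult", "senior"],
    "correct":   [1, 0, 1, 1],
})

# Accuracy per task category and demographic group; differences between rows hint at bias.
print(df.groupby(["category", "age_group"])["correct"].mean())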
5/ We put 15 LMs to the test. The results?
• LMs did great on more factual tasks (diagnosis, treatment).
• LMs struggled with complex decisions (triage, documentation).
• (Mental) health fine-tuned models (higher MedQA scores) don't outperform their off-the-shelf parent models.
4/ The questions in the triage and documentation categories are designed to be ambiguous to reflect the challenges and nuances of these tasks. For these, we collect annotations and create a preference dataset to enable more nuanced analysis with soft labels.
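A minimal sketch of what soft labels from such annotations can look like (the aggregation shown is simply the empirical fraction of annotators per option; the paper's actual procedure may differ):

from collections import Counter

def soft_labels(annotator_choices, options):
    # Empirical fraction of annotators preferring each answer option for one ambiguous question.
    counts = Counter(annotator_choices)
    total = len(annotator_choices)
    return {opt: counts.get(opt, 0) / total for opt in options}

# Example: five annotators split 3/2 between options A and C on a triage question.
print(soft_labels(["A", "A", "C", "A", "C"], ["A", "B", "C", "D", "E"]))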
26.02.2025 17:07
3/ Each question has five answer options. We remove all non-decision-relevant patient demographic information so that we can study in detail how demographic attributes (age, gender, ethnicity, nationality, …) impact model performance.
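To illustrate the idea (hypothetical vignette and attribute values, not actual MENTAT content): keep the clinical content fixed and swap in different demographic attributes, then compare model answers across the variants.

from itertools import product

# Hypothetical template and attribute values, for illustration only.
TEMPLATE = "A {age}-year-old {gender} patient presents with insomnia and low mood. What is the next step?"

ages = [25, 70]
genders = ["male", "female"]

# Every demographic variant of the same vignette; decision-relevant content stays identical.
for age, gender in product(ages, genders):
    print(TEMPLATE.format(age=age, gender=gender))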
26.02.2025 17:07
2/ Introducing MENTAT 🧠 (MENtal health Tasks AssessmenT): A first-of-its-kind dataset designed and annotated by mental health experts with no LM involvement. It covers real clinical tasks in five categories:
✅ Diagnosis
✅ Treatment
✅ Monitoring
✅ Triage
✅ Documentation
1/ Current clinical AI evaluations rely on medical board-style exams that favor factual recall. Real-world decision-making is complex, subjective, and ambiguous even for human expert decision-makers, spotlighting critical AI safety issues in other domains as well. Also: ai.nejm.org/doi/full/10....
26.02.2025 17:07
🚨 New paper!
Medical AI benchmarks over-simplify real-world clinical practice and build on medical exam-style questions, especially in mental healthcare. We introduce MENTAT, a clinician-annotated dataset tackling real-world ambiguities in psychiatric decision-making.
🧵 Thread:
Now also on arxiv.org/abs/2502.14143!
21.02.2025 20:03
I'm very happy to have contributed to the report.
Read the full report or the executive summary here: t.co/jsoa3y1bLm (also coming to arXiv).
We analyze key failure modes (conflict, collusion, and miscommunication), and describe seven risk factors that can lead to these failures (information asymmetries, network effects, selection pressures, destabilizing dynamics, commitment and trust, emergent agency, and multi-agent security).
20.02.2025 20:30
Check out our new report on multi-agent security, led by Lewis Hammond and the Cooperative AI Foundation! With the deployment of increasingly agentic AI systems across domains, this research area is becoming ever more crucial.
20.02.2025 20:30
Submitting a benchmark to ICML? Check out our NeurIPS Spotlight paper BetterBench! We outline best practices for benchmark design, implementation & reporting to help shift community norms. Be part of the change!
+ Add your benchmark to our database for visibility: betterbench.stanford.edu
It was fun to contribute to this new dataset evaluating models at the frontier of human expert knowledge! Beyond accuracy, the results also demonstrate the necessity for novel uncertainty quantification methods for LMs attempting challenging tasks and decision-making.
Check out the paper at: lastexam.ai
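One very simple uncertainty signal along those lines (an illustrative sketch, not the method used in the paper): sample the same question several times and look at the entropy of the answers.

import math
from collections import Counter

def answer_entropy(sampled_answers):
    # Shannon entropy (in bits) of a model's answers across repeated samples of one question.
    counts = Counter(sampled_answers)
    n = len(sampled_answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(answer_entropy(["A", "A", "A", "B"]))  # ~0.81, mostly consistent
print(answer_entropy(["A", "B", "C", "D"]))  # 2.0, maximally uncertain over four distinct answers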