ICML Expo Talk Panel: Situating principles in context for synthetic data (ICML 2025)
I will also be on IBM's Expo Talk Panel on Monday, July 14, to discuss how SPRI can be adopted in industry for generating high-quality synthetic data. You can find more details about the talk here: icml.cc/virtual/2025...
08.07.2025 15:13
ICML Poster: SPRI: Aligning Large Language Models with Context-Situated Principles (ICML 2025)
Link to the paper: icml.cc/virtual/2025...
Code and data: github.com/honglizhan/S...
Shout out to an amazing team @jessyjli.bsky.social, @m-yurochkin.bsky.social, Muneeza Azmat & Raya Horesh! Also super grateful to the reviewers for their invaluable feedback!
#ICML2025 #LLMAlignment
08.07.2025 15:13
1️⃣ SPRI generates principles as effective as psychologists' at improving users' well-being
2️⃣ SPRI enables tailored rubrics for LLM judges, matching human-crafted rubrics (e.g., on BiGGen-Bench)
3️⃣ SPRI-generated synthetic data boosts Llama/Mistral/Gemma (7-9B) on TruthfulQA, with no loss on other benchmarks
08.07.2025 15:11
Motivation: Constitutional AI works great for aligning LLMs, but its principles can be too generic to apply. Can we guide responses with context-situated principles instead?
SPRI tackles this, rivaling human oracle guidance in the three real-world use cases we tested.
08.07.2025 15:06
I'll be at #ICML to present SPRI next week! Come by our poster on Tuesday, July 15, at 4:30 pm, and let's catch up on LLM alignment!
TL;DR: We introduce Situated-PRInciples (SPRI), a framework that automatically generates input-specific principles to align responses, with minimal human effort.
🧵
08.07.2025 15:05
I'm excited to share that our paper has been accepted at #ICML2025! 🥳
This work was done during my internship at IBM Research, and it wouldn't have been possible without a top-notch team and my amazing advisor.
02.05.2025 21:27
To appear at #ICML2025!!
02.05.2025 12:34
I definitely agree :) I think SPRI can help generate SFT data for their constitutional classifiers that extrapolate *beyond* the "chemical weapons" context that they show in Sec 5 and Appendix B.
Thanks for sharing this!
07.02.2025 05:20
The principles that LLMs align with should be specific to the task at hand! Check out @hongli-zhan.bsky.social's latest work.
06.02.2025 22:52
[5/5] Code and model generations: github.com/honglizhan/S...
This project was carried out during my internship at IBM Research, and I'd like to highlight the support and mentorship from my amazing hosts Muneeza Azmat, Raya Horesh, and @m-yurochkin.bsky.social, and my advisor @jessyjli.bsky.social!
06.02.2025 22:48
[4/5] In addition, when applying SPRI to generate SFT data for alignment, we observe substantial improvement on TruthfulQA.
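For readers curious how such pairs might be packaged for fine-tuning, here is a rough, hypothetical sketch; the generate_principle_and_response callable and the JSONL field names are my own placeholders, not the paper's actual data format.

```python
import json

def build_sft_dataset(prompts, generate_principle_and_response, path="spri_sft.jsonl"):
    """Write (instruction, principle-guided response) pairs as JSONL for a standard SFT trainer."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            # Placeholder for the SPRI pipeline: returns (principle, response) for a prompt.
            principle, response = generate_principle_and_response(prompt)
            record = {
                "instruction": prompt,   # original user input
                "output": response,      # response to fine-tune toward
                "principle": principle,  # kept for provenance / inspection
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```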
06.02.2025 22:44
[3/5] We tested SPRI on 3 tasks: generating 1) cognitive reappraisals, 2) instance-specific rubrics for LLM-as-a-judge, and 3) SFT data for alignment.
SPRI turns out to work great for tasks that require complex principles, performing on par with expert-guided methods.
06.02.2025 22:44
[2/5] Short for Situated-PRInciples, SPRI involves two stages: 1) synthesizing context-situated principles, and 2) crafting principle-guided responses.
In each stage, a base model and a critic model create the principles and responses from scratch through a critique-and-refine loop.
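To make the two stages concrete, here is a minimal sketch of how such a critique-refine loop could be wired up. The prompts, the stopping check, and the base_lm / critic_lm callables are illustrative assumptions on my part, not the actual SPRI implementation (see the repo for that).

```python
# Minimal sketch of SPRI's two stages, assuming `base_lm` and `critic_lm`
# are plain text-in/text-out callables. Prompts and the stop condition are
# placeholders, not the prompts used in the paper.

def critique_refine(base_lm, critic_lm, task_prompt, max_rounds=3):
    """Draft with the base model, then iteratively critique and revise."""
    draft = base_lm(task_prompt)
    for _ in range(max_rounds):
        feedback = critic_lm(f"Critique the following output:\n{draft}")
        if "no further issues" in feedback.lower():  # placeholder stopping check
            break
        draft = base_lm(f"{task_prompt}\n\nRevise the previous output using this critique:\n{feedback}")
    return draft

def spri(base_lm, critic_lm, user_input):
    # Stage 1: synthesize a principle situated in the specific input.
    principle = critique_refine(
        base_lm, critic_lm,
        f"Write a guiding principle tailored to this input:\n{user_input}",
    )
    # Stage 2: craft a response guided by that principle.
    response = critique_refine(
        base_lm, critic_lm,
        f"Principle: {principle}\n\nFollowing the principle, respond to:\n{user_input}",
    )
    return principle, response
```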
06.02.2025 22:44
Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply.
Can we guide responses with context-situated principles instead?
Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort.
arxiv.org/pdf/2502.03397
06.02.2025 22:43