ICML Expo Talk Panel: Situating principles in context for synthetic data (ICML 2025)
I will also be on IBM's Expo Talk Panel on Monday, July 14, to discuss how SPRI can be adopted in industry for generating high-quality synthetic data. You can find more details about the talk here: icml.cc/virtual/2025...
08.07.2025 15:13
ICML Poster: SPRI: Aligning Large Language Models with Context-Situated Principles (ICML 2025)
Link to the paper: icml.cc/virtual/2025...
Code and data: github.com/honglizhan/S...
Shout out to an amazing team @jessyjli.bsky.social, @m-yurochkin.bsky.social, Muneeza Azmat & Raya Horesh! Also super grateful to the reviewers for their invaluable feedback!
#ICML2025 #LLMAlignment
08.07.2025 15:13
1️⃣ SPRI generates principles as effective as psychologists' at improving users' well-being
2️⃣ SPRI enables tailored rubrics for LLM judges, matching human-crafted rubrics (e.g., on BiGGen-Bench)
3️⃣ SPRI-generated synthetic data boosts Llama/Mistral/Gemma (7-9B) on TruthfulQA, with no loss on other benchmarks
08.07.2025 15:11
Motivation: Constitutional AI works great for aligning LLMs, but its principles can be too generic to apply. Can we guide responses with context-situated principles instead?
SPRI tackles this, rivaling human oracle guidance in the three real-world use cases we tested.
08.07.2025 15:06
I'll be at #ICML to present SPRI next week! Come by our poster on Tuesday, July 15, at 4:30 pm, and let's catch up on LLM alignment!
TL;DR: We introduce Situated-PRInciples (SPRI), a framework that automatically generates input-specific principles to align responses, with minimal human effort.
🧵
08.07.2025 15:05
I'm excited to share that our paper has been accepted at #ICML2025! 🥳
This work was done during my internship at IBM Research, and it wouldn't have been possible without a top-notch team and my amazing advisor.
02.05.2025 21:27
To appear at #ICML2025!!
02.05.2025 12:34
I definitely agree :) I think SPRI can help generate SFT data for their constitutional classifiers that extrapolate *beyond* the "chemical weapons" context that they show in Sec 5 and Appendix B.
Thanks for sharing this!
07.02.2025 05:20
The principles that LLMs align with should be specific to the task at hand! Check out @hongli-zhan.bsky.social's latest work.
06.02.2025 22:52
[5/5] Code and model generations: github.com/honglizhan/S...
This project was carried out during my internship at IBM Research, and I'd like to highlight the support and mentorship from my amazing hosts Muneeza Azmat, Raya Horesh, and @m-yurochkin.bsky.social, and my advisor @jessyjli.bsky.social!
06.02.2025 22:48
[4/5] In addition, when applying SPRI to generate SFT data for alignment, we observe substantial improvement on TruthfulQA.
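For readers curious how such pairs might be packaged for fine-tuning, here is a rough, hypothetical sketch; the generate_principle_and_response callable and the JSONL field names are my own placeholders, not the paper's actual data format.

```python
import json

def build_sft_dataset(prompts, generate_principle_and_response, path="spri_sft.jsonl"):
    """Write (instruction, principle-guided response) pairs as JSONL for a standard SFT trainer."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            # Placeholder for the SPRI pipeline: returns (principle, response) for a prompt.
            principle, response = generate_principle_and_response(prompt)
            record = {
                "instruction": prompt,   # original user input
                "output": response,      # response to fine-tune toward
                "principle": principle,  # kept for provenance / inspection
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```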
06.02.2025 22:44
[3/5] We tested SPRI on 3 tasks: generating 1) cognitive reappraisals, 2) instance-specific rubrics for LLM-as-a-judge, and 3) SFT data for alignment.
SPRI turns out to work great for tasks that require complex principles, performing on par with expert-guided methods.
06.02.2025 22:44
[2/5] Short for Situated-PRInciples, SPRI involves two stages: 1) synthesizing context-situated principles, and 2) crafting principle-guided responses.
In each stage, a base model and a critic model create the principles and responses from scratch through a critique-and-refine loop.
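To make the two stages concrete, here is a minimal sketch of how such a critique-refine loop could be wired up. The prompts, the stopping check, and the base_lm / critic_lm callables are illustrative assumptions on my part, not the actual SPRI implementation (see the repo for that).

```python
# Minimal sketch of SPRI's two stages, assuming `base_lm` and `critic_lm`
# are plain text-in/text-out callables. Prompts and the stop condition are
# placeholders, not the prompts used in the paper.

def critique_refine(base_lm, critic_lm, task_prompt, max_rounds=3):
    """Draft with the base model, then iteratively critique and revise."""
    draft = base_lm(task_prompt)
    for _ in range(max_rounds):
        feedback = critic_lm(f"Critique the following output:\n{draft}")
        if "no further issues" in feedback.lower():  # placeholder stopping check
            break
        draft = base_lm(f"{task_prompt}\n\nRevise the previous output using this critique:\n{feedback}")
    return draft

def spri(base_lm, critic_lm, user_input):
    # Stage 1: synthesize a principle situated in the specific input.
    principle = critique_refine(
        base_lm, critic_lm,
        f"Write a guiding principle tailored to this input:\n{user_input}",
    )
    # Stage 2: craft a response guided by that principle.
    response = critique_refine(
        base_lm, critic_lm,
        f"Principle: {principle}\n\nFollowing the principle, respond to:\n{user_input}",
    )
    return principle, response
```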
06.02.2025 22:44
Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply.
Can we guide responses with context-situated principles instead?
Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort.
arxiv.org/pdf/2502.03397
06.02.2025 22:43