Hadas Orgad hadasorgad - Bluesky Statics

Full paper >> actionable-interpretability-guide.github.io/paper.pdf
Blog >> actionable-interpretability-guide.github.io

23.02.2026 15:43 — 👍 0 🔁 0 💬 0 📌 0

Joint work w/ amazing collaborators @fbarez.bsky.social @talhaklay.bsky.social @wordscompute.bsky.social @mariusmosbach.bsky.social @anja.re @nsaphra.bsky.social @byron.bsky.social @sarah-nlp.bsky.social @profericwong.bsky.social @iftenney.bsky.social @megamor2.bsky.social

23.02.2026 15:43 — 👍 0 🔁 0 💬 1 📌 0

We’re not saying all interpretability work must be immediately actionable; curiosity-driven research still matters. But actionability is a high bar: understanding that works outside the lab.

To make your next project more actionable, use our checklist >>

23.02.2026 15:38 — 👍 2 🔁 0 💬 1 📌 0

Actionable interpretability is worth aiming for. We identified five domains where answering *why* unlocks a fundamental advantage.

23.02.2026 15:38 — 👍 0 🔁 0 💬 1 📌 0

Interpretability isn't actionable (yet) for three reasons:
→ Papers aren't expected to demonstrate applications
→ Insights are shown in oversimplified settings without real baselines
→ Methods require domain expertise

23.02.2026 15:38 — 👍 0 🔁 0 💬 1 📌 0

Why haven't insights from interpretability transformed AI yet? Because we're not prioritizing actionable insights.

Full paper >> actionable-interpretability-guide.github.io/paper.pdf
Blog - The Hitchhiker's Guide 🧭 to Actionable Interpretability >> actionable-interpretability-guide.github.io/

23.02.2026 15:38 — 👍 3 🔁 0 💬 1 📌 0

Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How?
We're ready to answer.
🧵

23.02.2026 15:38 — 👍 21 🔁 8 💬 1 📌 1

Deadline extended! ⏳

The Actionable Interpretability Workshop at #ICML2025 has moved its submission deadline to May 19th. More time to submit your work 🔍🧠✨ Don’t miss out!

03.05.2025 20:00 — 👍 4 🔁 3 💬 0 📌 0


Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!

23.04.2025 18:15 — 👍 51 🔁 15 💬 1 📌 6

General Information ICML 2025 - Vancouver

• Model Innovation – Designs and training inspired by interpretability.
• Impact Measurement – Benchmarks for real-world effectiveness.
• Critical Perspectives – Feasibility, limits, and future directions.

Website >>> actionable-interpretability.github.io

31.03.2025 17:06 — 👍 3 🔁 0 💬 0 📌 0

• Real-world Applications – Tackling bias, hallucinations, adversarial threats, and use in critical domains like healthcare, finance and cybersecurity.
• Method Comparison – Interpretability vs. alternative methods such as fine-tuning, prompting, etc.

31.03.2025 17:05 — 👍 2 🔁 0 💬 1 📌 0

We aim to foster discussions on how interpretability research can inform concrete improvements in model design, safety, and robustness.

Topics of interest: ⬇️

31.03.2025 17:05 — 👍 3 🔁 0 💬 1 📌 0

Posts by Hadas Orgad (@hadasorgad.bsky.social)