
Hadas Orgad

@hadasorgad.bsky.social

47 Followers  |  2 Following  |  11 Posts  |  Joined: 06.03.2025

Posts by Hadas Orgad (@hadasorgad.bsky.social)

Full paper >> actionable-interpretability-guide.github.io/paper.pdf
Blog >> actionable-interpretability-guide.github.io

23.02.2026 15:43

Joint work w/ amazing collaborators @fbarez.bsky.social @talhaklay.bsky.social @wordscompute.bsky.social @mariusmosbach.bsky.social @anja.re @nsaphra.bsky.social @byron.bsky.social @sarah-nlp.bsky.social @profericwong.bsky.social @iftenney.bsky.social @megamor2.bsky.social

23.02.2026 15:43

We're not saying all interpretability work must be immediately actionable; curiosity-driven research still matters. But actionability is a high bar: it means understanding that works outside the lab.

To make your next project more actionable, use our checklist >>

23.02.2026 15:38

Actionable interpretability is worth aiming for. We identified five domains where answering *why* unlocks a fundamental advantage.

23.02.2026 15:38

Interpretability isn't actionable (yet) for three reasons:
→ Papers aren't expected to demonstrate applications
→ Insights are shown in oversimplified settings without real baselines
→ Methods require domain expertise

23.02.2026 15:38

Why haven't insights from interpretability transformed AI yet? Because we're not prioritizing actionable insights.

Full paper >> actionable-interpretability-guide.github.io/paper.pdf
Blog (The Hitchhiker's Guide 🧭 to Actionable Interpretability) >> actionable-interpretability-guide.github.io/

23.02.2026 15:38

Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How?
We're ready to answer.
🧡

23.02.2026 15:38

Deadline extended! ⏳

The Actionable Interpretability Workshop at #ICML2025 has moved its submission deadline to May 19th. More time to submit your work 🔍🧠✨ Don't miss out!

03.05.2025 20:00
Logo for MIB: A Mechanistic Interpretability Benchmark

Logo for MIB: A Mechanistic Interpretability Benchmark

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 π— π—œπ—•: a 𝗠echanistic π—œnterpretability 𝗕enchmark!

23.04.2025 18:15
General Information ICML 2025 - Vancouver

β€’ Model Innovation – Designs and training inspired by interpretability.
β€’ Impact Measurement – Benchmarks for real-world effectiveness.
β€’ Critical Perspectives – Feasibility, limits, and future directions.

Website >>> actionable-interpretability.github.io

31.03.2025 17:06

• Real-world Applications – Tackling bias, hallucinations, adversarial threats, and use in critical domains like healthcare, finance and cybersecurity.
β€’ Method Comparison – Interpretability vs. alternative methods such as fine-tuning, prompting, etc.

31.03.2025 17:05

We aim to foster discussions on how interpretability research can inform concrete improvements in model design, safety, and robustness.

Topics of interest: ⬇️

31.03.2025 17:05