Making model internals accessible to domain experts through low-code interfaces will unlock the next step in making interpretability useful across a variety of domains. Very excited about the NDIF Workbench!
10.10.2025 17:53
YouTube video by NDIF Team
Workbench Logit Lens Demo
Study any NDIF-hosted model (including Llama 405B) directly in your browser. Our first tool, Logit Lens, lets you peer inside LLM computations layer-by-layer. Watch the full demo on YouTube (www.youtube.com/watch?v=BK-q...) or try it yourself: workbench.ndif.us
10.10.2025 17:36
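If you prefer scripting the same analysis, a logit-lens pass can be sketched with NNsight in a few lines. The snippet below is a minimal illustration only: it assumes a GPT-2-style module layout (transformer.h, ln_f, lm_head), which differs across architectures, and it is not the Workbench's actual implementation.

```python
# Minimal logit-lens sketch with NNsight (assumes a GPT-2-style module layout;
# module paths like transformer.h and ln_f differ for other architectures).
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2", device_map="auto")
prompt = "The Eiffel Tower is located in the city of"

layer_preds = []
with model.trace(prompt):
    for layer in model.transformer.h:
        # Project each layer's last-token hidden state through the final
        # layer norm and the unembedding to get intermediate "predictions".
        hidden = layer.output[0][:, -1, :]
        logits = model.lm_head(model.transformer.ln_f(hidden))
        layer_preds.append(logits.argmax(dim=-1).save())

# Depending on your NNsight version, saved proxies may need `.value` here.
for i, tok in enumerate(layer_preds):
    print(f"layer {i:2d} -> {model.tokenizer.decode(tok)}")
```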
Ever wished you could explore what's happening inside a 405B parameter model without writing any code? Workbench, our AI interpretability interface, is now live for public beta at workbench.ndif.us!
10.10.2025 17:35
๐ More advanced interpretability tools coming soon. What techniques would you like to see? Reach out or drop suggestions in the form.
10.10.2025 17:34
NDIF Workbench Feedback
Thank you for taking the time to submit your feedback! Every little bit helps.
This is a public beta, so we expect bugs and actively want your feedback: forms.gle/WsxmZikeLNw...
10.10.2025 17:34
Read the paper or play around with some demos on the project website!
ArXiv: arxiv.org/abs/2410.22366
Project Website: sdxl-unbox.epfl.ch/
03.10.2025 18:45
In this talk, Chris Wendler presents his recent work on using sparse autoencoders for diffusion models. They train SAEs on SDXL Turbo, finding ...
Interpreting SDXL Turbo Using Sparse Autoencoders with Chris Wendler
New YouTube video posted! @wendlerc.bsky.social presents his work using SAEs for diffusion text-to-image models. The authors find interpretable SAE features and demonstrate how these features can alter generated images.
Watch here: youtu.be/43NnaqGjArA
03.10.2025 18:45
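For readers new to the method, the core object here is the sparse autoencoder itself. The sketch below is a generic PyTorch illustration of an SAE with a ReLU bottleneck and an L1 sparsity penalty, trained on placeholder activations; it is not the paper's architecture or training recipe.

```python
# Generic sparse-autoencoder sketch in PyTorch (illustrative only; not the
# exact architecture or training setup used in the SDXL Turbo paper).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse, non-negative feature codes
        x_hat = self.decoder(f)           # reconstruction of the activation
        return x_hat, f

# One toy optimization step on random stand-in data; real SAEs are trained on
# activations collected from the model under study (here, SDXL Turbo's U-Net).
sae = SparseAutoencoder(d_model=1280, d_hidden=16 * 1280)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(64, 1280)              # placeholder activations

x_hat, f = sae(acts)
loss = ((x_hat - acts) ** 2).mean() + 1e-3 * f.abs().mean()  # recon + L1 sparsity
loss.backward()
opt.step()
print(loss.item())
```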
NSF National Deep Inference Fabric
NDIF is a research computing project that enables researchers and students to crack open the mysteries inside large-scale AI systems.
Reminder that today is the deadline to apply for our hot-swapping program! Be the first to test out many new models remotely on NDIF and submit your application today!
More details: ndif.us/hotswap.html
Application link: forms.gle/KHVkYxybmK12...
01.10.2025 18:10
Want increased remote model availability on NDIF? Interested in studying model checkpoints?
Sign up for the NDIF hot-swapping pilot by October 1st: forms.gle/Cf4WF3xiNzud...
26.09.2025 18:57
Participants will:
1. Be in the first cohort of users to access models beyond our whitelist
2. Directly control which models are hosted on the NDIF backend
3. Receive guided support on their project from the NDIF team
4. Give feedback, guiding future user experience
04.09.2025 00:41
This fall, we are running a program to test our model hot-swapping on real research projects. Projects should require internal access to multiple models, which could include model checkpoints, different model sizes, unique model architectures, or other creative approaches.
04.09.2025 00:41
Do you wish you could run experiments on any model remotely from your laptop? In a future release, NDIF users will be able to dynamically deploy any model from HuggingFace on NDIF for remote experimentation. But before this, we need your help!
04.09.2025 00:41
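For context, this is roughly what remote experimentation already looks like with NNsight today; hot-swapping would extend the same pattern to models that are not currently hosted. The model name and layer index below are illustrative, and an NDIF API key is required.

```python
# Rough sketch of NNsight's remote-execution pattern on NDIF. The model name
# and layer index are illustrative; remote availability depends on what NDIF
# currently hosts, and an API key must be configured first (see ndif.us).
from nnsight import LanguageModel

model = LanguageModel("meta-llama/Meta-Llama-3.1-405B-Instruct")

# remote=True ships the traced computation to NDIF instead of running locally.
with model.trace("The capital of France is", remote=True):
    hidden = model.model.layers[40].output[0].save()

# Depending on your NNsight version, the saved proxy may need `.value` here.
print(hidden.shape)
```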
09:30 AM - 09:40 AM: Opening Remarks (David Bau) 09:40 AM - 10:00 AM: Keynote 1: Lee Sharkey: "Mech Interp: Where should we go from here?" 10:00 AM - 10:10 A...
New England Mechanistic Interpretability Workshop
We are presenting our NNsight / NDIF demos at NEMI now!
Tune in:
youtube.com/live/q8Su4C...
22.08.2025 19:28
The NEMI conference is live!
Watch our livestream here: youtube.com/live/q8Su4C...
22.08.2025 13:40
About: The New England Mechanistic Interpretability (NEMI) workshop aims to bring together academic and industry researchers from the New England and surround...
New England Mechanistic Interpretability Workshop
This Friday, NEMI 2025 is at Northeastern in Boston: 8 talks, 24 roundtables, 90 posters, and 200+ attendees. Thanks to goodfire.ai/ for sponsoring! nemiconf.github.io/summer25/
If you can't make it in person, the livestream will be here:
www.youtube.com/live/4BJBis...
18.08.2025 18:06
We will use this channel to post lectures on AI interpretability research, educational information, NDIF and NNsight updates, and more. If you're interested in collaborating on a video or would like to suggest a topic, please reach out!
07.08.2025 17:36
David Bau is an Assistant Professor of Computer Science at Northeastern University's Khoury College. His lab studies the structure and interpretation of deep...
ROME: Locating and Editing Factual Associations in GPT with David Bau
Our YouTube channel is live! Our first video features @davidbau.bsky.social presenting the ROME project:
www.youtube.com/watch?v=eKd...
07.08.2025 17:35
Want to try it for yourself? Check out our new mini-paper tutorial in NNsight to see how intervening on concept induction heads can reveal language-invariant concepts and cause a model to paraphrase text!
nnsight.net/notebooks/m...
05.08.2025 16:31
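As a rough idea of what such an intervention looks like in NNsight (the notebook linked above has the real concept-induction-head setup), the sketch below zero-ablates one attention head in GPT-2 by zeroing its slice of the pre-projection attention output. The layer and head indices are made up for illustration.

```python
# Hedged sketch of a head-level ablation with NNsight on GPT-2. The layer and
# head indices are hypothetical; see the linked notebook for the actual
# concept-induction-head interventions.
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2", device_map="auto")
LAYER, HEAD = 5, 1                                    # hypothetical indices
d_head = model.config.n_embd // model.config.n_head   # per-head dimension

prompt = "The quick brown fox jumps over the lazy dog. The quick brown"
with model.trace(prompt):
    # Input to the attention output projection, shape (batch, seq, n_head * d_head).
    # (In recent NNsight versions, .input is the module's first positional argument.)
    pre_proj = model.transformer.h[LAYER].attn.c_proj.input
    pre_proj[:, :, HEAD * d_head:(HEAD + 1) * d_head] = 0  # knock out one head
    logits = model.lm_head.output.save()

# Depending on your NNsight version, the saved proxy may need `.value` here.
print("next-token prediction:", model.tokenizer.decode(logits[0, -1].argmax()))
```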
Using causal mediation analysis on words that span multiple tokens, @sfeucht.bsky.social et al. found concept induction heads that are separate from token induction heads.
dualroute.baulab.info/
05.08.2025 16:31
Induction heads are attention heads that help complete patterns by copying tokens (transformer-circuits.pub/2021/framew...), but can they also copy over concepts?
05.08.2025 16:31
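A quick way to see this copying behavior without looking at individual heads is the repeated-random-sequence test: per-token loss drops sharply on the second repetition because the model can complete the pattern by copying. The sketch below (plain transformers, GPT-2 as a stand-in) is just that aggregate check, not the head-level analysis in the paper.

```python
# Aggregate induction check: on a repeated random token sequence, GPT-2's
# per-token loss should be much lower on the second repetition (copying).
# Illustrative only; the linked work analyzes individual heads, not this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
model.eval()

torch.manual_seed(0)
half = torch.randint(0, tok.vocab_size, (1, 50))
ids = torch.cat([half, half], dim=1)          # random 50-token sequence, repeated

with torch.no_grad():
    logits = model(ids).logits

logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
nll = -logprobs.gather(-1, ids[:, 1:, None]).squeeze(-1)[0]
print("first-half mean NLL :", nll[:49].mean().item())
print("second-half mean NLL:", nll[50:].mean().item())   # much lower if copying
```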
Great to present what's coming next for NDIF at the @actinterp.bsky.social workshop at #ICML2025!
If you missed us, let's chat after the conference. Reach out here: forms.gle/LtTyYnkaxDyg...
19.07.2025 19:54
We're collaborating with researchers in the field to provide detailed, educational, and replicable notebook tutorials of recent papers. Check out nnsight.net/applied_tut... for a current list of mini paper tutorials. We plan to release a new tutorial every week.
16.07.2025 19:41
We're excited to announce a new series of applied "mini paper" tutorials! The goal of this series is to help researchers get hands-on experience with findings, methods, and results from recent papers in interpretability using NNsight and NDIF.
16.07.2025 19:41
Google Colab
Excited to share our first paper replication tutorial, walking you through the main figures from "Do Language Models Use Their Depth Efficiently?" by @robertcsordas.bsky.social
Demo on Colab: colab.research.google.com/github/ndif-...
Read the full manuscript: arxiv.org/abs/2505.13898
04.07.2025 00:27