Nicolò & Mingyang: Can we understand which circuits emerge in small models and reasoning-tuned systems, and how do they compare with default systems? Are there methods that generalize better across all tasks?
09.11.2025 07:23
Q: What's next for interpretability benchmarks? Michal: People sitting together and planning how to extend tests to multimodal, diverse contexts. @michaelwhanna.bsky.social: For circuit finding, integrating sparse feature circuits could help us better understand our models.
09.11.2025 07:21
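For readers unfamiliar with the term: sparse feature circuits use the features of a sparse autoencoder (SAE) trained on model activations, rather than raw neurons or attention heads, as circuit nodes. Below is a minimal toy sketch of such an SAE; the dimensions, sparsity penalty, and random stand-in activations are illustrative assumptions, not any particular published setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE of the kind used in sparse feature circuits (illustrative only)."""
    def __init__(self, d_model: int = 768, d_feats: int = 8192):
        super().__init__()
        self.enc = nn.Linear(d_model, d_feats)
        self.dec = nn.Linear(d_feats, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))  # sparse, hopefully interpretable features
        return self.dec(feats), feats

sae = SparseAutoencoder()
acts = torch.randn(4, 768)               # stand-in for residual-stream activations
recon, feats = sae(acts)
# Reconstruction loss plus an L1 penalty that encourages sparse feature use.
loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().mean()
```

In circuit finding, the active entries of `feats` would then serve as candidate nodes in place of raw neurons.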
Nicolò & Mingyang: Exploring notebooks and public libraries can be very helpful for gaining early intuitions about what's promising.
09.11.2025 07:16
@michaelwhanna.bsky.social: Don't try to read everything. Find Qs you really care about, and go a level deeper to answer meaningful questions.
09.11.2025 07:15
Q: How would one go about approaching interpretability research these days? Michal: "When things don't work out of the box, it's a sign to double down and find out why. Negative results are important!"
09.11.2025 07:15
@danaarad.bsky.social: As deep learning research converges on similar architectures across modalities, it will be interesting to determine which interpretability methods remain useful across various models and tasks.
09.11.2025 07:15
@michaelwhanna.bsky.social, Nicolò & Mingyang: Counterfactuals in minimal settings can be helpful, but they do not capture the whole story. Extending current methods to long contexts and finding practical applications in safety-related areas are exciting challenges ahead.
09.11.2025 07:07
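To make the panel's point about counterfactuals concrete, here is a minimal sketch of counterfactual activation patching using TransformerLens; the model, prompts, and layer choice are illustrative assumptions, not anything discussed on the panel.

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# A counterfactual pair in a minimal setting: only the subject differs.
clean = model.to_tokens("The Eiffel Tower is located in the city of")
corrupt = model.to_tokens("The Colosseum is located in the city of")

_, clean_cache = model.run_with_cache(clean)
paris = model.to_single_token(" Paris")

layer = 6  # illustrative layer choice
hook_name = utils.get_act_name("resid_post", layer)

def patch_last_position(act, hook):
    # Copy the clean run's residual stream into the corrupted run,
    # at the final token position only (the prompts may differ in length).
    act[:, -1, :] = clean_cache[hook.name][:, -1, :]
    return act

corrupt_logits = model(corrupt)
patched_logits = model.run_with_hooks(corrupt, fwd_hooks=[(hook_name, patch_last_position)])
print("corrupted logit for ' Paris':", corrupt_logits[0, -1, paris].item())
print("patched logit for ' Paris':  ", patched_logits[0, -1, paris].item())
```

If patching a component restores the clean behavior, that component is causally implicated; the panel's caveat is that such minimal pairs probe one mechanism at a time and rarely capture the whole story.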
Michal: Mechanistic interpretability has heavily focused on toy tasks and text-only models. The next step is scaling to more complex tasks that involve real-world reasoning.
09.11.2025 07:07
Our panel "Evaluating Interpretability Methods: Challenges and Future Directions", moderated by @danaarad.bsky.social, has just started! Come to learn more about the MIB benchmark and hear the takes of @michaelwhanna.bsky.social, Michal Golovanevsky, Nicolò Brunello, and Mingyang Wang!
09.11.2025 06:54
Next up: Kentaro Ozeki presenting "Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives" aclanthology.org/2025.blackbo...
09.11.2025 06:32
After a productive poster session, BlackboxNLP returns with the second keynote "Memorization: Myth or Mystery?" by @vernadankers.bsky.social!
09.11.2025 05:48
Nadav Shani is giving the first oral presentation of the day: Language Dominance in Multilingual Large Language Models. Find the paper here: aclanthology.org/2025.blackbo...
09.11.2025 02:19
Next up: Circuit-Tracer: A New Library for Finding Feature Circuits presented by @michaelwhanna.bsky.social! Paper: aclanthology.org/2025.blackbo...
09.11.2025 02:17
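For readers new to feature circuits, the sketch below shows attribution patching, one common gradient-based recipe in this space for scoring circuit components. This is not Circuit-Tracer's actual API (see the paper and repo for that); the model, prompts, and metric here are assumptions for illustration.

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# IOI-style counterfactual pair (same token length, one name swapped).
clean = model.to_tokens("When Mary and John went to the store, John gave a drink to")
corrupt = model.to_tokens("When Mary and John went to the store, Mary gave a drink to")
answer = model.to_single_token(" Mary")

_, clean_cache = model.run_with_cache(clean)
_, corrupt_cache = model.run_with_cache(corrupt)

# One backward pass on the clean run, saving gradients at every layer.
grad_cache = {}
def save_grad(grad, hook):
    grad_cache[hook.name] = grad

names = [utils.get_act_name("resid_post", l) for l in range(model.cfg.n_layers)]
with model.hooks(bwd_hooks=[(n, save_grad) for n in names]):
    logits = model(clean)
    logits[0, -1, answer].backward()

# First-order estimate of how the metric changes if the corrupted
# activation replaced the clean one: (corrupt - clean) . gradient.
for n in names:
    score = ((corrupt_cache[n] - clean_cache[n]) * grad_cache[n]).sum().item()
    print(f"{n}: {score:+.3f}")
```

Components with large scores are kept as candidate circuit nodes; libraries like Circuit-Tracer work at the level of sparse features rather than whole layers.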
I'll be presenting this work at @blackboxnlp.bsky.social in Suzhou; happy to chat there or here if you are interested!
22.10.2025 08:16
Nov 9, @blackboxnlp.bsky.social, 11:00-12:00 @ Hall C – Interpreting Language Models Through Concept Descriptions: A Survey (Feldhus & Kopf) @lkopf.bsky.social
aclanthology.org/2025.blackbo...
bsky.app/profile/nfel...
06.11.2025 07:00
Quanshi Zhang is giving the first keynote of the day: Can Neural Network Interpretability Be the Key to Breaking Through Scaling Law Limitations in Deep Learning?
09.11.2025 01:38
BlackboxNLP is up and running! Here are the topics covered by this year's edition at a glance. Excited to see so many interesting topics, and the growing interest in reasoning!
09.11.2025 01:38
Call for Papers!
#BlackboxNLP 2025 invites the submission of archival and non-archival papers on interpreting and explaining NLP models.
Deadlines: Aug 15 (direct submissions), Sept 5 (ARR commitment)
More details: blackboxnlp.github.io/2025/call/
12.08.2025 19:10
Writing your technical report for the MIB shared task?
Take a look at the task page for guidelines and tips!
06.08.2025 09:51
The report deadline was also extended to August 10th!
Note that this is a final extension. We look forward to reading your reports!
06.08.2025 09:49
Just 5 days left to submit your method to the MIB Shared Task at #BlackboxNLP!
Have last-minute questions or need help finalizing your submission?
Join the Discord server: discord.gg/n5uwjQcxPR
03.08.2025 06:40
BlackboxNLP 2025
The Eighth Workshop on Analyzing and Interpreting Neural Networks for NLP
Results + technical report deadline: August 8, 2025
Full task details: blackboxnlp.github.io/2025/task/
30.07.2025 05:57
With the new extended deadline, there's still plenty of time to submit your method to the MIB Shared Task!
We welcome submissions of existing methods, experimental POCs, or any approach addressing circuit discovery or causal variable localization!
30.07.2025 05:57
Results deadline extended by one week!
Following requests from participants, we're extending the MIB Shared Task submission deadline by one week.
New deadline: August 8, 2025
Submit your method via the MIB leaderboard!
29.07.2025 09:35
Technical report guidelines are out!
If you're submitting to the MIB Shared Task at #BlackboxNLP, feel free to take a look to help you prepare your report: blackboxnlp.github.io/2025/task/
28.07.2025 12:34
Just 10 days to go until the results submission deadline for the MIB Shared Task at #BlackboxNLP!
If you're working on:
- Circuit discovery
- Feature attribution
- Causal variable localization
now's the time to polish and submit!
Join us on Discord: discord.gg/n5uwjQcxPR
23.07.2025 07:42
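For anyone weighing an entry point into the feature attribution track mentioned above, here is a hedged sketch of the simplest baseline, input-x-gradient, using plain HuggingFace Transformers; the model, prompt, and target are placeholders, not the shared task's actual data or metric.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
# Detach the token embeddings so they become leaf inputs we can differentiate.
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_(True)

out = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"])
target = tok(" Paris", add_special_tokens=False)["input_ids"][0]
out.logits[0, -1, target].backward()

# Input-x-gradient: one attribution score per input token.
scores = (embeds.grad * embeds).sum(dim=-1)[0]
for token, s in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), scores):
    print(f"{token:>12}  {s.item():+.4f}")
```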
Are you attending ICML?
I'm sadly not, but if you are, you should check out the MIB poster at 11AM: icml.cc/virtual/2025...
The benchmark is used as the shared task at this year's @blackboxnlp.bsky.social (blackboxnlp.github.io/2025/task/) - there's still time to participate!
17.07.2025 15:56
Three weeks left! Submit your work to the MIB Shared Task at #BlackboxNLP, co-located with @emnlpmeeting.bsky.social
Whether you're working on circuit discovery or causal variable localization, this is your chance to benchmark your method in a rigorous setup!
13.07.2025 05:56
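And for the other track, a minimal sketch of causal variable localization via an interchange intervention: run a "source" input, cache a hidden-state subspace, and swap it into a "base" run to test whether that subspace carries the variable. The layer, the dimension slice, and the prompts are hypothetical choices for illustration only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

base = tok("The capital of France is", return_tensors="pt")["input_ids"]
source = tok("The capital of Italy is", return_tensors="pt")["input_ids"]

layer, dims = 6, slice(0, 64)  # hypothetical: the "country" variable lives here

# 1) Cache the source run's hidden state at the chosen layer.
cached = {}
def cache_hook(module, args, output):
    cached["h"] = output[0].detach()

handle = model.transformer.h[layer].register_forward_hook(cache_hook)
with torch.no_grad():
    model(source)
handle.remove()

# 2) Re-run the base input, swapping in the source subspace at the last token.
def swap_hook(module, args, output):
    hidden = output[0].clone()
    hidden[:, -1, dims] = cached["h"][:, -1, dims]
    return (hidden,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(swap_hook)
with torch.no_grad():
    logits = model(base).logits
handle.remove()

# If the subspace really encodes the variable, the prediction should move.
print(tok.decode(logits[0, -1].argmax()))
```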
Have you started working on your submission for the MIB shared task yet? Tell us what you're exploring!
New featurization methods?
Circuit pruning?
Better feature attribution?
We'd love to hear about it!
09.07.2025 07:15
BlackboxNLP 2025
The Eighth Workshop on Analyzing and Interpreting Neural Networks for NLP
Deadline: August 1
Full task details: blackboxnlp.github.io/2025/task/
Join the discussion: discord.gg/n5uwjQcxPR
08.07.2025 09:35