We've added a quick new section to this paper, which was just accepted to @COLM_conf! By summing the weights of concept induction heads, we created a "concept lens" that lets you read out semantic information in a model's hidden states.
22.07.2025 12:39
I'm excited for NEMI again this year! I've enjoyed local research meetups and getting to know others nearby who are working on interesting problems.
30.06.2025 23:00
NEMI 2024 (Last Year)
Registration is live!
The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!
A chance for the mech interp community to nerd out on how models really work 🧠
Info: nemiconf.github.io/summer25/
Register: forms.gle/v4kJCweE3UUH...
30.06.2025 22:55
How do language models track the mental states of each character in a story, an ability often referred to as Theory of Mind?
We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!
24.06.2025 17:13
Can we uncover the list of topics a language model is censored on?
Refused topics vary strongly among models. Claude-3.5 vs DeepSeek-R1 refusal patterns:
13.06.2025 15:58
I'm not familiar with the reviewing load for ARR, but for COLM this year I was only assigned 2 papers as a reviewer, which is great. I had more time to try to understand each submission, and it was much more manageable than being assigned 6+ papers, as happens at ICML and NeurIPS.
29.05.2025 00:14
I'll present a poster for this work at NENLP tomorrow! Come find me at poster #80...
10.04.2025 21:19
Sheridan asks whether the Dual Route Model of Reading that psychologists have observed in humans also appears in LLMs.
In her brilliantly simple study of induction heads, she finds that it does! Induction has a Dual Route that separates concepts from literal token processing.
Worth reading.
07.04.2025 15:23
Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
07.04.2025 13:54
I reviewed for ICML this year, and it felt to me like paper quality was lower than in my previous reviewing assignments. In my batch, 3 of 7 were what I'd consider low-quality submissions. The review process was also more involved (but hopefully it allows for a better feedback mechanism).
25.03.2025 22:06
What will be the linchpin for AI dominance?
Read our NSF/OSTP recommendations, written with Goodfire's Tom McGrath tommcgrath.github.io, Transluce's Sarah Schwettmann cogconfluence.com, and MIT's Dylan Hadfield-Menell @dhadfieldmenell.bsky.social
TL;DR: Dominance comes from **interpretability** 🧵
16.03.2025 13:57
Oxford Word of the Year 2024 - Oxford University Press
The Oxford Word of the Year 2024 is 'brain rot'. Discover more about the winner, our shortlist, and 20 years of words that reflect the world.
I'm searching for some comp/ling experts to provide a precise definition of "slop" as it refers to text (see: corp.oup.com/word-of-the-...)
I put together a Google Form that should take no longer than 10 minutes to complete: forms.gle/oWxsCScW3dJU...
If you can help, I'd appreciate your input!
10.03.2025 20:00
Induction heads are commonly associated with in-context learning, but are they the primary driver of ICL at scale?
We find that recently discovered "function vector" heads, which encode the ICL task, are the actual primary mechanisms behind few-shot ICL!
arxiv.org/abs/2502.14010
🧵
28.02.2025 16:16
LLMs are known to perpetuate social biases in clinical tasks. Can we locate and intervene upon LLM activations that encode patient demographics like gender and race? 🧵
Work w/ @arnabsensharma.bsky.social, @silvioamir.bsky.social, @davidbau.bsky.social, @byron.bsky.social
arxiv.org/abs/2502.13319
22.02.2025 04:17
Please help amplify ARBOR, a fantastic new research opportunity! If you'd like to start contributing, NDIF is now hosting DeepSeek R1 8B and 70B, open for all researchers to experiment on via our API.
Sign up for API access here: login.ndif.us
20.02.2025 22:35
I'm excited about this new open research initiative! It feels like this is how science is supposed to be done: collaborating and sharing ideas in the open. If you've thought about studying the mechanisms behind R1 and other reasoning models, check it out!
20.02.2025 23:15
DeepSeek R1 shows how important it is to study the internals of reasoning models. Try our code: here, @canrager.bsky.social shows a method for auditing AI bias by probing the internal monologue.
dsthoughts.baulab.info
I'd be interested in your thoughts.
31.01.2025 14:30
What was the most important machine learning paper in 2024?
My Famous Deep Learning Papers list (that I use in teaching) does not include any new ideas from the last year.
papers.baulab.info
Which single new paper would you add?
31.12.2024 15:09
Yes, I remember learning them with "arc" at the beginning (e.g. arcsin, arccos).
12.12.2024 22:08
More big news! Applications are open for the NDIF Summer Engineering Fellowship, an opportunity to work on cutting-edge AI research infrastructure this summer in Boston!
10.12.2024 21:59
The Phase 2 NDIF Pilot is open for a short window.
Apply now to get research capacity on Llama 405b.
Deadline is December 31.
It is not easy to crack open 405b for research, but NDIF solves the key engineering problems for you. Phase 1 powered several very interesting ICLR submissions...
10.12.2024 07:09
Do you have a great experiment that you want to run on Llama 405b but not enough GPUs?
#NDIF is opening up more spots in our 405b pilot program! Apply now for a chance to conduct your own groundbreaking experiments on the 405b model. Details: 🧵
09.12.2024 20:04
PhD Apply - Khoury College of Computer Sciences
PhD Applicants: remember that the Northeastern Computer Science PhD application deadline is Dec 15.
It's a terrific time to do a PhD, with so many interesting things happening in AI.
Apply here:
www.khoury.northeastern.edu/apply/phd-ap...
07.12.2024 10:31
New Preprint
Can diffusion models draw artistic inspiration from nature?
@huiren and @materzynska trained a diffusion model solely on natural images, excluding artwork from pre-training data.
Surprisingly, it can mimic art styles!
Curious how it works? 🧵 w/ @davidbau and Antonio Torralba
04.12.2024 00:41
GitHub - rhfeiyang/art-free-diffusion: Official implementation of "Art-Free Generative Models: Art Creation Without Graphic Art Knowledge"
Do you need to copy art to make art?
Hui Ren and Joanna Materzynska's Art-Free Diffusion tests this question and lets you make "imitation-free" AI art.
Github: github.com/rhfeiyang/ar...
Arxiv: arxiv.org/abs/2412.00176
Website: joaanna.github.io/art-free-dif...
X: x.com/materzynska/...
04.12.2024 22:04
Postdoc @ AI2 & UW | NLP
https://yanaiela.github.io/
Postdoc at Stanford NLP. Interested in improving compositional generalization, trustworthiness, and data efficiency of language models.
https://robertcsordas.github.io/
LM/NLP/ML researcher ¯\_(ツ)_/¯
yoavartzi.com / associate professor @ Cornell CS + Cornell Tech campus @ NYC / nlp.cornell.edu / associate faculty director @ arXiv.org / researcher @ ASAPP / starting @colmweb.org / building RecNet.io
PhD @Stanford working w Noah Goodman
Studying in-context learning and reasoning in humans and machines
Prev. @UofT CS & Psych
NLP Researcher | CS PhD Candidate @ Technion
Principal Researcher @ Microsoft Research.
Cognitive computational neuroscience & AI.
Writer. Nature wanderer.
www.momen-nejad.org
PhD student in Interpretable Machine Learning at TU Berlin & BIFOLD
Professor, University of Tübingen @unituebingen.bsky.social.
Head of Department of Computer Science.
Faculty, Tübingen AI Center 🇩🇪 @tuebingen-ai.bsky.social.
ELLIS Fellow, Founding Board Member 🇪🇺 @ellis.eu.
CV, ML, Self-Driving, NLP
PhD @coastalcph doing NLP things. Ex SWE @Google 🇫🇷 and student @dccuchile 🇨🇱. I also like sports, beer, reading, and photography.
@guyd33 on the X-bird site. PhD student at NYU, broadly cognitive science x machine learning, specifically richer representations for tasks and cognitive goals. Otherwise found cooking, playing ultimate frisbee, and making hot sauces.
Incoming Associate Professor of Computer Science and Psychology @ Princeton. Posts are my views only. https://cims.nyu.edu/~brenden/
NLP, Linguistics, Cognitive Science, AI, ML, etc.
Job currently: Research Scientist (NYC)
Job formerly: NYU Linguistics, MSU Linguistics
cognitive models of decision making @UCLA
kiantefernandez.com
Researcher @MSFTResearch; Prof @UWMadison (on leave); learning in context; thinking about reasoning; babas of Inez Lily.
https://papail.io
Comprehensive evaluation of the INTERPLAY between model internals and behavior
https://interplay-workshop.github.io/
Submission due June 23rd
October 10th, @colmweb.org
The largest workshop on analysing and interpreting neural networks for NLP.
BlackboxNLP will be held at EMNLP 2025 in Suzhou, China
blackboxnlp.github.io
I do research on trustworthy NLP, i.e., social + technical aspects of fairness, reasoning, etc.
pronouns: xe/they (German: keine)
nouns: computer scientist, linguist, birder
adjectives: trans, queer, autistic
https://dippedrusk.com
Professor, University of Copenhagen 🇩🇰. PI @belongielab.org. Director @aicentre.dk. Board member @ellis.eu 🇪🇺. Formerly: Cornell, Google, UCSD
#ComputerVision #MachineLearning