
Sheridan Feucht

@sfeucht.bsky.social

PhD student doing LLM interpretability with @davidbau.bsky.social and @byron.bsky.social. (they/them) https://sfeucht.github.io

217 Followers  |  323 Following  |  25 Posts  |  Joined: 23.11.2024

Latest posts by sfeucht.bsky.social on Bluesky

Placeholders for 3 students (number arbitrarily chosen) and me - to signify my eventual group!

Looking forward to attending #cogsci2025 (Jul 29 - Aug 3)! I’m especially excited to meet students who will be applying to PhD programs in Computational Ling/CogSci in the coming cycle.

Please reach out if you want to meet up and chat! Email is the best way, but DM also works if you must!

quick🧵:

28.07.2025 21:20 — 👍 21    🔁 7    💬 1    📌 0
Google Colab

Try it out in our new paper demo notebook! Or ping me with any sequence you'd like to try, and I'd be more than happy to run a few examples for you.
colab.research.google.com/github/sfeuc...

Also check out the new camera-ready version of the paper on arXiv.
arxiv.org/abs/2504.03022

22.07.2025 12:39 — 👍 1    🔁 0    💬 0    📌 0
"Token lens" outputs for the token "card" in the context "in the morning air, she heard northern card.inals."

If we do the same for token induction heads, we can also get a "token lens", which reads out surface-level token information from states. Unlike raw logit lens, which reveals next-token predictions, "token lens" reveals the current token.

22.07.2025 12:39 — 👍 0    🔁 0    💬 1    📌 0
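
For readers who have not used logit lens before, here is a minimal sketch of the baseline being contrasted in that post, written against HuggingFace transformers (checkpoint, layer, and prompt are my own assumptions for illustration, not the paper's released code). The token lens and concept lens described further down this feed use the same decode step, but first transform the hidden state through summed induction-head OV matrices.

```python
# Minimal "raw logit lens" sketch: decode an intermediate hidden state directly with
# the model's final norm + unembedding, which surfaces next-token guesses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # float32; use GPU / half precision if available
model.eval()

prompt = "in the morning air, she heard northern card"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

h = out.hidden_states[16][0, -1]                 # state at the last token, arbitrary mid layer
logits = model.lm_head(model.model.norm(h))      # norm-then-unembed (assumed; some variants skip the norm)
print(tok.convert_ids_to_tokens(logits.topk(5).indices.tolist()))
# The top tokens here are predictions about what comes *next*, not a readout of the
# current token; that gap is what the "token lens" in the post above fills.
```
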
Three "concept lens" outputs, showing the top-5 highest-probability tokens when a hidden state (across different layers) is transformed by concept lens and projected to token space. There are three sentences, each with different predictions: "he was a lifelong fan of the cardinals", for which concept lens predicts "football" and "baseball"; "the secret meeting of the cardinals", for which concept lens predicts "Catholic"; and "in the morning air, she heard northern cardinals", which projects to "birds."

If we apply concept lens to the word "cardinals" in three contexts, we see that Llama-2-7b has encoded this word very differently in each case!

22.07.2025 12:39 — 👍 1    🔁 0    💬 1    📌 0

To do this, we sum the OV matrices of the top-k concept induction heads, and use the summed matrix to transform a hidden state at a particular token position. Projecting that to vocab space with the model's decoder head, we can access the "meaning" encoded in that state.

22.07.2025 12:39 — 👍 0    🔁 0    💬 1    📌 0
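
As a rough sketch of that recipe (my own illustration: the (layer, head) pairs below are placeholders rather than the paper's identified concept induction heads, and applying the final RMSNorm before decoding is an assumption):

```python
# Sketch of a "concept lens": sum the OV matrices of chosen attention heads, use the
# result to transform a hidden state, and decode with the model's unembedding.
# Placeholder heads/layers; assumes Llama-2-7b (no grouped-query attention).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

head_dim = model.config.hidden_size // model.config.num_attention_heads

def ov_matrix(layer, head):
    """OV circuit of one head: W_O[:, head slice] @ W_V[head slice, :], hidden -> hidden."""
    attn = model.model.layers[layer].self_attn
    sl = slice(head * head_dim, (head + 1) * head_dim)
    return attn.o_proj.weight[:, sl] @ attn.v_proj.weight[sl, :]

concept_heads = [(14, 3), (16, 7), (18, 1)]        # hypothetical (layer, head) pairs
W_ov = sum(ov_matrix(l, h) for l, h in concept_heads)

prompt = "in the morning air, she heard northern cardinals"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

hidden = out.hidden_states[16][0, -1]              # state at the last token, arbitrary mid layer
lensed = W_ov @ hidden                             # transform by the summed OV circuit
logits = model.lm_head(model.model.norm(lensed))   # project to vocab with the decoder head
print(tok.convert_ids_to_tokens(logits.topk(5).indices.tolist()))
```

With the heads the paper actually identifies, readouts like the "birds"/"Catholic"/"baseball" examples shown above should appear at the right layers; the Colab demo linked further up the feed is the authoritative version.
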

We've added a quick new section to this paper, which was just accepted to @COLM_conf! By summing weights of concept induction heads, we created a "concept lens" that lets you read out semantic information in a model's hidden states. 🔎

22.07.2025 12:39 — 👍 7    🔁 1    💬 1    📌 0
NEMI 2024 (Last Year)

🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...

30.06.2025 22:55 — 👍 10    🔁 8    💬 0    📌 1

Nikhil's recent paper is a tour de force in causal analysis! They show that LLMs keep track of what characters know in a story using "pointer" mechanisms. Definitely worth checking out.

24.06.2025 17:48 — 👍 4    🔁 2    💬 0    📌 0

I'm on the train right now and just finished reading this paper for the first time. I actually logged back on to bsky just so I could link to it, but you beat me to the punch!

I really enjoyed your paper. This example was particularly great.

25.04.2025 20:01 — 👍 1    🔁 0    💬 0    📌 0
Sheridan Feucht, "Solving Syllogisms is Not Intelligence," April 23, 2025. (I think that we overvalue logical reasoning when it comes to measuring "intelligence.") What do we mean by intelligence in the context of cogniti...

I used to think formal reasoning was central to language and intelligence, but now I’m not so sure. Wrote a short post about my thoughts on this, with a couple chewy anecdotes. Would love to get some feedback or pointers to further reading.
sfeucht.github.io/syllogisms/

25.04.2025 15:39 — 👍 6    🔁 0    💬 1    📌 0

I'll present a poster for this work at NENLP tomorrow! Come find me at poster #80...

10.04.2025 21:19 — 👍 7    🔁 1    💬 0    📌 0

That’s a good point! Sort of related, I noticed last night that when I have to type in a 2FA code I usually compress the numbers. Like if the code is 51692 I think β€œfifty-one, sixty-nine, two.” I wonder if this is a thing that people have studied. Thanks for the comment :)

09.04.2025 00:42 — 👍 1    🔁 0    💬 0    📌 0
The Dual-Route Model of Induction: Prior work on in-context copying has shown the existence of induction heads, which attend to and promote individual tokens during copying. In this work we introduce a new type of induction head: conce...

Paper: arxiv.org/abs/2504.03022
Code: github.com/sfeucht/dual...

See dualroute.baulab.info for more info. Work done with @ericwtodd.bsky.social, @byron.bsky.social, and @davidbau.bsky.social. :)

07.04.2025 13:57 — 👍 2    🔁 0    💬 0    📌 0

Yin & Steinhardt (2025) recently showed that FV (function vector) heads are more important for ICL than token induction heads. But for translation, *concept* induction heads matter too! They copy forward word meanings, whereas FV heads influence the output language.
bsky.app/profile/kay...

07.04.2025 13:54 — 👍 2    🔁 0    💬 1    📌 0

Concept heads also output language-agnostic word representations. If we patch the outputs of these heads from one translation prompt to another, we can change the *meaning* of the outputted word, without changing the language. (see prior work from @butanium.bsky.social and @wendlerc.bsky.social)

07.04.2025 13:54 — 👍 4    🔁 0    💬 1    📌 1
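
A bare-bones version of that kind of patch can be written with forward pre-hooks on each attention block's o_proj: capture the chosen heads' pre-projection outputs on a source prompt, then write them into a target prompt. The head indices, prompts, and last-position-only patch below are all assumptions for illustration, not the paper's setup.

```python
# Sketch: patch chosen heads' outputs (just before o_proj) from a source translation
# prompt into a target prompt at the final position. Placeholder heads; illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

head_dim = model.config.hidden_size // model.config.num_attention_heads
patch_heads = {14: [3], 16: [7]}          # layer -> heads to patch (hypothetical)

src = "English: dog French:"              # meaning we want to carry over
tgt = "English: cat French:"              # prompt whose prediction we patch

# 1) Capture the pre-o_proj activations (concatenated head outputs) at the last
#    position of the source prompt.
captured = {}
def make_capture(layer):
    def hook(module, args):
        captured[layer] = args[0][:, -1, :].detach().clone()
    return hook

handles = [model.model.layers[l].self_attn.o_proj.register_forward_pre_hook(make_capture(l))
           for l in patch_heads]
with torch.no_grad():
    model(**tok(src, return_tensors="pt"))
for handle in handles:
    handle.remove()

# 2) Overwrite those head slices at the last position of the target prompt.
def make_patch(layer, heads):
    def hook(module, args):
        x = args[0].clone()
        for hd in heads:
            sl = slice(hd * head_dim, (hd + 1) * head_dim)
            x[:, -1, sl] = captured[layer][:, sl]
        return (x,)
    return hook

handles = [model.model.layers[l].self_attn.o_proj.register_forward_pre_hook(make_patch(l, hs))
           for l, hs in patch_heads.items()]
with torch.no_grad():
    out = model(**tok(tgt, return_tensors="pt"))
print(tok.convert_ids_to_tokens(out.logits[0, -1].topk(5).indices.tolist()))
for handle in handles:
    handle.remove()
# With the real concept induction heads, the top prediction should shift toward the
# meaning of the source word while staying in the target language, per the post above.
```
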

Token induction heads are still important, though. When we ablate them over long sequences, models start to paraphrase instead of copying. We take this to mean that token induction heads are responsible for *exact* copying (which concept induction heads apparently can't do).

07.04.2025 13:54 — 👍 2    🔁 0    💬 1    📌 0
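
The head-ablation setup behind this (and behind the concept-head ablations in the next post) can be approximated with a forward pre-hook that zeroes a head's slice of the attention output before o_proj mixes it back into the residual stream. The heads and prompt below are placeholders; the paper uses its identified induction heads and much longer copying sequences.

```python
# Sketch: ablate (zero out) chosen attention heads during generation.
# Placeholder heads; illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

head_dim = model.config.hidden_size // model.config.num_attention_heads
ablate = {5: [0, 2], 7: [9]}              # layer -> heads to zero (hypothetical)

def make_zero_hook(heads):
    def hook(module, args):
        x = args[0].clone()               # (batch, seq, hidden): concatenated head outputs
        for hd in heads:
            x[..., hd * head_dim:(hd + 1) * head_dim] = 0
        return (x,)
    return hook

handles = [model.model.layers[l].self_attn.o_proj.register_forward_pre_hook(make_zero_hook(hs))
           for l, hs in ablate.items()]

prompt = "Repeat exactly: The owl flew over the glass door.\nThe owl flew over the"
with torch.no_grad():
    out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=10, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

for handle in handles:
    handle.remove()
```
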

But how do we know these heads copy semantics? When we ablate concept induction heads, performance drops drastically for translation, synonyms, and antonyms: all tasks that require copying *meaning*, not just literal tokens.

07.04.2025 13:54 — 👍 3    🔁 0    💬 1    📌 0

Previous work showed that token induction heads attend to the next token to be copied (*window*pane). Analogously, we find that concept induction heads attend to the end of the next multi-token word to be copied (windowp*ane*).

07.04.2025 13:54 — 👍 2    🔁 0    💬 1    📌 0
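
One quick way to look at this yourself is to pull attention maps out of a forward pass and check where a given head puts its weight when a multi-token word repeats. The layer/head choice and prompt below are arbitrary stand-ins; the paper identifies the actual heads.

```python
# Sketch: inspect where one attention head attends during in-context copying.
# Arbitrary head and prompt; illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             attn_implementation="eager")  # needed for attention weights
model.eval()

prompt = "the windowpane cracked, so she replaced the windowpane"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

layer, head = 14, 3                                # arbitrary choice
weights = out.attentions[layer][0, head]           # (seq, seq) attention pattern
toks = tok.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
last_row = weights[-1]                             # where the final position attends
top = last_row.topk(3)
for pos, w in zip(top.indices.tolist(), top.values.tolist()):
    print(f"{toks[pos]!r}: {w:.2f}")
# Per the post above: a token induction head should favor the token right after the
# earlier copy of the current token; a concept induction head, the end of the word.
```
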

--using causal interventions. Essentially, we pick out all of the attention heads that are responsible for promoting future entity tokens (e.g. "ax" in "waxwing"). We hypothesize that heads carrying an entire entity actually represent the *meaning* of that chunk of tokens.

07.04.2025 13:54 — 👍 2    🔁 0    💬 1    📌 0

Induction heads were discovered by Elhage et al. (2021) and Olsson et al. (2022). They focused on token copying, but some of the heads they found also seemed to activate for "fuzzy" copying tasks, like translation. We directly identify these heads--
transformer-circuits.pub/2022/in-con...

07.04.2025 13:54 — 👍 3    🔁 0    💬 1    📌 0

There are multiple ways to copy text! Copying a wifi password like hxioW2qN52 is different than copying a meaningful one like OwlDoorGlass. Nonsense copying requires each char to be transferred one-by-one, but meaningful words can be copied all at once. Turns out, LLMs do both.

07.04.2025 13:54 — 👍 5    🔁 1    💬 2    📌 0

[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.

07.04.2025 13:54 — 👍 77    🔁 20    💬 1    📌 6

So gorgeous, is this in Cambridge?

01.04.2025 23:29 — 👍 1    🔁 0    💬 1    📌 0

Looks really cool! Can’t wait to give this a proper read.

12.03.2025 13:38 — 👍 5    🔁 0    💬 0    📌 0
Oxford Word of the Year 2024 - Oxford University Press: The Oxford Word of the Year 2024 is 'brain rot'. Discover more about the winner, our shortlist, and 20 years of words that reflect the world.

I'm searching for some comp/ling experts to provide a precise definition of “slop” as it refers to text (see: corp.oup.com/word-of-the-...)

I put together a Google form that should take no longer than 10 minutes to complete: forms.gle/oWxsCScW3dJU...
If you can help, I'd appreciate your input! 🙏

10.03.2025 20:00 — 👍 10    🔁 8    💬 0    📌 0

I like this work a lot. Racism+misogyny in medicine is genuinely dangerous, so it's really important to keep tabs on model biases if we're going to use LLMs in clinical settings. It's nice to see that interpretability techniques are useful here.

22.02.2025 22:33 — 👍 6    🔁 0    💬 0    📌 0

Do you have a great experiment that you want to run on Llama 405b but not enough GPUs?

🚨 #NDIF is opening up more spots in our 405b pilot program! Apply now for a chance to conduct your own groundbreaking experiments on the 405b model. Details: 🧵⬇️

09.12.2024 20:04 — 👍 18    🔁 4    💬 1    📌 1

Love the Gabriel Garcia Marquez quote at the beginning. On my reading list!

30.11.2024 01:41 — 👍 1    🔁 0    💬 0    📌 0
A box with anpan, melonpan, a strawberry croissant, and a matcha adzuki cream puff.

Japonaise Bakery in Brookline :) 🥐

24.11.2024 20:57 — 👍 0    🔁 0    💬 1    📌 0
Cat sitting on a chair in front of a parked black car with its rear wheel removed and a hydraulic jack supporting it

yes, this is what mechanistic interpretability research looks like

24.11.2024 19:51 — 👍 23    🔁 2    💬 2    📌 1
