Eric Todd

@ericwtodd.bsky.social

CS PhD Student, Northeastern University - Machine Learning, Interpretability https://ericwtodd.github.io

445 Followers  |  145 Following  |  5 Posts  |  Joined: 19.11.2024  |  2.0315

Latest posts by ericwtodd.bsky.social on Bluesky

We've added a quick new section to this paper, which was just accepted to @COLM_conf! By summing weights of concept induction heads, we created a "concept lens" that lets you read out semantic information in a model's hidden states. 🔎

22.07.2025 12:39 - 👍 5    🔁 1    💬 1    📌 0

I’m excited for NEMI again this year! I’ve enjoyed local research meetups and getting to know others near me working on interesting problems.

30.06.2025 23:00 - 👍 1    🔁 0    💬 0    📌 0
NEMI 2024 (Last Year)

🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...

30.06.2025 22:55 - 👍 10    🔁 8    💬 0    📌 2

How do language models track mental states of each character in a story, often referred to as Theory of Mind?

We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!

24.06.2025 17:13 - 👍 54    🔁 19    💬 2    📌 1

Can we uncover the list of topics a language model is censored on?

Refused topics vary strongly among models. Claude-3.5 vs DeepSeek-R1 refusal patterns:

13.06.2025 15:58 - 👍 8    🔁 4    💬 1    📌 0

I'm not familiar with the reviewing load for ARR, but for COLM this year I was only assigned 2 papers as a reviewer, which is great. I had more time to try to understand each submission, and it was much more manageable than getting assigned 6+ papers, as ICML and NeurIPS do.

29.05.2025 00:14 - 👍 5    🔁 0    💬 0    📌 0

I'll present a poster for this work at NENLP tomorrow! Come find me at poster #80...

10.04.2025 21:19 - 👍 6    🔁 1    💬 0    📌 0

Sheridan asks whether the Dual Route Model of Reading that psychologists have observed in humans also appears in LLMs.

In her brilliantly simple study of induction heads, she finds that it does! Induction has a Dual Route that separates concepts from literal token processing.

Worth reading ↘️

07.04.2025 15:23 - 👍 7    🔁 2    💬 0    📌 0

[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.

07.04.2025 13:54 - 👍 73    🔁 18    💬 1    📌 5

I reviewed for ICML this year, and it felt to me like the paper quality was lower than in previous reviewing assignments I’ve had. In my batch, I had 3/7 that I’d consider low-quality submissions. The review process was also more involved (but hopefully it allows for a better feedback mechanism).

25.03.2025 22:06 - 👍 1    🔁 0    💬 0    📌 0

What will be the linchpin for AI dominance?

Read our NSF/OSTP recommendations written with Goodfire's Tom McGrath tommcgrath.github.io, Transluce's Sarah Schwettmann cogconfluence.com, MIT's Dylan Hadfield-Menell @dhadfieldmenell.bsky.social

TLDR; Dominance comes from **interpretability** 🧵 ↘️

16.03.2025 13:57 - 👍 22    🔁 8    💬 1    📌 1
Oxford Word of the Year 2024 - Oxford University Press The Oxford Word of the Year 2024 is 'brain rot'. Discover more about the winner, our shortlist, and 20 years of words that reflect the world.

I'm searching for some comp/ling experts to provide a precise definition of “slop” as it refers to text (see: corp.oup.com/word-of-the-...)

I put together a google form that should take no longer than 10 minutes to complete: forms.gle/oWxsCScW3dJU...
If you can help, I'd appreciate your input! 🙏

10.03.2025 20:00 - 👍 10    🔁 8    💬 0    📌 0

Induction heads are commonly associated with in-context learning, but are they the primary driver of ICL at scale?

We find that recently discovered "function vector" heads, which encode the ICL task, are the actual primary mechanisms behind few-shot ICL!

arxiv.org/abs/2502.14010
🧵👇

28.02.2025 16:16 - 👍 23    🔁 7    💬 1    📌 0

LLMs are known to perpetuate social biases in clinical tasks. Can we locate and intervene upon LLM activations that encode patient demographics like gender and race? 🧵

Work w/ @arnabsensharma.bsky.social, @silvioamir.bsky.social, @davidbau.bsky.social, @byron.bsky.social

arxiv.org/abs/2502.13319

22.02.2025 04:17 - 👍 11    🔁 5    💬 2    📌 1

Please help amplify ARBOR, a fantastic new research opportunity! If you’d like to start contributing, NDIF is now hosting DeepSeek R1 8B and 70B, open for all researchers to experiment on via our API.

Sign up for API access here: login.ndif.us

20.02.2025 22:35 - 👍 4    🔁 3    💬 0    📌 0

I'm excited about this new open research initiative! It kind of feels like this is how science is supposed to be done - collaborating and sharing ideas in the open. If you've thought about studying the mechanisms behind R1 & other reasoning models check it out!

20.02.2025 23:15 - 👍 0    🔁 0    💬 0    📌 0

DeepSeek R1 shows how important it is to be studying the internals of reasoning models. Try our code: Here @canrager.bsky.social shows a method for auditing AI bias by probing the internal monologue.

dsthoughts.baulab.info

I'd be interested in your thoughts.

31.01.2025 14:30 - 👍 29    🔁 9    💬 1    📌 1

What was the most important machine learning paper in 2024?

My Famous Deep Learning Papers list (that I use in teaching) does not include any new ideas from the last year.

papers.baulab.info

Which single new paper would you add?

31.12.2024 15:09 - 👍 56    🔁 12    💬 10    📌 0

Yes, I remember learning them with "arc" at the beginning (e.g. arcsin, arccos).

12.12.2024 22:08 - 👍 1    🔁 0    💬 0    📌 0

More big news! Applications are open for the NDIF Summer Engineering Fellowship, an opportunity to work on cutting-edge AI research infrastructure this summer in Boston! 🚀

10.12.2024 21:59 - 👍 9    🔁 6    💬 1    📌 2

The Phase 2 NDIF Pilot is open for a short window.

Apply now to get research capacity on Llama 405b.

Deadline is December 31.

It is not easy to crack open 405b for research, but NDIF solves the key engineering problems for you. Phase 1 powered several very interesting ICLR submissions...

10.12.2024 07:09 - 👍 15    🔁 5    💬 0    📌 0

Do you have a great experiment that you want to run on Llama 405b but not enough GPUs?

🚨 #NDIF is opening up more spots in our 405b pilot program! Apply now for a chance to conduct your own groundbreaking experiments on the 405b model. Details: 🧵⬇️

09.12.2024 20:04 - 👍 18    🔁 4    💬 1    📌 1
PhD Apply - Khoury College of Computer Sciences

PhD Applicants: remember that the Northeastern Computer Science PhD application deadline is Dec 15.

It's a terrific time to do a PhD, with so many interesting things happening in AI.

Apply here:

www.khoury.northeastern.edu/apply/phd-ap...

07.12.2024 10:31 - 👍 33    🔁 5    💬 0    📌 0

New Preprint 🚀

Can diffusion models draw artistic inspiration from nature? 🤔

@huiren and @materzynska trained a diffusion model solely on natural images, excluding artwork from pre-training data ❌🎨

Surprisingly, it can mimic art styles!

Curious how it works? 👇🧵 w/ @davidbau and Antonio Torralba

04.12.2024 00:41 - 👍 3    🔁 1    💬 1    📌 0
GitHub - rhfeiyang/art-free-diffusion: Official implementation of "Art-Free Generative Models: Art Creation Without Graphic Art Knowledge"

Do you need to copy art to make art?

Hui Ren's and Joanna Materzynska's Art-Free Diffusion tests this question and lets you make "imitation-free" AI art

Github: github.com/rhfeiyang/ar...
Arxiv: arxiv.org/abs/2412.00176
Website: joaanna.github.io/art-free-dif...
X: x.com/materzynska/...

04.12.2024 22:04 - 👍 17    🔁 2    💬 0    📌 1

@ericwtodd is following 20 prominent accounts