Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really.
Our Temporal Feature Analyzer discovers contextual features in LLMs that detect event boundaries, parse complex grammar, and represent ICL patterns.
Thanks to my collaborators Giordano Rogers, @natalieshapira.bsky.social, and @davidbau.bsky.social .
Check out our paper for more details:
arxiv.org/pdf/2510.26784
github.com/arnab-api/fi...
filter.baulab.info
The fact that the neural mechanisms implemented in the transformer architecture align with human-designed symbolic strategies suggests that certain computational patterns arise naturally from task demands rather than from specific architectural constraints.
04.11.2025 17:48
This dual implementation of filtering (lazy evaluation via filter heads, eager evaluation via stored intermediate flags) echoes the lazy vs. eager evaluation strategies in functional programming.
Check out Henderson & Morris Jr. (1976): dl.acm.org/doi/abs/10....
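In Python terms, the two strategies the thread describes look roughly like this (a toy illustration of lazy vs. eager evaluation, not the paper's code):

```python
items = ["apple", "broccoli", "peach", "carrot"]
veggies = {"broccoli", "carrot"}

# Lazy: filter() returns an iterator; no predicate call happens
# until the result is actually consumed.
lazy = filter(lambda x: x in veggies, items)

# Eager: evaluate the predicate for every item up front and keep
# the boolean "flags" next to the items.
flags = [(x, x in veggies) for x in items]

print(list(lazy))                    # consumes the lazy filter
print([x for x, ok in flags if ok])  # reads off the stored flags
```

Both routes select the same items; they differ only in *when* the predicate is evaluated.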
This seemingly innocent change in prompt order fundamentally changes which strategy the LLM uses. This suggests that LLMs can maintain multiple strategies for the same task and flexibly switch between or prioritize them based on information availability.
We validate this flag-based eager-evaluation hypothesis with a series of carefully designed causal analyses. If we swap this flag onto another item, then in the question-before context the LM consistently picks the item carrying the flag. The question-after context, however, is not sensitive to this swap.
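A toy illustration of the flag-swap intervention (hypothetical sketch, not the paper's code): assume that in the question-before setting the LM stores a boolean "flag" on each option as it reads it, then answers by looking up the flagged item.

```python
items = ["apple", "peach", "carrot"]

def eager_answer(items, flags):
    # Answer purely from the stored flags, ignoring item semantics.
    return next(item for item, f in zip(items, flags) if f)

# Normally the flag marks the item satisfying the predicate ("carrot").
print(eager_answer(items, [False, False, True]))

# Causal swap: moving the flag to "peach" moves the answer with it,
# mirroring the intervention result described in the post.
print(eager_answer(items, [False, True, False]))
```

Under this picture, the answer follows the flag rather than the item's meaning, which is exactly what the swap experiment tests.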
Plot twist: when the question is presented *before* the options, the causality score drops to near zero!
We investigate this further and find that when the question comes first, the LM can *eagerly* evaluate each option as it sees it, and store a "flag" directly in the latents.
The predicate can also be transferred (to some extent) across different tasks, suggesting that LLMs rely on shared representations and mechanisms that are reused across tasks.
Also check out @jackmerullo.bsky.social's work on LLMs reusing sub-circuits across different tasks.
x.com/jack_merull...
Language-independent predicates resemble the cross-lingual concepts seen in prior work by @sfeucht.bsky.social, @wendlerc.bsky.social, and @jannikbrinkmann.bsky.social.
bsky.app/profile/sfe...
When the question is presented *after* the options, filter heads can achieve high causality scores across language and format changes! This suggests that the encoded predicate is robust against such perturbations.
We test this across a range of different semantic types, presentation formats, languages, and even different tasks that require a different "reduce" step after filtering.
We measure this with a *causality* score: if the predicate is abstractly encoded in the query states of these "filter heads", then transferring it should change the output. For example, in the figure the answer should change to "Peach" (or its equivalent in the changed format).
But do these heads play a *causal* role in the operation?
To test this, we transport their query states from one context to another. We find that this triggers the execution of the same filtering operation, even when the new context has a new list of items and a new format!
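A minimal sketch of the query-transport idea (hypothetical toy model, not the paper's code): treat a filter head as a dot product between a query vector, which carries the encoded predicate, and per-item key vectors. Here feature dim 0 stands in for "vegetable" and dim 1 for "fruit".

```python
KEYS = {
    "apple":  [0.0, 1.0],
    "carrot": [1.0, 0.0],
    "peach":  [0.0, 1.0],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filter_head(query, context):
    # The head attends to every item whose key matches the predicate
    # carried by the (possibly transported) query state.
    return [name for name in context if dot(query, KEYS[name]) > 0.5]

q_veggie = [1.0, 0.0]  # query state encoding "is a vegetable"

print(filter_head(q_veggie, ["apple", "carrot"]))
# Transport the same query state into a context with new items: the
# same filtering operation runs, as in the patching experiment.
print(filter_head(q_veggie, ["peach", "carrot", "apple"]))
```

In this toy, the query vector alone determines what gets filtered, which is the intuition behind transporting query states between contexts.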
In Llama-70B and Gemma-27B, we found special attention heads that consistently focus their attention on the filtered items. This behavior seems consistent across a range of different formats and semantic types.
We want to understand how large language models (LLMs) encode "predicates". Is every filtering question, e.g., find the X that satisfies property P, handled in a different way? Or has the LM learned to use abstract rules that can be reused in many different situations?
How can a language model find the veggies in a menu?
New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.
Spoiler: it turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from Python)!
How do language models track mental states of each character in a story, often referred to as Theory of Mind?
We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!