📣 New preprint from the Braga Lab! 📣
The ventral visual stream for reading converges on the transmodal language network
Congrats to Dr. Joe Salvo for this epic set of results
Big Q: What brain systems support the translation of writing to concepts and meaning?
Thread 🧵 ⬇️
07.10.2025 21:51 · 56 likes · 15 reposts · 2 replies · 1 quote
Always include some stimuli that are permissively licensed so they can be used as examples! E.g. we have a video stimulus set that's mostly Pixar short films, but also includes a segment from the Blender movie Sintel (en.wikipedia.org/wiki/Sintel), which is licensed CC-BY.
01.10.2025 17:40 · 2 likes · 0 reposts · 0 replies · 0 quotes
Happy and proud to see @rjantonello.bsky.social's work awarded by SNL!
13.09.2025 21:47 · 29 likes · 4 reposts · 1 reply · 0 quotes
New open dataset alert:
🧠 Introducing "Spacetop", a massive multimodal fMRI dataset that bridges naturalistic and experimental neuroscience!
N = 101 participants × 6 hours each = 606 hours of functional data, combining movies, pain, faces, theory-of-mind, and other cognitive tasks!
🧵 below
04.09.2025 19:21 · 116 likes · 58 reposts · 3 replies · 3 quotes
or @csinva.bsky.social: bsky.app/profile/csin...
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
I'm posting this thread to highlight some things I thought were cool, but if you're interested you should also check out what @rjantonello.bsky.social wrote: bsky.app/profile/rjan...
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
Cortical weight maps were also reasonably correlated between ECoG and fMRI data, at least for the dimensions well-captured in the ECoG coverage.
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
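If you want to run this kind of cross-modality check yourself, here is a minimal sketch (not the authors' code): correlate the weight map for each question dimension across the two modalities, restricted to locations with ECoG coverage. The matched-location weight matrices `w_fmri` and `w_ecog` are assumed inputs.

```python
# Hypothetical sketch: per-dimension spatial correlation of encoding weights
# between fMRI and ECoG, at locations covered by the ECoG electrodes.
import numpy as np

def weight_map_correlations(w_fmri: np.ndarray, w_ecog: np.ndarray) -> np.ndarray:
    """w_fmri, w_ecog: (n_questions, n_locations) weights at matched locations.
    Returns one Pearson r per question dimension."""
    a = w_fmri - w_fmri.mean(axis=1, keepdims=True)
    b = w_ecog - w_ecog.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    denom = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return num / denom
```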
Finally, we tested whether the same interpretable embeddings could also be used to model ECoG data from Nima Mesgarani's lab. Despite the fact that our features are less well-localized in time than LLM embeddings, this still works quite well!
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
To validate the maps we get from this model, we also compared them to expectations derived from NeuroSynth and to results from experiments targeting specific semantic categories, and looked at inter-subject reliability. All quite successful.
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
The model and experts were well-aligned, but there were some surprises, like "Does the input include technical or specialized terminology?" (32), which was much more important than expected.
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
This method lets us quantitatively assess how much variance different theories explain in brain responses to natural language. So to figure out how well this aligns with what scientists think, we polled experts to see which questions/theories they thought would be important.
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
"Does the input include dialogue?" (27) has high weights in a smattering of small regions in temporal cortex. And "Does the input contain a negation?" (35) has high weights in anterior temporal lobe and a few prefrontal areas. I think there's a lot of drilling-down we can do here.
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
Left hemisphere cortical flatmap showing regression weights for the feature "Does the input describe a visual experience or scene?"
The fact that each dimension in the embedding thus corresponds to a specific question means that the encoding model weights are interpretable right out of the box. "Does the input describe a visual experience?" has high weight all along the boundary of visual cortex, for example.
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
But the wilder thing is how we get the embeddings: by just asking LLMs questions. Each theory is cast as a yes/no question. We then have GPT-4 answer each question about each 10-gram in our natural language dataset. We did this for ~600 theories/questions.
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
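For concreteness, a rough sketch of that pipeline (a toy reconstruction, not the paper's code): each theory becomes a yes/no question, an LLM answers it for every 10-gram of the transcript, and the answers are stacked into a binary embedding. `ask_llm` is a hypothetical placeholder for whatever LLM call you use (the paper uses GPT-4); the example questions are the ones mentioned in this thread.

```python
import numpy as np

QUESTIONS = [
    "Does the input describe a visual experience or scene?",
    "Does the input include dialogue?",
    "Does the input contain a negation?",
    # ... the full set had ~600 questions, later pruned to 35
]

def ask_llm(question: str, text: str) -> bool:
    """Hypothetical wrapper around an LLM API returning a yes/no answer."""
    raise NotImplementedError("plug in your LLM client here")

def qa_embeddings(words: list[str], n: int = 10) -> np.ndarray:
    """Binary (num_words x num_questions) embedding: one row per n-gram
    ending at each word, one column per theory/question."""
    emb = np.zeros((len(words), len(QUESTIONS)))
    for i in range(len(words)):
        ngram = " ".join(words[max(0, i - n + 1): i + 1])
        for j, q in enumerate(QUESTIONS):
            emb[i, j] = 1.0 if ask_llm(q, ngram) else 0.0
    return emb
```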
Average test encoding performance across cortex for the QA model and baselines on the three original subjects (20 hours of fMRI data each) and 5 additional subjects (5 hours each). The 35-question QA model outperformed the state-of-the-art black-box model (which uses hidden representations from the LLaMA family of LLMs) by 12.0% when trained on all the story data. The model's compactness yields greater relative improvements in data-limited scenarios; when trained on only 5 stories per subject it outperforms the baseline LLaMA model by 43.3%.
And it works REALLY well! Prediction performance for encoding models is on a par with uninterpretable Llama3 embeddings! Even with just 35 dimensions!!! I find this fairly wild.
18.08.2025 18:33 · 0 likes · 0 reposts · 1 reply · 0 quotes
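The posts don't state the performance metric, but the usual one for encoding models is the correlation between predicted and measured responses on held-out data, computed per voxel. A generic sketch:

```python
import numpy as np

def voxelwise_correlation(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Pearson r between held-out and predicted time courses, one value per voxel.
    y_true, y_pred: (time x voxels) arrays."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / denom
```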
New paper with @rjantonello.bsky.social @csinva.bsky.social, Suna Guo, Gavin Mischler, Jianfeng Gao, & Nima Mesgarani: We use LLMs to generate VERY interpretable embeddings where each dimension corresponds to a scientific theory, & then use these embeddings to predict fMRI and ECoG. It WORKS!
18.08.2025 18:33 · 16 likes · 8 reposts · 1 reply · 0 quotes
In our new paper, we explore how we can build encoding models that are both powerful and understandable. Our model uses an LLM to answer 35 questions about a sentence's content. The answers linearly contribute to our prediction of how the brain will respond to that sentence. 1/6
18.08.2025 09:44 · 25 likes · 9 reposts · 1 reply · 1 quote
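The post doesn't give the exact regression recipe; a standard fMRI encoding setup is ridge regression from the question features to each voxel's time course, with a few delayed copies of the features to absorb hemodynamic lag. A minimal sketch under those assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge

def add_delays(X: np.ndarray, delays=(1, 2, 3, 4)) -> np.ndarray:
    """Stack time-delayed copies of the features (rows are time points)."""
    delayed = []
    for d in delays:
        Xd = np.zeros_like(X)
        Xd[d:] = X[:-d]
        delayed.append(Xd)
    return np.hstack(delayed)

def fit_encoding_model(X_train: np.ndarray, Y_train: np.ndarray, alpha: float = 1.0) -> Ridge:
    """X_train: (time x 35) question answers; Y_train: (time x voxels) BOLD.
    model.coef_ then holds one weight per question/delay for every voxel."""
    model = Ridge(alpha=alpha)
    model.fit(add_delays(X_train), Y_train)
    return model
```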
The preprint of my 1st project in grad school is up! We propose a simple, information-theoretic model of how humans remember narratives. We tested it with the help of open-source LLMs. Plz check out this thread for details ➡️ Many thanks to my wonderful advisors! It's been a fun adventure!!
01.08.2025 17:21 · 16 likes · 4 reposts · 2 replies · 0 quotes
This work was a really fun departure for me. Nothing data-driven (and no fMRI!): we just sat down and devised a theory, then tested it. It feels surprisingly good :D
01.08.2025 16:54 · 6 likes · 0 reposts · 1 reply · 1 quote
Our model also has interesting linguistic consequences. Speech tends to have uniform information density over time, but there are local variations. We argue that these variations (at least around event boundaries) are actually in service of more uniform _memory_ density.
01.08.2025 16:54 · 1 like · 0 reposts · 1 reply · 0 quotes
We also devised a new way to model and study gist with LLMs, which is to measure (or manipulate) the entropy of attention weights for specific "induction" heads within the LLM. Higher entropy attention weights more evenly sample information from the input, and lead to gist-like behavior.
01.08.2025 16:54 · 2 likes · 0 reposts · 1 reply · 0 quotes
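Concretely, the two operations described here can be sketched like this (my parameterization of the smoothing is an assumption; see the paper for the exact one): the entropy of a recall-to-story attention distribution, and additive smoothing of that distribution controlled by an "attention temperature" (0 = unchanged, larger = closer to uniform).

```python
import numpy as np

def attention_entropy(attn: np.ndarray) -> float:
    """Shannon entropy (nats) of a normalized attention distribution."""
    p = attn / attn.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def smooth_attention(attn: np.ndarray, temperature: float) -> np.ndarray:
    """Additive smoothing toward the uniform distribution over story tokens."""
    p = attn / attn.sum()
    smoothed = p + temperature / len(p)
    return smoothed / smoothed.sum()
```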
Figure 5: Participants vary along gist vs. detail at similar levels of efficiency.
a. Based on rate-distortion theory, rate is the amount of resources one spends on memorizing the stimuli. Distortion is the deviation between one's memory and the stimuli. With less efficient memory (larger alpha in our model), one needs a larger rate to achieve the same level of distortion. At each level of efficiency, varying the sampling rate g trades off rate against distortion.
b. We prompted an LLM to generate recalls of the same stories heard by our participants.
c. To manipulate g in LLMs, we modified the attention weights from each token in the recall to the story using additive smoothing, while leaving other attention weights intact. The degree of smoothing is controlled by an "attention temperature" parameter, where a value of 0 indicates no smoothing and higher temperatures make the attention more uniform.
d. Rate-distortion plots of individual human recalls (blue) and model-generated recalls averaged within each attention temperature (orange-pink). Error bars for model-generated recalls indicate the standard error of all recalls for an attention temperature. We operationalized rate as the mutual information between the recall and the story, and distortion as Levenshtein distance. Human recalls are colored by the mean attention entropy of an induction head from the recall to the story, indicating the level of detail (darker) vs. gist (lighter). Model-generated recalls are colored by the attention temperature used for generation. For both humans and the LLM, more detailed recalls have lower distortion but higher rate. The gray line represents simulated recalls of an individual with no knowledge of the English language, corresponding to the worst-case alpha in CRUISE.
Our information-theoretic approach, which relies heavily on LLMs to measure mutual information & entropy of text, also explains memory for gist. For what is gist but the information that is shared across an entire narrative? We argue that low sampling rates lead to gist-like recall.
01.08.2025 16:54 · 3 likes · 0 reposts · 1 reply · 0 quotes
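One way to operationalize the rate and distortion measures described in the figure, consistent with the caption but not necessarily the authors' exact code: rate as the (pointwise) mutual information between recall and story estimated from LLM log-probabilities, and distortion as Levenshtein distance. `nll` is a hypothetical helper returning the total negative log-likelihood of `text` under an open-source LLM, optionally conditioned on `context`.

```python
def nll(text: str, context: str = "") -> float:
    """Hypothetical: score `text` with an LLM, optionally conditioned on `context`."""
    raise NotImplementedError("plug in an open-source LLM scorer here")

def rate(recall: str, story: str) -> float:
    """Pointwise mutual information: log p(recall | story) - log p(recall)."""
    return nll(recall) - nll(recall, context=story)

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (the distortion measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
```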
Figure 3: Uniform incremental sampling predicts memory at event boundaries.
a. We divided stories into windows of equal length in tokens, minimally adjusted to phrase boundaries. The number of windows is 1.5 times the number of events. "Boundary windows" contain at least one event boundary; "inner" windows do not.
b. Boundary and inner windows do not differ in surprisal (p > 0.2; shading indicates 95% confidence intervals; colors correspond to different stories).
c. Boundary windows have significantly longer duration (p = 5.85 × 10⁻¹⁰),
d. significantly lower speech rate (p = 5.14 × 10⁻⁹),
e. significantly lower information rate (p = 2.83 × 10⁻⁵),
f. and are significantly better recalled than inner windows (p = 0.043).
g. CRUISE predicts that boundary windows are significantly better recalled than inner windows (p = 0.015).
h. R² of predicting mean recall using linear regression. Crosshatched bars show models with explicit event boundary information. Boundaries do not explain additional information on top of CRUISE.
i. Schematic of information properties at event boundaries. Surprisal remains constant but speech rate decreases, lowering the information rate. The amount of shared information increases, enabling better memory for boundaries with a constant memory encoding rate.
j.-l. Average surprisal, speech rate, and information rate around event boundaries in 500 ms bins. Shaded areas indicate standard error. The boundary (time 0) occurs at the right edge of the center bin. Mutual information estimates are not meaningful at the word level, so shared information and memory rate were not computed. At event boundaries, j. surprisal remains constant, k. speech rate decreases, and l. information rate decreases.
Excitingly, our model also predicts and explains why we have better memory for event boundaries: boundaries tend to have more shared information! (There's also a really interesting effect of speech rate around event boundaries, more on that in a moment..)
01.08.2025 16:54 · 2 likes · 0 reposts · 1 reply · 0 quotes
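The window-level quantities in the figure reduce to simple ratios; a small sketch with assumed variable names (the surprisal would come from an LLM's token log-probabilities):

```python
from dataclasses import dataclass

@dataclass
class Window:
    n_tokens: int      # tokens in the window
    surprisal: float   # summed -log p of those tokens under an LLM
    duration_s: float  # spoken duration in seconds

def speech_rate(w: Window) -> float:
    return w.n_tokens / w.duration_s   # tokens per second

def information_rate(w: Window) -> float:
    return w.surprisal / w.duration_s  # surprisal per second
```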
Figure 2: Humans uniformly sample information in time.
a.-c. Models predicting the mean amount of information participants' recall captured about each equal-duration text window using a. CRUISE, b. surprisal, as predicted by event segmentation theory, and c. surprisal-weighted sampling. Shaded areas indicate 95% confidence intervals. Each marker is one story window, with different colors for different stories.
d. R² of models predicting the mean amount of information participants' recall captured about each equal-duration window using linear regression. Each predictor is fitted in a separate linear regression model. Each story is allowed to have its own slope. CRUISE predicts memory the best, followed by surprisal-weighted sampling.
e. Significance testing of differences between models. Colors indicate the number of stories (out of 8) in which the row predictor predicts significantly more participants' recalls than the column predictor. Stars indicate the significance of the second-level binomial test: whether the number of stories in which the row predictor significantly better predicted more participants than the column predictor is greater than chance. CRUISE significantly outperforms all other models.
Using data from a behavioral experiment where participants listened to stories and then recalled them afterwards, we found that our model, constant rate uniform information sampling for encoding (CRUISE), explains variation in memory MUCH better than surprisal or other alternative models.
01.08.2025 16:54 · 3 likes · 0 reposts · 1 reply · 0 quotes
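A sketch of the model comparison described in panel d (my reconstruction from the caption, not the authors' code): each predictor gets its own linear regression of mean recall per window, with a separate slope for every story.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def r2_with_per_story_slopes(predictor, recall, story_ids) -> float:
    """predictor, recall: one value per window; story_ids: story label per window."""
    predictor = np.asarray(predictor, dtype=float)
    recall = np.asarray(recall, dtype=float)
    story_ids = np.asarray(story_ids)
    stories = np.unique(story_ids)
    # one column per story: the predictor where the window belongs to that story, else 0
    X = np.column_stack([(story_ids == s) * predictor for s in stories])
    model = LinearRegression().fit(X, recall)
    return model.score(X, recall)  # R^2
```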
Constant rate sampling still manifests as some parts of a narrative being better remembered than others, though! This happens because narratives are not random, so some information is shared across different parts of a story. Parts with more shared information tend to be better remembered.
01.08.2025 16:45 · 3 likes · 0 reposts · 1 reply · 0 quotes
New paper with @mujianing.bsky.social & @prestonlab.bsky.social! We propose a simple model for human memory of narratives: we uniformly sample incoming information at a constant rate. This explains behavioral data much better than variable-rate sampling triggered by event segmentation or surprisal.
01.08.2025 16:45 · 51 likes · 18 reposts · 1 reply · 3 quotes
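As a toy illustration only (the preprint has the real CRUISE formulation), the contrast being drawn is between encoding that grows with elapsed time at a fixed rate and encoding that tracks how surprising each stretch of the story is:

```python
def constant_rate_encoding(durations_s, rate_per_s=1.0):
    """Uniform sampling: information encoded about a window scales with its duration."""
    return [rate_per_s * d for d in durations_s]

def surprisal_triggered_encoding(surprisals, gain=1.0):
    """Alternative: encoding scales with each window's surprisal."""
    return [gain * s for s in surprisals]
```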
Socialist::Tolkien::Expanse::History
Movies::Art::Plants::Games
Ian Hackingpilled
queer eepy lil guy 🏳️‍⚧️
discord:queer_starshine
She/they
New tolkien quote of the day everyday five(ish) days of the week
i RS a lot of mutual aid posts, some nsfw
Editor and CEO, Zeteo
Author, "Win Every Argument"
British-American
tall but otherwise normal. sydney
Associate Professor, Psychological & Brain Sciences, Johns Hopkins University.
theoretical neuroscience; open-ended cognition; memory
#SNL2025 September 12-14, 2025, Washington, DC. The Society for the Neurobiology of Language (SNL), founded in November 2010, is an NIH-funded non-profit org…
Incoming Postdoc at UChicago in Bakkour lab. PhD from WUSTL. Interested in how we build models of the world, naturalistic neuroimaging, and reinforcement learning.
Assistant Professor @ University of Rochester – narratives, episodic memory, naturalistic cognition, neurofeedback – natcoglab.org – mom – moon elf warlock – she/her
• Postdoc, Dept of Neurosurgery, UTHealth
• Compositionality in neural and artificial systems
Website: https://elliot-murphy.com/
assistant professor of psychology at USC丨he/him丨semiprofessional dungeon master丨https://snastase.github.io/
She/her. Neuroscience PhD student @UC Berkeley. Formerly @UT Austin, Haverford College. https://mujn1461.github.io/jianingmu/
Associate Professor, Dept of Psychology, UC Berkeley.
PI of @shenhavlab.bsky.social
https://www.shenhavlab.org/
Emmy-nominated TV writer and comedian. My new book "Good Game, No Rematch" is out April 1st. Newsletter: https://tinyurl.com/ytjvhjsb Email: mikedruckerisdead @ gmail .com
Neuron publishes ground-breaking research papers, reviews & commentary across neuroscience and is a premier intellectual forum for the neuroscience community.
https://www.cell.com/neuron/home
I am the host of Behind the Bastards and overlord of podcasts at Cool Zone Media
Sen. Sanders of Vermont, Ranking Member of the U.S. Senate Committee on Health, Education, Labor & Pensions, is the longest-serving independent in congressional history.
linguist, discourse/pragmatics
podcaster liberalcurrents.com/halftheanswer/
author & associate editor at liberalcurrents.com
singer with the Wild Lilacs
https://www.instagram.com/thewildlilacs?igsh=MXJqOXZ6NHpoOXRrYg==
she/they
substantively unserious
Cognitive neuroscientist studying visual and social perception. Asst Prof at JHU Cog Sci. She/her
Locked in and posting regularly on here now