Hahaha much appreciated
22.09.2025 21:47 · likes: 0 · reposts: 0 · replies: 0 · quotes: 0
Even comparing my own work in different areas, it's harder to be both timely and as thorough with LM work, especially given the scale of the experiments
22.09.2025 19:46 · likes: 2 · reposts: 0 · replies: 1 · quotes: 0
I was gonna say, I feel attacked by this tweet
22.09.2025 19:44 · likes: 1 · reposts: 0 · replies: 2 · quotes: 0
We think this work sheds light on why retrieval offers distinct benefits beyond just training models more, and provides a different perspective on why episodic memory and parametric learning are complementary, which we hope will be of interest for both AI and cognitive science 8/
22.09.2025 04:21 · likes: 3 · reposts: 0 · replies: 1 · quotes: 0
In the paper, we explore many more settings & nuances β including RL and BC versions of maze navigation experiments based on the original experiments on latent learning in rats, the effects of associative cues, the importance of within-episode ICL, and ablations. 7/
22.09.2025 04:21 · likes: 3 · reposts: 0 · replies: 1 · quotes: 0
The benefits of oracle retrieval on the (a) Codebooks and (b) simple reversals benchmarks. Both baseline and retrieval models perform well on component tasks like recalling definitions, or encoding new sequences involving indices used in encoding during training (a, center). However, performance differs dramatically on the latent encoding test (right bars on both plots), where only the model with retrieval achieves above-chance performance.
We show that even when models generalize well from parametric learning in standard (nontrivial) evaluations, there are selective, consistent failures of latent learning. Only models with retrieval generalize well on the key tests of latent learning. 6/
22.09.2025 04:21 · likes: 3 · reposts: 0 · replies: 1 · quotes: 0
The benchmarks we use and the key types of latent generalization that they test. (a) The codebooks benchmark tests the ability to use latent indices (highlighted in red) for which only the definitions have been seen in training to complete test encoding sequences. (b) The simple reversals benchmark tests the ability of models to reverse relations seen in training, and which models have learned to reverse in-context. (c) The semantic structure benchmark uses training embedded in more naturalistic text to test latent generalization types ranging from reversals to syllogisms, or more challenging category-inclusion-only holdouts. (d) The latent gridworld (with both its pixel-based RL and ASCII-based BC instantiations) tests the ability to navigate to objects that have never been a navigation goal in training for a particular maze, but have been frequently seen.
To illustrate this point, we explore latent learning across a wide range of benchmarks (from codebook translation to BC and RL navigation), and compare baseline language models or agents to those equipped with oracle retrieval. 5/
22.09.2025 04:21 · likes: 4 · reposts: 0 · replies: 1 · quotes: 0
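For concreteness, here is a rough sketch of how a codebooks-style benchmark with latent indices could be generated (a toy construction for illustration; the symbol set, code length, and split sizes are made up, not the paper's actual pipeline):

import random

# Toy codebooks-style data: every index gets a definition in training, but some
# "latent" indices never appear in training encoding tasks -- only at test time.
def make_codebook(n_symbols=10, code_len=3, alphabet="abcdefgh"):
    return {i: "".join(random.choices(alphabet, k=code_len)) for i in range(n_symbols)}

def definition_example(codebook, i):
    return f"define {i} := {codebook[i]}"

def encoding_example(codebook, indices):
    return f"encode {' '.join(map(str, indices))} -> {' '.join(codebook[i] for i in indices)}"

codebook = make_codebook()
trained_indices = list(range(5))      # indices actually used for encoding in training
latent_indices = list(range(5, 10))   # definitions seen in training, but never used to encode

train_set = [definition_example(codebook, i) for i in codebook]
train_set += [encoding_example(codebook, random.sample(trained_indices, 3)) for _ in range(100)]
latent_test = [encoding_example(codebook, random.sample(latent_indices, 3)) for _ in range(20)]

A parametric-only model can learn all the definitions and the encoding skill, but the latent_test items require combining the two in a way training never demanded.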
Explicit retrieval of learning experiences from nonparametric learning systems complements the broader knowledge of parametric learning, by making select, relevant experiences available in context where they can be more flexibly used in ways different from the original task setting in which they were encountered.
But models can readily use latent information in their context. We therefore suggest that natural intelligence solves the latent learning problem via the complementary strengths of episodic memory: reinstating experiences into context makes latent information accessible. 4/
22.09.2025 04:21 · likes: 5 · reposts: 1 · replies: 1 · quotes: 0
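A minimal sketch of the contrast this points at, purely for illustration (the model.generate API, the corpus format, and the oracle_key lookup are assumptions, not the paper's implementation):

def answer_parametric(model, prompt):
    # parametric-only: the model must rely on whatever its weights encoded
    return model.generate(prompt)

def answer_with_oracle_retrieval(model, prompt, training_corpus, oracle_key):
    # "oracle" retrieval: look up the relevant past experiences by a ground-truth
    # key rather than a learned retriever, and reinstate them into the context
    retrieved = [seq for seq in training_corpus if oracle_key in seq]
    return model.generate("\n".join(retrieved + [prompt]))

The point of the oracle is to isolate the benefit of having the experience in context from the separate problem of learning to retrieve it.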
While a model may be trained on some explicit information (e.g., "X is Y's teacher") or goals (e.g., navigate to Z), there may be other information latent in it (such as the reversal "Y is X's teacher").
Challenges of reversal are one instance of the much broader phenomenon that what is explicitly learned may also latently convey information relevant to other tasks, e.g. multi-hop reasoning, alternative goals, or answering questions in other languages. Like the reversal curse, learning on such sequences may primarily improve performance on the explicit information or goals; however, if the sequence were in context, models would readily be able to make inferences about the latent information.
We argue that parametric learning methods are too tied to the explicit training task and fail to effectively encode latent information relevant to possible future tasks; we suggest that this explains a wide range of findings, from navigation to the reversal curse. 3/
22.09.2025 04:21 · likes: 5 · reposts: 0 · replies: 2 · quotes: 0
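A toy version of that argument (the prompts and the model.generate API are assumptions):

def reversal_probe(model, reinstate_in_context=False):
    training_fact = "Anna is Ben's teacher."   # explicit form seen in training
    reversed_query = "Who is Ben's teacher?"   # the same fact, queried in the reverse direction

    if reinstate_in_context:
        # with the original sequence in context, the answer is an easy in-context inference
        return model.generate(training_fact + "\n" + reversed_query)
    # parametric-only: answering from weights alone typically fails (the reversal curse)
    return model.generate(reversed_query)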
We take inspiration from classic experiments on latent learning in animals, where the animals learn about information that is not useful at present, but that might be useful later (for example, learning the location of useful resources in passing). By contrast, 2/
22.09.2025 04:21 · likes: 5 · reposts: 0 · replies: 1 · quotes: 0
How can an imitative model like an LLM outperform the experts it is trained on? Our new COLM paper outlines three types of transcendence and shows that each one relies on a different aspect of data diversity. arxiv.org/abs/2508.17669
29.08.2025 21:45 · likes: 95 · reposts: 17 · replies: 3 · quotes: 4
When we've compared these in past work (e.g. Supplement fig. A.13 here: proceedings.neurips.cc/paper/2020/h...), we've seen pretty similar results between the two. Haven't run it in exactly this setting though. There are also some arguments that 1/2
05.08.2025 20:18 · likes: 2 · reposts: 0 · replies: 1 · quotes: 0
even though both are linearly decodable and equally predictive. Katherine's paper studies some instances more thoroughly in simple settings. My sense though is that the magnitude of these effects is quite a bit smaller than the base bias, so probably not a huge issue if datasets aren't tiny. 2/2
05.08.2025 18:28 · likes: 1 · reposts: 0 · replies: 0 · quotes: 0
I don't know of any reviews unfortunately! Fig. 16 in our TMLR paper (openreview.net/forum?id=aY2...) shows an instance: training classifiers on the penultimate reps to decode a label predicted by both easy and hard features; at high predictivity the classifier prefers the easy feature, even 1/2
05.08.2025 18:28 · likes: 2 · reposts: 0 · replies: 1 · quotes: 0
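For reference, a bare-bones version of that probe analysis (the setup here is assumed, not the exact code behind Fig. 16): train a linear probe on representations from data where both features predict the label, then check which feature its predictions track on held-out cases where the two features disagree.

import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_preference(train_reps, train_labels, test_reps, test_easy, test_hard):
    # train_labels: label predicted by BOTH features on the (congruent) training set
    # test_easy / test_hard: the two features' values on an incongruent test set
    probe = LogisticRegression(max_iter=1000).fit(train_reps, train_labels)
    preds = probe.predict(test_reps)
    return np.mean(preds == test_easy), np.mean(preds == test_hard)

If the first number is near 1 and the second near 0, the probe is effectively reading out the easy feature.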
Thanks, glad you like it!
05.08.2025 17:49 · likes: 1 · reposts: 0 · replies: 1 · quotes: 0
just by dimensionality arguments (input dim 64 << first rep 256), even before training, *any* function of the inputs will likely be computable from that rep with a sufficiently complex nonlinear decoder, even features like XOR that the model is *incapable* of computing at the first layer. 2/2
05.08.2025 16:30 · likes: 2 · reposts: 0 · replies: 1 · quotes: 0
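A quick toy check of that dimensionality argument (a sketch with made-up sizes matching the numbers above; exact accuracies will vary):

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20000, 64)).astype(float)    # random binary inputs, dim 64
W = rng.normal(size=(64, 256))                             # random, untrained first layer
reps = np.maximum(X @ W, 0.0)                              # 256-dim ReLU representations

# a feature the first layer never computed: XOR of two input bits
xor_feature = X[:, 0].astype(int) ^ X[:, 1].astype(int)

decoder = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
decoder.fit(reps[:15000], xor_feature[:15000])
print(decoder.score(reps[15000:], xor_feature[15000:]))    # typically well above chance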
On the Foundations of Shortcut Learning
Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on predictivity (how reliably a feature indicates training-set labels)...
Good Q: it clearly helps with that concern! But 1) variance biases still affect what nonlinear decoders will learn from finite data (cf. availability effects here arxiv.org/abs/2310.16228). 2) there's also a concern of "overestimating" what is represented. E.g. in our models, 1/2
05.08.2025 16:29 · likes: 3 · reposts: 0 · replies: 1 · quotes: 0
Thoughts and feedback are very welcome btw: there are lots of subtle issues in this space that I probably haven't addressed perfectly, and probably prior works that I've missed.
05.08.2025 14:47 · likes: 1 · reposts: 0 · replies: 0 · quotes: 0
Thanks to my co-authors @scychan.bsky.social, Effie Li & Katherine Hermann, and the (many) others I've discussed these issues with recently and over the past few years!
05.08.2025 14:36 · likes: 1 · reposts: 0 · replies: 2 · quotes: 0
These kinds of cases definitely don't mean studying representations is useless! But they do suggest we may achieve incomplete understanding if we're not careful. See the paper (arxiv.org/abs/2507.22216) and our prior work (bsky.app/profile/lamp...) for further discussion, caveats, etc.
05.08.2025 14:36 · likes: 6 · reposts: 0 · replies: 2 · quotes: 0
Homomorphic encryption: strongly dissociating computation from patterns of representation

In the experiments described above, the role that the representations played in the computations of the system was relatively straightforward, even where the representations were biased. However, this does not have to be the case. We illustrate this with a final case study of the possibility for strong dissociation between computation and patterns of representation: homomorphic encryption (Gentry, 2009; Van Dijk et al., 2010). While the field of cryptography is largely focused on creating representations that preserve information yet are not easily decodable, in homomorphic encryption schemes it is additionally possible to perform arbitrary computations (any algebraic circuit) over this information while it is encrypted. That is, at each step of such a computation, a new encrypted representation is produced that corresponds to the result of encrypting the representation at that step of the original computation.

This example shows that it is not necessary for a computational system to have any straightforward (e.g. linearly decodable) representation of the features that it uses in its computations. Systematic computations can be performed even over representations that are deliberately crafted to thwart attempts to understand (decrypt) their content.

As a special case, this also illustrates that systematic compositional computations are possible without requiring representations that are straightforwardly compositional. Encrypted representations are compositional only in the sense that "with the right highly-nonlinear decoding scheme, compositional representations can be extracted", which is also true of some coding schemes typically interpreted as non-compositional, such as idiosyncratic representations of each input. This raises questions about if and when it is feasible to rigorously confirm from representational analyses whether a system's computations are compositional.
We also present a worst-case study I find conceptually interesting: homomorphic encryption. It's possible to do systematic computation over representations whose content is always encrypted, and thus difficult to decode by design!
05.08.2025 14:36 · likes: 7 · reposts: 0 · replies: 2 · quotes: 0
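To make the dissociation concrete, here is a deliberately trivial, additively homomorphic toy cipher (an illustration only, nowhere near the fully homomorphic schemes of Gentry (2009) discussed in the excerpt): adding ciphertexts adds the hidden plaintexts, even though the individual ciphertexts are meaningless without the keys.

N = 1_000_003  # public modulus

def enc(m, k):
    return (m + k) % N    # ciphertext looks arbitrary without the key k

def dec(c, k):
    return (c - k) % N

def add_encrypted(c1, c2):
    return (c1 + c2) % N  # computation performed directly on ciphertexts

k1, k2 = 123456, 987654
c = add_encrypted(enc(20, k1), enc(22, k2))
assert dec(c, k1 + k2) == 20 + 22   # decrypting the combined result recovers the sum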
Why are representations biased towards easier features? The biases are driven by multiple factors, including learning dynamics and the different ways that nonlinear features can be represented. (Left) By manipulating training order (training the hard task first rather than both simultaneously), the magnitude of the biases can be reduced. (Right) Likewise, by accounting for the fact that there can be more ways to represent a nonlinear feature that are not linearly equivalent (for example, different ways of drawing intermediate classification boundaries to compute an XOR function), we can identify other components of the representations that may be contributing to the model's computation of the hard feature. Together, the learning dynamics and the multiple ways of representing features explain most of the representation bias towards the easy feature over the hard one.
We briefly discuss (some of) the origins of these biases: they are driven by both learning dynamics and the fact that there is, in some sense, a larger variety of "natural" ways to represent a nonlinear feature.
05.08.2025 14:36 · likes: 1 · reposts: 0 · replies: 1 · quotes: 0
RSA within and between different sets of models can give surprising results due to representation biases. This plot shows similarities within and between different models computing different types of features. Ideally the similarities would be highest in blocks on the diagonal (i.e. models computing the same features), and the blocks off the diagonal would show graded similarity corresponding to the functional overlap. However, that is not the case. (Left) When comparing a model trained to output both easy and hard features to ones that are trained on only one feature, the multi-task model appears very similar to the easy-task only model (cf. Hermann and Lampinen, 2020). In fact, the models trained only on the hard task do not even appear particularly similar to other models trained on the same exact task. (Right) When models are trained on multiple easy or multiple hard tasks, the models trained on only hard tasks appear less similar to other models trained on exactly the same tasks than they do to models trained on strictly easier tasks that use the same input units.
These biases can lead to dramatic downstream effects that cause unexpected conclusions from analyses. For example, RSA may identify two models computing the same complex task as much less representationally similar than either of them is to a model computing a much simpler task (right panel)!
05.08.2025 14:36 · likes: 6 · reposts: 1 · replies: 1 · quotes: 0
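For reference, a bare-bones version of the RSA comparison being discussed (the standard RDM-correlation recipe, not the paper's exact analysis code):

from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(reps):
    # reps: (n_stimuli, n_units) -> condensed representational dissimilarity matrix
    return pdist(reps, metric="correlation")

def rsa_similarity(reps_a, reps_b):
    # Spearman correlation of the two models' RDMs over the same stimuli
    return spearmanr(rdm(reps_a), rdm(reps_b))[0]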
Representational biases: in the representations of a model computing easy (linear) and hard (4-parity) features, the overall variance explained in the last-layer representations by the easy feature is over 55%, while the variance explained by the hard feature is around 5%. This is reflected in the top PCs clearly clustering by the easy feature but not reflecting the hard one, and these biases are also present at the unit level: almost all units (especially the most active ones) represent the easy feature more strongly.
Representations were systematically biased towards certain kinds of features. For example, a model reliably computing easy (linear) and hard (nonlinear) features has 55% repr. variance explained by the easy one, 5% by the hard, with similar biases in top PCs and individual units.
05.08.2025 14:36 · likes: 2 · reposts: 0 · replies: 1 · quotes: 0
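A sketch of one way to compute a "variance explained by a feature" measure like the one quoted above (a simple between-/total-variance ratio; the paper's exact estimator may differ):

import numpy as np

def variance_explained(reps, feature):
    # reps: (n_samples, n_units); feature: discrete label per sample
    reps = reps - reps.mean(axis=0)                 # center each unit
    total = np.sum(reps.var(axis=0))
    between = sum(
        np.mean(feature == v) * np.sum(reps[feature == v].mean(axis=0) ** 2)
        for v in np.unique(feature)
    )
    return between / total   # fraction of representational variance the feature accounts for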
Datasets where many input features (color, shape, texture, … size) are combined to create the input data. Linear or nonlinear classification tasks can be created, e.g. classifying whether an object is a circle (linear) or whether it is XOR(yellow, checkered), which is nonlinear.
Experiments: training neural networks to output multiple features computed from an input, e.g. a linear and nonlinear one.
Learned representations: stimuli presented and datasets of representational activity from the model, as might be collected in a neuroscience experiment.
We constructed controlled datasets with many input features, and trained deep learning models to compute functions of those features (e.g. linear ones like identifying a feature, or nonlinear ones like XOR). We then analyzed the patterns of representational activity they learned.
05.08.2025 14:36 · likes: 4 · reposts: 0 · replies: 1 · quotes: 0
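A rough sketch of this kind of controlled dataset (the feature names, dimensions, and the toy stand-in for "rendering" are placeholders, not the paper's actual generator):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
features = {name: rng.integers(0, 2, size=n)
            for name in ["circle", "yellow", "checkered", "large"]}

# each binary feature is written into its own chunk of the input vector, plus noise
# (a toy stand-in for rendering images with those attributes)
chunks = [f[:, None] * rng.normal(size=(1, 16)) + 0.1 * rng.normal(size=(n, 16))
          for f in features.values()]
inputs = np.concatenate(chunks, axis=1)                   # shape (n, 64)

easy_label = features["circle"]                           # linear task
hard_label = features["yellow"] ^ features["checkered"]   # nonlinear (XOR) task

Models would then be trained to output easy_label and/or hard_label from inputs, and their internal representations analyzed.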
Hacker, Computational Neuroscience, ML beyond logistic regression, bear and muscle spindle aficionado. Passionate about open source. #deeplabcut and see https://mathislab.org for more.
NYT bestselling author of EMPIRE OF AI: empireofai.com. ai reporter. national magazine award & american humanist media award winner. words in The Atlantic. formerly WSJ, MIT Tech Review, KSJ@MIT. email: http://karendhao.com/contact.
NeuroAI Scholar @ CSHL
https://darsnack.github.io
Previously maintaining FluxML to procrastinate
Previously EE PhD at UW-Madison, comp. eng. / math at Rose-Hulman
developmental cognitive scientist & (terminated) NSF postdoc fellow @ NYU | social categories, development, language, climbing, caving | she/她
mariannazhang.github.io
Political Communication Professor at GWU. I write a lot about the history and future of tech and politics. Best known for that one time I made fun of Bret Stephens.
Davekarpf.substack.com
Cognitive scientist and psycholinguist. Currently doing a PhD at Stanford.
Senior Research Scientist at Google DeepMind. Views my own.
Post-doctoral research fellow in cognitive neuroscience (Oxford), interested in complex systems and in simple systems who believe they are complex systems
interests: software, neuroscience, causality, philosophy | ex: salk institute, u of washington, MIT | djbutler.github.io
Cognitive and perceptual psychologist, industrial designer, & electrical engineer. Assistant Professor of Industrial Design at University of Illinois Urbana-Champaign. I make neurally plausible bio-inspired computational process models of visual cognition.
PhD @Stanford working w Noah Goodman
Studying in-context learning and reasoning in humans and machines
Prev. @UofT CS & Psych
https://unireps.org
Discover why, when and how distinct learning processes yield similar representations, and the degree to which these can be unified.
Scientific AI/ machine learning, dynamical systems (reconstruction), generative surrogate models of brains & behavior, applications in neuroscience & mental health
4th-year PhD candidate in neuroAI @ Harvard with Talia Konkle and George Alvarez. Vision, DNNs, fMRI, behavior. Previously TarrLab @ CMU. NDSEG Fellow.
A latent space odyssey
gracekind.net
Postdoc at MIT. Research: language, the brain, NLP.
jmichaelov.com
Computational neuroscientist at Imperial College. I like spikes and making science better (Neuromatch, Brian spiking neural network simulator, SNUFA annual workshop on spiking neurons).
🧪 https://neural-reckoning.org/
📷 https://adobe.ly/3On5B29