
Andrew Lampinen

@lampinen.bsky.social

Interested in cognition and artificial intelligence. Research Scientist at Google DeepMind. Previously cognitive science at Stanford. Posts are mine. lampinen.github.io

7,811 Followers  |  711 Following  |  310 Posts  |  Joined: 25.08.2023

Posts by Andrew Lampinen (@lampinen.bsky.social)


🚨 New preprint! In-context learning underlies LLMs' real-world utility, but what are its limits? Can LLMs learn completely novel representations in-context and flexibly deploy them to solve tasks? In other words, can LLMs construct an in-context world model? Let's see! 👀

26.02.2026 17:28 — 👍 37    🔁 5    💬 1    📌 1

Really cool work — learning over sequential experiences that contain the embodied cue of viewpoint as well as visual inputs can give rise to human-like 3D shape perception!

26.02.2026 16:52 — 👍 10    🔁 1    💬 1    📌 0
Dileep George joins Astera to lead its neuro-inspired AGI effort Dileep George is joining Astera as Head of AI, leading our AGI research division. Working alongside our Chief Scientist Doris Tsao, he and the team will explore novel, brain-inspired computational arc...

News! I've joined the Astera Institute to lead its neuroscience-based AGI research. Backed by a $1B+ commitment over the coming decade, my team will explore novel, brain-inspired architectures and algorithms toward safe, efficient, human-like AGI, working alongside Doris Tsao. 1/

astera.org/dileep-georg...

25.02.2026 19:10 — 👍 78    🔁 6    💬 13    📌 2

That kind of structural generalization to entirely new situations seems hard to obtain from simpler models (without building the abstraction in a priori), even if it *is* ultimately a consequence of the same sort of simplicity-bias processes at play in the cases above. 3/3

25.02.2026 17:54 — 👍 1    🔁 0    💬 0    📌 0
Passive learning of active causal strategies in agents and language models

the type of generalization you can observe. E.g. if you train an LM-type (passive-learning) agent on causal tasks, you get emergent generalization to infer and exploit novel causal dependencies never seen in training (proceedings.neurips.cc/paper_files/...). 2/3

25.02.2026 17:54 — 👍 1    🔁 0    💬 1    📌 0
Passive learning of active causal strategies in agents and language models

Right, there are several things to distinguish here. Completely agree that benign-overfitting and simplicity-bias phenomena are not unique to DL models. But I think the fact that DL models can represent a much broader (and more abstract) solution class than simpler models qualitatively changes 1/3

25.02.2026 17:54 — 👍 1    🔁 0    💬 1    📌 0

perhaps it would depend on what exactly the generalization kernel is...

24.02.2026 21:17 — 👍 2    🔁 0    💬 1    📌 0

Yeah, that's a great connection! Although I think DL models transition smoothly to higher-order, structural-type generalization (e.g., learning a truly novel task in context) that doesn't seem as obviously capturable by exemplar-based models as I usually think of them, though 1/2

24.02.2026 21:17 — 👍 3    🔁 0    💬 1    📌 0

Awesome, will check it out, thanks!

24.02.2026 00:20 — 👍 1    🔁 0    💬 0    📌 0

Yes, I'd agree with that statement! (And glad to hear it :) )

24.02.2026 00:20 — 👍 3    🔁 0    💬 1    📌 0
Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of parametric m...

show how actively augmenting data can improve certain kinds of generalization, and at the end of arxiv.org/abs/2509.16189 we suggest it might have something to do with how offline replay helps natural intelligences — more to come on that soon, I hope :)

23.02.2026 19:14 — 👍 4    🔁 0    💬 1    📌 0
On the generalization of language models from in-context learning and finetuning: a controlled study Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from finetuning. E.g. they can fail to generalize to simple reversals of relations they are trained...

Nevertheless, it seems like these models are not as sample efficient in the small-sample novel-task regime as humans; e.g., learning a new game from scratch. One reason may be that natural intelligences do more with each experience, including inferring beyond it; in arxiv.org/abs/2505.00661 we 2/3

23.02.2026 19:14 — 👍 2    🔁 0    💬 1    📌 0
Scaling Laws for Neural Language Models We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, wit...

Hmm, well maybe we should discuss offline sometime :)
My quick answer would be that if anything, scale seems to *improve* generalization at the same time as reducing interference (e.g., see Fig. 2 in arxiv.org/abs/2001.08361 — larger models reduce test loss faster from the same amount of data). 1/2

23.02.2026 19:14 — 👍 2    🔁 0    💬 1    📌 0
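A sketch, for intuition, of the combined power-law form from that paper. The constants are its approximate published fits; treat them as illustrative, not as a definitive implementation.

```python
# Approximate combined scaling law L(N, D) from arxiv.org/abs/2001.08361.
# Constants are the paper's rough fitted values (illustrative only).
N_C, ALPHA_N = 8.8e13, 0.076   # non-embedding parameter-count scale / exponent
D_C, ALPHA_D = 5.4e13, 0.095   # dataset-size (tokens) scale / exponent

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted test loss (nats/token) for a model with n_params trained on n_tokens."""
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

# At a fixed data budget, the larger model is predicted to reach lower test loss:
print(loss(1e8, 1e10))  # smaller model
print(loss(1e9, 1e10))  # 10x larger model, same data: lower predicted loss
```

The second term captures the data bottleneck: once `n_tokens` is large, the model-size term dominates and scaling `n_params` keeps paying off.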

How do you get it to build cumulatively on what it's learned, so that new learnings can build on older ones? I think ARC-AGI-3 tasks, for example, capture these kinds of challenges very well. I'm not saying interference is totally solved, to be clear — I just think these other challenges are less solved. 2/2

23.02.2026 18:19 — 👍 2    🔁 0    💬 1    📌 0

I'd say it's more a concern about learning efficiency, positive transfer (rather than negative, i.e. interference), and integration. How do you get a system to learn something efficiently and consolidate that in a way that makes necessary connections to prior knowledge? 1/2

23.02.2026 18:19 — 👍 4    🔁 0    💬 1    📌 0

Hmmm the link doesn't seem to work, what's the title?

23.02.2026 15:32 — 👍 0    🔁 0    💬 1    📌 0

Good point! I hadn't made that connection before

23.02.2026 00:38 — 👍 0    🔁 0    💬 1    📌 0

Hmm well I think images have quite a bit higher information density. Though you still definitely see memorization in diffusion models. But it's probably important that their output objective is reproduction, whereas humans are using visual inputs at a higher level of abstraction usually...

23.02.2026 00:35 — 👍 4    🔁 0    💬 1    📌 0
The Pitfalls of Simplicity Bias in Neural Networks

lags humans; lots more work to be done. Indeed, the simplicity biases I discuss here can sometimes be counterproductive, see e.g.: proceedings.neurips.cc/paper/2020/h...
But my main point here is that even for present systems, memorization doesn't necessarily prevent generalization. 2/2

21.02.2026 18:20 — 👍 2    🔁 0    💬 1    📌 0
Adversarially trained neural representations may already be as robust as corresponding biological neural representations Visual systems of primates are the gold standard of robust perception. There is thus a general belief that mimicking the neural representations that underlie those systems will yield artificial vis...

Definitely, though it's worth noting that natural systems don't necessarily always generalize perfectly either, e.g. proceedings.mlr.press/v162/guo22d.... — it's just much harder to optimize through them to directly find small attacks. But certainly there are ways that generalization of AI 1/2

21.02.2026 18:20 — 👍 2    🔁 1    💬 1    📌 0
Memory-based Parameter Adaptation Deep neural networks have excelled on a wide range of problems, from vision to language and game playing. Neural networks very gradually incorporate information into weights as they process data, requ...

2) not considering some fast recovery based on memory (arxiv.org/abs/1802.10542) which is almost certainly part of the natural intelligence solution to the problem. Not to say CI/CL are completely solved, but I think the actual problems are quite different than we used to imagine they were. 3/3

21.02.2026 15:55 — 👍 5    🔁 0    💬 1    📌 0
Effect of scale on catastrophic forgetting in neural networks Catastrophic forgetting presents a challenge in developing deep learning models capable of continual learning, i.e. learning tasks sequentially. Recently, both computer vision and natural-language...

I should write a post on this at some point too, but I actually think the catastrophic forgetting/continual learning area was led astray for a long time by 1) neglecting how much model scale reduces catastrophic interference (e.g. openreview.net/forum?id=GhV...) and 2/3

21.02.2026 15:55 — 👍 4    🔁 0    💬 1    📌 0

Hmmm maybe for humans, but for current AI it actually doesn't seem that you need to work all that hard to preserve memory for earlier information. I mean, it does definitely degrade to some extent, but the fact that there are even memorization concerns is precisely because a lot is preserved. 1/3

21.02.2026 15:55 — 👍 4    🔁 0    💬 1    📌 0

I also think humans will memorize given not-that-many exposures of course, even if they continue to generalize beyond it; memorization as a kind of caching makes sense for almost any efficiency-constrained system. 2/2

20.02.2026 18:17 — 👍 3    🔁 0    💬 1    📌 0

Interesting question — I suppose one might be able to craft a meta-learning objective to encourage this, but with the usual learning objectives there's never a cost to memorizing (aside from regularization, of course, but in practice that doesn't seem to prevent memorization in large models). 1/2

20.02.2026 18:17 — 👍 2    🔁 0    💬 1    📌 0

memorizing a particular paragraph doesn't really require storing that much information. I often think about the findings on chunking in experts like chess grandmasters allowing much easier recall of board states, and then imagine that LLMs are likewise expert text processors. 2/2

18.02.2026 17:55 — 👍 3    🔁 0    💬 1    📌 0

I think repeated data are definitely a contributor in some cases. But also I think we really underestimate how large the capacity of high-dimensional neural networks is (parameters ~ embedding size^2) and how compressible text is: if you already know a lot about language and the content area, 1/2

18.02.2026 17:55 — 👍 0    🔁 0    💬 1    📌 0
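A back-of-the-envelope sketch of that "parameters ~ embedding size^2" point, using my own rough accounting for a standard transformer block (attention plus a 4x MLP; embeddings, biases, and norms ignored):

```python
def block_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for one standard transformer block."""
    attn = 4 * d_model * d_model             # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up- and down-projection matrices
    return attn + mlp                        # = 12 * d_model**2 with mlp_ratio=4

# Capacity grows quadratically with embedding size:
print(block_params(1024))  # 12,582,912
print(block_params(2048))  # 4x as many: 50,331,648
```

Doubling the embedding size quadruples per-block capacity, which is part of why large models have so much room to cache (memorize) text that compresses well given what they already know.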

Thanks Anna, glad to hear it! :)

18.02.2026 17:38 — 👍 1    🔁 0    💬 1    📌 0

Awesome, glad to hear it — that was my goal!

18.02.2026 16:46 — 👍 3    🔁 0    💬 0    📌 0

I discuss the evolution from classic ideas about overfitting towards our more modern understanding of implicit inductive biases, benign overfitting, cases where memorization is necessary, and studies on language models.

I hope this will be useful! Thoughts/comments welcome.

18.02.2026 15:54 — 👍 8    🔁 0    💬 3    📌 0
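As a toy illustration of the benign-overfitting idea mentioned above (my sketch, not from the post): a minimum-norm linear model with more features than examples fits noisy training labels exactly, yet still predicts held-out data better than a mean baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parameterized linear regression: more features (d) than examples (n),
# so the minimum-norm least-squares solution interpolates the noisy labels.
n, d, n_test = 200, 400, 2000
w_true = rng.normal(size=d) / np.sqrt(d)      # ground-truth weights, norm ~ 1
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)     # noisy training labels

w_hat = np.linalg.pinv(X) @ y                 # min-norm interpolating solution

X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true + 0.1 * rng.normal(size=n_test)

train_mse = np.mean((X @ w_hat - y) ** 2)     # ~0: the noise is fit exactly
test_mse = np.mean((X_test @ w_hat - y_test) ** 2)
baseline = np.mean((y_test - y.mean()) ** 2)  # predict the training mean

print(train_mse, test_mse, baseline)          # memorizes, yet still generalizes
```

The sizes and noise level here are arbitrary choices for the demo; the qualitative point is just that perfect training fit (memorization) and above-baseline generalization coexist.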