Thank you! :)
03.03.2026 06:56
But I'll forever be grateful for the privilege of being a part of DM through such an exciting time, for getting to work on many amazing projects, and for the wonderful collaborators and dear friends I've made along the way.
03.03.2026 01:55
With all these changes, I've started to wonder whether I could more effectively do the work I think is most important and exciting somewhere else. After a short break, I'm excited to try something new (more to come soon, I hope).
03.03.2026 01:55
View of London from a rooftop in Kings Cross
After 5.5 years (or 7 or 9, counting internships), today was my last day at Google/DeepMind. When I was in London recently, I walked through the two floors that were (most of) DeepMind when I first joined, and thought about how much the company and field have changed since then.
03.03.2026 01:55
🚨New preprint! In-context learning underlies LLMs' real-world utility, but what are its limits? Can LLMs learn completely novel representations in-context and flexibly deploy them to solve tasks? In other words, can LLMs construct an in-context world model? Let's see!
26.02.2026 17:28
Really cool work: learning over sequential experiences that contain the embodied cue of viewpoint, as well as visual inputs, can give rise to human-like 3D shape perception!
26.02.2026 16:52
News! I've joined the Astera Institute to lead its neuroscience-based AGI research. Backed by a $1B+ commitment over the coming decade, my team will explore novel, brain-inspired architectures and algorithms toward safe, efficient, human-like AGI, working alongside Doris Tsao. 1/
astera.org/dileep-georg...
That kind of structural generalization to entirely new situations seems hard to obtain from simpler models (without building the abstraction in a priori), even if it *is* ultimately a consequence of the same sort of simplicity-bias processes at play in the cases above. 3/3
25.02.2026 17:54
the type of generalization you can observe. E.g., if you train an LM-type (passive-learning) agent on causal tasks, you get emergent generalization to infer and exploit novel causal dependencies never seen in training (proceedings.neurips.cc/paper_files/...). 2/3
25.02.2026 17:54
Right, there are several things to distinguish here. Completely agree that benign-overfitting and simplicity-bias phenomena are not unique to DL models. But I think the fact that DL models can represent a much broader (and more abstract) solution class than simpler models qualitatively changes 1/3
25.02.2026 17:54
perhaps it would depend on what exactly the generalization kernel is...
24.02.2026 21:17
Yeah, that's a great connection! Although I think that DL models transition smoothly to higher-order structural-type generalization (e.g., learning a truly novel task in context) that doesn't seem as obviously capturable through exemplar-based models as I think of them, though 1/2
24.02.2026 21:17
Awesome, will check it out, thanks!
24.02.2026 00:20
Yes, I'd agree with that statement! (And glad to hear it :) )
24.02.2026 00:20
show how actively augmenting data can improve certain kinds of generalization, and at the end of arxiv.org/abs/2509.16189 we suggest it might have something to do with how offline replay helps natural intelligences; more to come on that soon, I hope :)
23.02.2026 19:14
Nevertheless, it seems like these models are not as sample efficient in the small-sample novel-task regime as humans; e.g., learning a new game from scratch. One reason may be that natural intelligences do more with each experience, including inferring beyond it; in arxiv.org/abs/2505.00661 we 2/3
23.02.2026 19:14
Hmm, well maybe we should discuss offline sometime :)
My quick answer would be that if anything, scale seems to *improve* generalization at the same time as reducing interference (e.g., see Fig. 2 in arxiv.org/abs/2001.08361: larger models reduce test loss faster from the same amount of data). 1/2
How do you get it to build cumulatively on what it's learned, so that new learnings can build on older ones? I think ARC-AGI-3 tasks, for example, capture these kinds of challenges very well. I'm not saying interference is totally solved, to be clear; it's just that I think these other challenges are less solved. 2/2
23.02.2026 18:19
I'd say it's more a concern about learning efficiency, positive transfer (rather than negative, i.e. interference), and integration. How do you get a system to learn something efficiently and consolidate that in a way that makes necessary connections to prior knowledge? 1/2
23.02.2026 18:19
Hmmm, the link doesn't seem to work; what's the title?
23.02.2026 15:32
Good point! I hadn't made that connection before.
23.02.2026 00:38
Hmm, well I think images have quite a bit higher information density. Though you still definitely see memorization in diffusion models. But it's probably important that their output objective is reproduction, whereas humans are usually using visual inputs at a higher level of abstraction...
23.02.2026 00:35
lags humans; lots more work to be done. Indeed, the simplicity biases I discuss here can sometimes be counterproductive; see, e.g., proceedings.neurips.cc/paper/2020/h...
But my main point here is that even for present systems, memorization doesn't necessarily prevent generalization. 2/2
Definitely, though it's worth noting that natural systems don't necessarily always generalize perfectly either (e.g., proceedings.mlr.press/v162/guo22d....); it's just much harder to optimize through them to directly find small attacks. But certainly there are ways that generalization of AI 1/2
21.02.2026 18:20
2) not considering some fast recovery based on memory (arxiv.org/abs/1802.10542), which is almost certainly part of the natural intelligence solution to the problem. Not to say CI/CL are completely solved, but I think the actual problems are quite different than we used to imagine they were. 3/3
21.02.2026 15:55
I should write a post on this at some point too, but I actually think the catastrophic forgetting/continual learning area was led astray for a long time by 1) neglecting how much model scale reduces catastrophic interference (e.g. openreview.net/forum?id=GhV...) and 2/3
21.02.2026 15:55
Hmmm, maybe for humans, but for current AI it actually doesn't seem that you need to work all that hard to preserve memory for earlier information. I mean, it does definitely degrade to some extent, but the fact that there are even memorization concerns is precisely because a lot is preserved. 1/3
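The interference-and-recovery point in this thread can be made concrete with a toy sketch (my illustration, not from the thread, and all names in it are made up): a one-parameter linear model trained to convergence on task A and then on task B alone "forgets" A, while replaying stored task-A examples during task-B training, a crude stand-in for the memory-based recovery mentioned above, limits the damage.

```python
# Toy illustration of catastrophic interference under sequential
# training, and of replay as a simple mitigation. One scalar weight,
# plain gradient descent on mean squared error; no claims beyond the toy.

def sgd(w, data, lr=0.1, steps=200):
    """Minimize mean squared error of y = w * x over data by gradient descent."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

task_a = [(1.0, 2.0)]   # optimal weight for task A is w = 2
task_b = [(1.0, 0.0)]   # optimal weight for task B is w = 0

w = sgd(0.0, task_a)                 # learn task A
loss_a_before = loss(w, task_a)      # ~0: task A mastered
w = sgd(w, task_b)                   # then learn task B alone
loss_a_after = loss(w, task_a)       # ~4: task A "forgotten"

w_replay = sgd(0.0, task_a)
w_replay = sgd(w_replay, task_a + task_b)  # replay stored A data while learning B
loss_a_replay = loss(w_replay, task_a)     # ~1: interference much reduced
```

With a single weight, tasks A and B directly conflict, so the sequential run is a worst case; the replay run settles on the compromise w = 1, which is why its task-A loss is nonzero but far below the forgetting case.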
21.02.2026 15:55
I also think humans will memorize given not-that-many exposures of course, even if they continue to generalize beyond it; memorization as a kind of caching makes sense for almost any efficiency-constrained system. 2/2
20.02.2026 18:17
Interesting question! I suppose one might be able to craft a meta-learning objective to encourage this, but with the usual learning objectives there's never a cost to memorizing (aside from regularization of course, but in practice it doesn't seem to prevent it for large models). 1/2
20.02.2026 18:17
memorizing a particular paragraph doesn't really require storing that much information. I often think about the findings on chunking in experts like chess grandmasters allowing much easier recall of board states, and then imagine that LLMs are likewise expert text processors. 2/2
18.02.2026 17:55
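The "there's never a cost to memorizing" point in the thread above can be illustrated with a toy sketch (my example, not from the thread; the data and function names are invented for illustration): a standard training loss scores a pure lookup-table memorizer and a genuinely generalizing rule identically on the training set, so nothing in the objective itself penalizes memorization; the two only come apart on held-out inputs.

```python
# Toy illustration: the training objective assigns identical (zero) loss
# to a memorizer and to a generalizing rule, so memorization is never
# "charged for"; only held-out data distinguishes them.

train = {0: 0, 1: 2, 2: 4, 3: 6}     # underlying rule: y = 2x
held_out = {10: 20, 11: 22}

def memorizer(x, table=dict(train)):
    # Pure lookup: perfect on everything seen, guesses 0 elsewhere.
    return table.get(x, 0)

def rule(x):
    # The generalizing solution the data was drawn from.
    return 2 * x

def avg_loss(f, data):
    return sum((f(x) - y) ** 2 for x, y in data.items()) / len(data)

train_mem  = avg_loss(memorizer, train)     # 0.0
train_rule = avg_loss(rule, train)          # 0.0 -> objective can't tell them apart
test_mem   = avg_loss(memorizer, held_out)  # large: memorizer fails off-distribution
test_rule  = avg_loss(rule, held_out)       # 0.0
```

A meta-learning objective of the kind the thread speculates about would, in effect, have to score predictors on held-out behavior rather than training fit, which is exactly where these two diverge.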