
Vaishnavh Nagarajan

@vaishnavh.bsky.social

Foundations of AI. I like simple and minimal examples and creative ideas. I also like thinking about the next token 🧮🧸 Google Research | PhD, CMU | https://arxiv.org/abs/2504.15266 | https://arxiv.org/abs/2403.06963 | vaishnavh.github.io

3,195 Followers  |  381 Following  |  171 Posts  |  Joined: 13.11.2024

Latest posts by vaishnavh.bsky.social on Bluesky

ICML 2025 Awards

Congratulations to CSD faculty Aditi Raghunathan and her research collaborators on receiving an ICML Outstanding Paper award for Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction (icml.cc/virtual/2025...).

Paper: arxiv.org/abs/2504.15266

17.07.2025 14:43 · 👍 5    🔁 1    💬 0    📌 1

Reading the dedications of a PhD thesis is often a cure for a bad day. There's so much affection in them.

03.07.2025 23:05 · 👍 43    🔁 1    💬 4    📌 0
LLM Policy

As the NeurIPS review deadline is around the corner, please remember that you cannot use any non-local LLM (e.g., ChatGPT/Gemini) to understand the paper or to draft/revise your review, as that breaks the confidentiality agreement.

NeurIPS 2025 Official LLM Policy:
neurips.cc/Conferences/...

02.07.2025 12:44 · 👍 6    🔁 1    💬 0    📌 0

I really enjoyed "When We Cease to Understand the World", although it's more fiction than history of science

22.06.2025 14:30 · 👍 2    🔁 1    💬 0    📌 0

"Science in History" by Bernal is my first recommendation. For probability, the work of Ian Hacking is a good place to start.

23.06.2025 02:12 · 👍 1    🔁 1    💬 0    📌 0
Learning dynamics in linear recurrent neural networks
Recurrent neural networks (RNNs) are powerful models used widely in both machine learning and neuroscience to learn tasks with temporal dependencies and to model neural dynamics. However, despite...

How do task dynamics impact learning in networks with internal dynamics?

Excited to share our ICML Oral paper on learning dynamics in linear RNNs!
with @clementinedomine.bsky.social @mpshanahan.bsky.social and Pedro Mediano

openreview.net/forum?id=KGO...

20.06.2025 17:28 · 👍 33    🔁 12    💬 1    📌 0

When we are doing science, we are unknowingly executing our mythology, taken from movies and friends and textbooks, of what science is. History of science helps us ground that myth in reality

21.06.2025 23:04 · 👍 53    🔁 6    💬 3    📌 0

I finally wrote a full-fledged blog about this: reading the history of science is an **amazing** yet under-recognized way to develop (emotional) maturity as a researcher.

If you have thoughts/recommendations, please share!
vaishnavh.github.io/2025/04/29/h...

12.06.2025 23:45 · 👍 35    🔁 7    💬 3    📌 2

haha that's a new idiom for me. it's perfect! and the flip side is, "target and (potentially) regret", which causes quite a lot of stress. (what if your work gets rejected by the community or worse, overlooked)

05.06.2025 15:58 · 👍 2    🔁 0    💬 1    📌 0

but these pressures are real and have always persisted.

</end rant>

I think @abeirami.bsky.social may be interested in this rant.

05.06.2025 15:43 · 👍 2    🔁 0    💬 1    📌 0

but now I have the maturity to seek validation from things like "a specific person complimenting my work" or, even better, "a meaningful citation where someone substantially builds on my work." (ofc, i also seek internal validation/satisfaction but I gotta be realistic, lol).

05.06.2025 15:43 · 👍 1    🔁 0    💬 0    📌 0

i had intense first-hand struggle with a lot of these effects in my phd since i had <= 1 paper/year for the most part. i started managing it only after getting visibly recognized by experts for one of my papers at one point. i still struggle with it at some level.

05.06.2025 15:43 · 👍 1    🔁 0    💬 1    📌 0

then there are many other insidious feedback cycles, like the fact that publishing more => more visibility => more networks/interfaces with the community => more citations => more opportunities/internships etc. => more papers

05.06.2025 15:37 · 👍 3    🔁 0    💬 2    📌 0

for example, with the advent of twitter, there's a pressure to stay constantly visible and to have many different things to say every now and then (because everyone else is doing that), rather than pitch your one paper again and again, which starts feeling awkward :-(

05.06.2025 15:33 · 👍 4    🔁 0    💬 1    📌 0

someday I hope to write a blog about "all the other forces that discourage me from publishing less." people always say "publish less!" without acknowledging these varied and nuanced forces

05.06.2025 15:33 · 👍 4    🔁 0    💬 1    📌 0

all other incentivization strategies I had thought of are much more negative/mean, like:
- "evaluating someone based on their bottom k papers" or
- "judging someone negatively for publishing >N papers"

05.06.2025 15:30 · 👍 3    🔁 0    💬 1    📌 0

haha thank you! honored you feel that way!

btw, i just noticed that this sort of compliment is actually a great way to incentivize people to be more selective in publishing papers (and to counter all the other forces that discourage me from my rate of ~1 paper a year)

05.06.2025 15:28 · 👍 4    🔁 0    💬 2    📌 1
Title of paper "Causal Estimation of Tokenisation Bias" and schematic of how we define tokenisation bias, which is the causal effect we are interested in.

A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩), by an LM trained from scratch in each situation! Our new ACL paper proposes an observational method to estimate this causal effect! Longer thread soon!

04.06.2025 10:51 · 👍 53    🔁 9    💬 1    📌 3
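A minimal, made-up numeric sketch (hand-set probabilities, not the paper's observational estimator) of why the same string can receive very different probability under two tokenisations: an autoregressive LM scores whichever token sequence it is shown, so ⟨hello⟩ is scored by one conditional while ⟨he, llo⟩ is scored by a product of two.

```python
# Toy illustration of tokenisation bias: the probability an autoregressive LM
# assigns to the *string* "hello" depends on the token sequence used to encode it.
import math

# Hypothetical conditionals from two models trained from scratch: one sees
# "hello" as a single token, the other sees the split ⟨he, llo⟩.
p_single = {("hello",): 0.020}                      # p(hello | context)
p_split  = {("he",): 0.050, ("llo", "he"): 0.024}   # p(he | ctx), p(llo | ctx, he)

logp_single = math.log(p_single[("hello",)])
logp_split  = math.log(p_split[("he",)]) + math.log(p_split[("llo", "he")])

ratio = math.exp(logp_single - logp_split)
print(f"p(string | single token) / p(string | two tokens) = {ratio:.1f}x")
# ~17x with these made-up numbers, mirroring the gap quoted in the post.
```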

How does in-context learning emerge in attention models during gradient descent training?

Sharing our new Spotlight paper @icmlconf.bsky.social: Training Dynamics of In-Context Learning in Linear Attention
arxiv.org/abs/2501.16265

Led by Yedi Zhang with @aaditya6284.bsky.social and Peter Latham

04.06.2025 11:22 · 👍 52    🔁 17    💬 1    📌 1
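For readers new to the setting, here is a minimal sketch of the kind of setup typically used to study in-context learning with linear (softmax-free) attention: prompts of (x, y) pairs plus a query token are fed through one linear self-attention layer. The parameterisation and task below are illustrative assumptions, not necessarily the paper's exact construction.

```python
# In-context linear regression with one linear attention layer (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 5, 20                       # input dimension, context length

def make_prompt():
    w = rng.normal(size=d)             # task vector, resampled per prompt
    X = rng.normal(size=(n_ctx + 1, d))
    y = X @ w
    tokens = np.concatenate([X, y[:, None]], axis=1)  # token_i = [x_i ; y_i]
    tokens[-1, -1] = 0.0               # hide the query's label
    return tokens, y[-1]

# Linear self-attention: out = (Z Wq)(Z Wk)^T (Z Wv) / n, no softmax.
Wq = rng.normal(size=(d + 1, d + 1)) * 0.1
Wk = rng.normal(size=(d + 1, d + 1)) * 0.1
Wv = rng.normal(size=(d + 1, d + 1)) * 0.1

def predict(tokens):
    Z = tokens
    attn = (Z @ Wq) @ (Z @ Wk).T / len(Z)
    out = attn @ (Z @ Wv)
    return out[-1, -1]                 # read the prediction off the query token's y-slot

tokens, y_true = make_prompt()
print("untrained prediction:", predict(tokens), "target:", y_true)
# Training Wq, Wk, Wv by gradient descent on squared error over many such prompts
# is the regime whose dynamics the paper analyses; this snippet is only the forward pass.
```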

This paper is quite nice. It mixes some useful toy models of creativity with insights about how to induce more creativity in LLMs than greedy sampling allows.

02.06.2025 21:31 · 👍 16    🔁 4    💬 1    📌 1
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
We design a suite of minimal algorithmic tasks that are a loose abstraction of open-ended real-world tasks. This allows us to cleanly and controllably quantify the creative limits of the present-day l...

Read the full paper: arxiv.org/abs/2504.15266

This work was with great collaborators at CMU: @chenhenrywu.bsky.social who co-led, Charles Ding & @adtraghunathan.bsky.social! Go follow them to see what else they're up to! 11/

02.06.2025 17:25 · 👍 0    🔁 0    💬 0    📌 0

But, there's a lot of scope for exciting work:
→ generalizing these insights to real cows,
→ studying RL/CoT for creativity,
→ understanding surprising behaviors of seed-conditioning 10/👇🏽

02.06.2025 17:25 · 👍 0    🔁 0    💬 1    📌 0

Of course, this is all a study of spherical cows. 🐮
Given the noisy, subjective studies of real cows, we believe an objective study brings
→ much-needed clarity of thought (like disentangling the two modes of creativity),
→ more ideas,
→ better-defined experiments. 9/👇🏽

02.06.2025 17:25 · 👍 0    🔁 0    💬 1    📌 0

Our vision is that seed-conditioning can help models sample a latent thought and articulate that one thought into words,

but temp sampling has to articulate multiple latent thoughts in parallel to produce a marginal next-word distribution -- this is more burdensome! 8/👇🏽

02.06.2025 17:25 · 👍 1    🔁 0    💬 1    📌 0
Figure showing algorithmic creativity with and without seed-conditioning.

Next, we revisit how to produce randomness: the go-to temp sampling 🌡️ vs. injecting a random prefix (seed-conditioning). 🌱

Remarkably, seed-conditioning produces meaningful diversity even w *greedy* decoding 🤑; it is competitive with temp & in some conditions, superior. 7/👇🏽

02.06.2025 17:25 · 👍 1    🔁 0    💬 1    📌 0
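A minimal sketch of the two ways of producing randomness contrasted above, using a stand-in next-token model rather than a real LM (the model, vocabulary, and seed length are made up for illustration): temperature sampling injects noise at every decoding step, while seed-conditioning puts all the randomness into a random prefix and then decodes greedily.

```python
import hashlib
import numpy as np

VOCAB = list("abcde")
rng = np.random.default_rng(0)

def next_token_probs(prefix):
    """Stand-in LM: a fixed next-token distribution determined by the prefix."""
    h = int(hashlib.sha256("".join(prefix).encode()).hexdigest(), 16)
    logits = np.random.default_rng(h % (2**32)).normal(size=len(VOCAB))
    p = np.exp(logits)
    return p / p.sum()

def temperature_sample(n_steps, temp=1.0):
    out = []
    for _ in range(n_steps):
        p = next_token_probs(out) ** (1.0 / temp)
        p /= p.sum()
        out.append(rng.choice(VOCAB, p=p))         # randomness at every step
    return "".join(out)

def seed_conditioned_greedy(n_steps, seed_len=4):
    seed = list(rng.choice(VOCAB, size=seed_len))  # random prefix = the only randomness
    out = list(seed)
    for _ in range(n_steps):
        out.append(VOCAB[int(np.argmax(next_token_probs(out)))])  # greedy decoding
    return "".join(out[seed_len:])

print({temperature_sample(8) for _ in range(5)})       # diversity via per-step noise
print({seed_conditioned_greedy(8) for _ in range(5)})  # diversity via the seed alone
```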

…a (deterministic) many-hop planning task.

our work shows inefficiency in tasks
- as simple as 2-hop (stochastic) planning 👶🏽
- where no token-reordering is next-token friendly! 🔀🔃🙅🏾 The model *must* learn all tokens in tandem to detect an *implicit* pattern. 6/👇🏽

02.06.2025 17:25 · 👍 0    🔁 0    💬 1    📌 0
Figure from prior paper showing path-star task, a pitfall of next-token prediction.

02.06.2025 17:25 · 👍 0    🔁 0    💬 1    📌 0

We argue: creativity requires appreciating big-picture 🖼️ patterns & orchestrating interdependent random decisions in advance ("a leap of thought"). Next-token learning should be inefficient at this.

This complements the known pitfall of next-token learning in the path-star task … 5/👇🏽

02.06.2025 17:25 · 👍 1    🔁 0    💬 1    📌 0
Plots report empirical results of algorithmic creativity under next-token vs multi-token objectives.

On these tasks, we can objectively evaluate how "creative" (correct, diverse & original) a model is. 🧑🏽‍🔬

First: Next-token-trained models are largely less creative & memorize much more than multi-token ones (we tried diffusion and teacherless training). 4/👇🏽

02.06.2025 17:25 · 👍 1    🔁 0    💬 1    📌 0
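One simple way to operationalise "correct, diverse & original" on an algorithmic task is sketched below with generic proxies: a task-specific validity check for correctness, distinct outputs for diversity, and absence from the training set for originality. These are illustrative stand-ins, not necessarily the paper's exact metrics.

```python
def creativity_scores(samples, is_valid, train_set):
    valid = [s for s in samples if is_valid(s)]
    correct = len(valid) / len(samples)                      # fraction well-formed
    diverse = len(set(valid)) / max(len(valid), 1)           # distinct among valid
    original = (len([s for s in set(valid) if s not in train_set])
                / max(len(set(valid)), 1))                   # unseen during training
    return {"correct": correct, "diverse": diverse, "original": original}

# Tiny usage example with a made-up task: a valid output is a strictly
# increasing triple of digits.
def is_valid(s):
    return s.isdigit() and len(s) == 3 and list(s) == sorted(set(s))

train_set = {"123", "135", "789"}
samples = ["123", "147", "147", "321", "258"]
print(creativity_scores(samples, is_valid, train_set))
```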
Fig 1 screenshot from paper describing minimal algorithmic tasks for combinational creativity (drawing upon different pieces of memory.)

Minimal algorithmic tasks describing exploratory creativity (devising fresh patterns subject to some rules.) Screenshot of Fig 2 in the paper.

Screenshot from poster describing the two tasks in a less technical way.

Our idea was to design minimal, open-ended, graph algorithmic tasks 🧮 abstracting 2 key modes of creativity:

1. Combinational: making surprising connections from memory, like in wordplay. 🧠

2. Exploratory: devising fresh patterns obeying some rules, like in problem-design. 🧩 3/👇🏽

02.06.2025 17:25 · 👍 0    🔁 0    💬 1    📌 0
