Nick Tomlin

@nickatomlin.bsky.social

Incoming assistant professor at TTIC, current faculty fellow at NYU, and previous PhD student at Berkeley. Natural language processing. He/him. 🌐 nickatomlin.github.io

1,709 Followers  |  113 Following  |  11 Posts  |  Joined: 01.08.2023

Latest posts by nickatomlin.bsky.social on Bluesky

Faculty Opportunities at TTIC

Two brief advertisements!

TTIC is recruiting both tenure-track and research assistant professors: ttic.edu/faculty-hiri...
NYU is recruiting faculty fellows: apply.interfolio.com/174686

Happy to chat with anyone considering either of these options

23.10.2025 13:57 · 👍 8  🔁 6  💬 0  📌 0

CRA changed their interface and it's much harder to browse now for some reason...

Last year, I ended up just making a list of schools/departments that I wanted to apply to and individually searching through each of their websites for job postings

12.10.2025 23:16 · 👍 1  🔁 0  💬 1  📌 0

FYI that UChicago CS & Stats is hiring at all levels via the Data Science Institute:

Postdoc: uchicago.infoready4.com#freeformComp...
Assistant Professor: apply.interfolio.com/174766
Associate Professor: apply.interfolio.com/174768

07.10.2025 17:53 · 👍 8  🔁 3  💬 0  📌 0

What does it take to build a human-like user simulator? //

Jessy Lin and I wrote another blogpost on user simulators as a reward function for training interactive models, this time focused on methods + open questions:
jessylin.com/2025/09/25/u...

28.09.2025 15:32 · 👍 3  🔁 0  💬 0  📌 0
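The simulator-as-reward idea can be sketched loosely in code. This is my toy illustration, not anything from the blog post: `ToySimulator`, `rollout_reward`, and the scoring scheme are all hypothetical stand-ins for a learned user model whose end-of-dialogue score serves as the RL reward.

```python
# Hypothetical sketch: a simulated user converses with an assistant policy,
# and its satisfaction score becomes the scalar reward for training.
class ToySimulator:
    def respond(self, history):
        # A real simulator would condition on the dialogue history
        return "book me a flight"

    def is_satisfied(self, history):
        # Toy rule: satisfied once the assistant has replied at all
        return any(role == "assistant" for role, _ in history)

    def score(self, history):
        return 1.0 if self.is_satisfied(history) else 0.0

def rollout_reward(assistant, simulator, max_turns=5):
    """Roll out a dialogue and return the simulator's score as reward."""
    history = []
    for _ in range(max_turns):
        history.append(("user", simulator.respond(history)))
        history.append(("assistant", assistant(history)))
        if simulator.is_satisfied(history):
            break
    return simulator.score(history)

reward = rollout_reward(lambda history: "done!", ToySimulator())
```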
Eugene Vinitsky

Was talking to a student who wasn't sure about why one would get a PhD. So I wrote up a list of reasons!
www.eugenevinitsky.com/posts/reason...

27.07.2025 19:30 · 👍 51  🔁 11  💬 7  📌 0
User simulators bridge RL with real-world interaction

An excellent blog post about a still huge missing gap, models of humans you can actually use to study human-AI interaction: jessylin.com/2025/07/10/u...

10.07.2025 22:15 · 👍 12  🔁 2  💬 1  📌 0

We’re proud to announce three new tenure-track assistant professors joining TTIC in Fall 2026: Yossi Gandelsman, Will Merrill, and Nick Tomlin (@nickatomlin.bsky.social). Meet them here: buff.ly/JH1DFtT

27.06.2025 16:29 · 👍 7  🔁 2  💬 0  📌 0

🤠🤓🙂

29.05.2025 04:17 · 👍 4  🔁 0  💬 1  📌 0

Haha main reason for using Gym was that we wanted a way to automatically evaluate models against trained RL agents. Doing the full arena-style evaluation on reasoning models gets really expensive

It also helps that current LLMs are really good at generating functional Gym code

14.05.2025 16:36 · 👍 1  🔁 0  💬 1  📌 0
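For a sense of what "functional Gym code" looks like, here is a minimal sketch of a two-player, turn-based environment in the Gym style. It's plain Python mirroring the `reset`/`step` interface (real environments would subclass `gym.Env`), and the game itself is invented for illustration, not one from the benchmark.

```python
# Hypothetical two-player game in the Gym API style: players alternate
# claiming squares 0-8, and whoever claims square 4 wins.
class TinyClaimGame:
    WIN_SQUARE = 4

    def reset(self):
        self.board = [0] * 9
        self.current_player = 1
        return tuple(self.board), {}

    def step(self, action):
        # Gym-style return: (observation, reward, terminated, truncated, info)
        if self.board[action] != 0:
            return tuple(self.board), -10, True, False, {}  # illegal move loses
        self.board[action] = self.current_player
        if action == self.WIN_SQUARE:
            return tuple(self.board), 1, True, False, {}    # mover wins
        self.current_player *= -1                            # pass the turn
        return tuple(self.board), 0, False, False, {}

env = TinyClaimGame()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(4)
```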

I think in the short term that’s reasonable, e.g., current models can play chess but they definitely can’t understand chess variants

In the long term, I suspect there’s more risk of over-optimizing to those specific games, so the hope is that our approach is a bit more future-proof

14.05.2025 16:29 · 👍 0  🔁 0  💬 0  📌 0
GitHub - vivek3141/gg-bench: Measuring General Intelligence With Generated Games (Preprint)

For anyone interested in evaluating or expanding on this benchmark, we have a nice code release here: github.com/vivek3141/gg...

13.05.2025 21:30 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Results table. The best model (o1) wins about 36% of games against the RL baselines.

This is a difficult benchmark: the best non-reasoning LLMs score around 9%, while the best reasoning models score around 36%. In the future, as models get stronger, we anticipate that they'll also be able to generate harder games

13.05.2025 21:30 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Main paper figure showing a three-step pipeline of game description generation, implementation generation, and self-play training of RL agents

We use o1 to generate natural language rulebooks for 1000 two-player games and then implement these games as Gym environments. For each game, we train baseline agents in self-play with RL and then evaluate whether LLMs can beat the RL baselines

13.05.2025 21:30 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 2    πŸ“Œ 0
Title and abstract of the paper, "Measuring General Intelligence with Generated Games"

I'm particularly fond of this new benchmark paper we wrote, which aims to scalably evaluate whether language models can generalize to arbitrary new tasks. The core idea is to use LLMs to generate new games, and then evaluate whether LLMs can play those games

📄: arxiv.org/abs/2505.07215

13.05.2025 21:30 β€” πŸ‘ 33    πŸ” 9    πŸ’¬ 3    πŸ“Œ 1

I might be able to hire a postdoc for this fall in computational linguistics at UT Austin. Topics in the general LLM + cognitive space (particularly reasoning, chain of thought, LLMs + code) and LLM + linguistic space. If this could be of interest, feel free to get in touch!

21.04.2025 15:56 · 👍 60  🔁 31  💬 0  📌 1

Writing my first post here to announce that I've accepted an assistant professor job at TTIC! I'll be starting in Fall 2026, and recruiting students this upcoming cycle.

Until then, I'll be wrapping up the PhD at Berkeley, and this summer I'll join NYU as a CDS Faculty Fellow 🏙️

15.04.2025 03:34 · 👍 41  🔁 2  💬 3  📌 2
