Jess Hamrick

@jhamrick.bsky.social

Researching planning, reasoning, and RL in LLMs @ Reflection AI. Previously: Google DeepMind, UC Berkeley, MIT. I post about: AI 🤖, flowers 🌷, parenting 👶, public transit 🚆. She/her. http://www.jesshamrick.com

5,791 Followers  |  1,634 Following  |  205 Posts  |  Joined: 11.11.2024

Latest posts by jhamrick.bsky.social on Bluesky

Also, some people don't have mental imagery at all (aphantasia)! My conclusion based on the evidence is that we do some form of latent and/or piecemeal simulation, but it's definitely not pixel-perfect.

06.10.2025 09:32 | 👍 2    🔁 0    💬 0    📌 0
Simulation as an engine of physical scene understanding | PNAS In a glance, we can perceive whether a stack of dishes will topple, a branch will support a child's weight, a grocery bag is poorly packed and liab...

I used to research intuitive physics. There's examples where people's predictive ability seems quite good (e.g. www.pnas.org/doi/10.1073/...) but also many mistakes and reported limitations (e.g. www.nature.com/articles/s41..., www.sciencedirect.com/science/arti...)

06.10.2025 09:32 | 👍 3    🔁 0    💬 2    📌 0

Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")?

Our new paper shows AI which models others' minds as Python code 💻 can quickly and accurately predict human behavior!

shorturl.at/siUYI%F0%9F%...

03.10.2025 02:24 | 👍 36    🔁 14    💬 3    📌 3
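
As a rough illustration of what a "simple script" for a pedestrian could look like in Python: the sketch below is hypothetical and is not taken from the paper. The `Observation` fields, the action names, and the `cross_the_crosswalk` function are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of what a pedestrian can see."""
    signal: str      # assumed values: "walk" or "dont_walk"
    at_curb: bool    # is the pedestrian standing at the curb?
    crossed: bool    # has the pedestrian already crossed?


def cross_the_crosswalk(obs: Observation) -> str:
    """Toy 'script': map an observation to the next predicted action."""
    if obs.crossed:
        return "continue"
    if not obs.at_curb:
        return "walk_to_curb"
    if obs.signal == "walk":
        return "cross"
    return "wait"


# Predict the next action in two situations.
print(cross_the_crosswalk(Observation(signal="walk", at_curb=True, crossed=False)))
print(cross_the_crosswalk(Observation(signal="dont_walk", at_curb=True, crossed=False)))
```

In this toy setup, predicting behavior amounts to running the script on an observed situation instead of inferring a full set of beliefs and goals.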

This is so so cool! I tried to build an AI system to do a variation of the Finke task waaay back in... 2016 or 2017? It didn't work very well, hah. (It was a combination of Bayesian inference over the structured representation with a CNN recognition model). Amazing that LLMs are able to do this.

02.10.2025 08:38 | 👍 4    🔁 0    💬 1    📌 0
Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models This study offers a novel approach for benchmarking complex cognitive behavior in artificial systems. Almost universally, Large Language Models (LLMs) perform best on tasks which may be included in th...

Imagine an apple 🍎. Is your mental image more like a picture or more like a thought? In a new preprint led by Morgan McCarty (our lab's wonderful RA), we develop a new approach to this old cognitive science question and find that LLMs excel at tasks thought to be solvable only via visual imagery. 🧡

01.10.2025 01:26 | 👍 112    🔁 36    💬 5    📌 8

I think your links got messed up, the paper is here: github.com/NVlabs/RLP/b...

01.10.2025 06:18 | 👍 1    🔁 0    💬 0    📌 0

Nvidia's RLP: Reinforcement Learning Pretraining, an information-driven, verifier-free objective that teaches models to think before they predict

🔥 +19% vs BASE on Qwen3-1.7B
🚀 +35% vs BASE on Nemotron-Nano-12B

01.10.2025 06:07 | 👍 26    🔁 1    💬 2    📌 0
Visualizing the Influence of Federal Funding on the AI Boom The Transformer was invented in Google. Reinforcement Learning with Human Feedback (RLHF) was not invented in industry labs, but is most…

Final blog post on the visualization medium.com/@mark-riedl/...

30.09.2025 18:16 | 👍 10    🔁 2    💬 1    📌 1

Resharing this, because it's proving valuable enough that I spent 10 minutes looking it up. TLDR: It's true that some famous recent papers in AI were produced in the private sector. But they *cite* lots of papers with academic authors and federal funding.

30.09.2025 17:20 | 👍 31    🔁 4    💬 1    📌 0

Nice to see another fully open, multimodal LM released! Good license, training code, pretraining data, all here.
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

Slowly, the community is growing.
arxiv.org/abs/2509.236...

30.09.2025 16:03 | 👍 50    🔁 9    💬 0    📌 0
Introducing Claude Sonnet 4.5 Claude Sonnet 4.5 is the best coding model in the world, strongest model for building complex agents, and best model at using computers.

Claude Sonnet 4.5 dropped www.anthropic.com/news/claude-...
assets.anthropic.com/m/12f214efcc...

29.09.2025 18:20 | 👍 8    🔁 2    💬 1    📌 0
How Claude Code is built A rare look into how the new, popular dev tool is built, and what it might mean for the future of software building with AI. Exclusive.

Interesting deep dive for anyone curious about how #ClaudeCode is built.
#Anthropic #Claude #GenAI #DevOps #VibeCoding

29.09.2025 16:12 | 👍 9    🔁 6    💬 1    📌 0

It's aster time! I have never seen so many monarchs, bumble bees and other pollinators in my yard. I had 6 monarchs on my New England aster at one time! Maybe they are on their migration. Anyway, shows how important late season natives are. 🌱 #nativeplants, #pollinators

29.09.2025 18:34 | 👍 107    🔁 19    💬 3    📌 0

I know there is a lot of competition today, but this might be the most consequential release for people training models: an in-depth exploration of full fine-tuning, LoRA, and RL efficiencies by John Schulman (ThinkingMachines). thinkingmachines.ai/blog/lora/

29.09.2025 18:09 | 👍 54    🔁 7    💬 3    📌 1

🎉 80,000 Green Party members!

📈 But we're not stopping there.

💚 We have no time to waste. Join the Green Party today ⬇️

28.09.2025 10:01 | 👍 625    🔁 231    💬 19    📌 45

I was part of an interesting panel discussion yesterday at an ARC event. Maybe everybody knows this already, but I was quite surprised by how "general" intelligence was conceptualized in relation to human intelligence and the ARC benchmarks.

28.09.2025 10:06 | 👍 23    🔁 3    💬 2    📌 1

Want to visualize the response format constraints on the LLM when working in a Jupyter notebook?
Then you might be interested in my new project `litelines`.
Litelines lets you visualize the selected path by the LLM.
It supports a Pydantic schema as a response format, as well as regular expressions.

16.09.2025 07:18 | 👍 8    🔁 4    💬 1    📌 2

sheesh! AI bluesky has arrived

not just good content, there's more and more original work, people from labs, and people with genuinely interesting perspectives

when i joined, it was so painful trying to find even traces

27.09.2025 17:56 | 👍 143    🔁 9    💬 6    📌 1

Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning
arxiv.org/abs/2509.13351

27.09.2025 05:07 | 👍 8    🔁 2    💬 0    📌 0

I was wondering about that too...

27.09.2025 15:00 | 👍 1    🔁 0    💬 0    📌 0
ANY-AI icon: (Any Tool, Any Use, Any Time) to use in policy of GenAI use in a course

AT (Approved Tools Only) icon to use in policy of GenAI use in a course

UA (Use with Attribution) icon to use in policy of GenAI use in a course

AS (Assignment-Specific) icon to use in policy of GenAI use in a course

For instructors out there: a very cool set of AI Course Policy Icons by Cornell's GenAI taskforce, to be used in combination for your syllabi or assignments.

Inspired by @creativecommons.bsky.social and available w/ a CC license. Sample icons attached here.

teaching.cornell.edu/generative-a...

27.09.2025 13:54 | 👍 5    🔁 2    💬 0    📌 0
Why the EU–INC?

Europe has the talent, ambition, and ecosystems to create innovative companies, but fragmentation between European nations is holding us back.

"A startup from California can expand and raise money all across the United States. But our companies still face way too many national barriers that make it hard to work Europe-wide, and way too much regulatory burden."

– Ursula von der Leyen, Oct 2024

For others like me who might be unsure what this is about, www.eu-inc.org has some further details

27.09.2025 13:44 | 👍 2    🔁 1    💬 0    📌 0

EU–INC is the single best thing Europe could do to catch up in the AI race

A simple unified pan-European startup structure, with modern employee ownership and simple access to capital, able to tap into Europe's full talent pool.

‼️ but it's at high risk of not seeing the light of day. You can help 👇

27.09.2025 10:52 | 👍 37    🔁 13    💬 3    📌 0

And new paper out: Pleias 1.0: the First Family of Language Models Trained on Fully Open Data

How we train an open everything model on a new pretraining environment with releasable data (Common Corpus) with an open source framework (Nanotron from HuggingFace).

www.sciencedirect.com/science/arti...

27.09.2025 11:44 | 👍 171    🔁 50    💬 8    📌 8

This is a really good thread about forecasting too far into the future for medical AI, and what the "we should stop training doctors/lawyers" crowd is missing.

Tagging for #MedSky #MLSky

27.09.2025 05:24 | 👍 12    🔁 3    💬 1    📌 0
An Open Letter to U.S. STEM Leadership on the NSF Graduate Research Fellowship Program

Please share with anyone who cares about NSF support for graduate students and take 30 seconds to sign and leave a comment.

The deadline for the 2025 Graduate Research Fellowship Program is about one month away and literally no one can apply. #NSFGRFP

jasonjwilliamsny.github.io/grfp2025/

25.09.2025 22:09 | 👍 223    🔁 275    💬 12    📌 67

One example of how government can, in fact, make people's lives better if we want it to.

26.09.2025 12:21 | 👍 13    🔁 4    💬 0    📌 0

Today's date, 9/25/2025, is probably the last date of our lifetime where the month, day, and year are all square numbers 😱 The next time this will happen is on 1/1/2116.

25.09.2025 11:34 | 👍 113    🔁 43    💬 4    📌 10
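
The square-date claim is easy to check by brute force. The sketch below is only an illustration: the helper names `is_square`, `is_square_date`, and `next_square_date` are made up for this example and do not come from the post.

```python
from datetime import date, timedelta
from math import isqrt


def is_square(n: int) -> bool:
    """True if n is a perfect square."""
    r = isqrt(n)
    return r * r == n


def is_square_date(d: date) -> bool:
    """True if month, day, and year are all perfect squares."""
    return is_square(d.month) and is_square(d.day) and is_square(d.year)


def next_square_date(after: date) -> date:
    """Scan forward one day at a time to the next all-square date."""
    d = after + timedelta(days=1)
    while not is_square_date(d):
        d += timedelta(days=1)
    return d


today = date(2025, 9, 25)
print(is_square_date(today))    # True: 9 = 3^2, 25 = 5^2, 2025 = 45^2
print(next_square_date(today))  # 2116-01-01, since 2116 = 46^2
```

The gap is so long because the only square months are 1, 4, and 9, so nothing after September 25 qualifies in 2025, and the next square year after 45^2 = 2025 is 46^2 = 2116.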

🎓 What could a pan-European PhD program focusing on #MachineLearning & AI be like?

Check out @wafaamohammed.bsky.social's reason for joining the #ELLISPhD Program.

You can apply via our central recruiting portal starting on Oct 1st.

Get all the details now 👉 bit.ly/45DSe75

25.09.2025 09:18 | 👍 6    🔁 4    💬 0    📌 0

For folks considering grad school in ML, my advice is to explore programs that mix ML with a domain interest. ML programs are wildly oversubscribed while a lot of the fun right now is in figuring out what you can do with it

25.09.2025 03:25 | 👍 153    🔁 17    💬 8    📌 7

@jhamrick is following 20 prominent accounts