TALES: Text Adventure Learning Environment Suite
Reasoning is an essential skill to enable Large Language Models (LLMs) to interact with the world. As tasks become more complex, they demand increasingly sophisticated and diverse reasoning capabiliti...
My former MS student Chris Cui (now PhD student with @rajammanabrolu.bsky.social) motivates Text Adventure Games as testbeds for reasoning. Provides a new benchmark suite of text games. Observes that Zork still kicks LLMs' butts despite training on walkthroughs arxiv.org/abs/2504.14128
03.12.2025 00:42
Ziyi Zhang, Shengqi Li (on PhD app market!) for multi agent D&D sim creation + RL openreview.net/pdf?id=3Op7k...
Me mostly if you want boba and beach recs (or NVIDIA full time RS roles I guess, I'm hiring a few ppl there but not UCSD)
24.11.2025 22:10
Jenny Shen for pluralistic alignment + human feedback arxiv.org/abs/2510.01167
Chris Cui for text sims + RL + scalable oversight of reasoning models arxiv.org/abs/2504.14128
Lucas (on industry market) for how reasoning emerges from mid training/SFT to RL lucasdino.github.io/assets/files...
24.11.2025 22:10
Bosung Kim for all things VLA, embodied AI, and long context memory arxiv.org/abs/2505.16928
Ruiyi Wang for multi turn agentic RL and all the RL infra in and outs arxiv.org/abs/2510.01132
24.11.2025 22:10
My entire PEARLS Lab, and many NVIDIA colleagues, will be at #neurips2025 to chat about their latest. Some papers accepted to the conf are already outdated so just reach out. Thread 🧵
24.11.2025 22:10
Yay congrats, Mark! Well deserved! It's def a required reading for all things comp storytelling (and creativity!)
12.11.2025 17:22
I am extremely honored and humbled to have been awarded a Test-of-Time award for my 2005 paper "From Linear Story Generation to Branching Story Graphs" with R. Michael Young
12.11.2025 16:08
YouTube video by IVADO
Navigating the Safety-Capability Spectrum when Teaching Agents with Feedback - Prithviraj Ammanabrolu
I've done a few versions of this talk but this is the first that's been recorded publicly, thanks to IVADO Montreal
A good overview of things my lab has been up to in the last year or so at least in balancing safety/capabilities of (embodied) AI Agents
www.youtube.com/watch?v=S-kV...
03.11.2025 23:04
🔥 Excited to share our new work: "A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning"!
We study what actually works for agentic multi-turn RL with varying Environment, Policy, and Reward.
We conduct various ablations and empirical analyses on TextWorld, ALFWorld, and SWE-Gym.
26.10.2025 21:36
My students will be presenting papers next Wed/Thursday so be sure to check those out too
01.10.2025 17:22
I'll be at #CoLM2025 and the IVADO agents workshop right before in Montreal. My students will be presenting two papers in the main conf. I'll also do a ws keynote where I'll talk about some of our latest. Come by and say hi next week!
01.10.2025 17:21
I'm probably mostly going to stop posting on this site. There's close to no engagement and it's not worth the effort to cross post for the amount of time that takes. Find me elsewhere / email me
29.06.2025 20:54
The thing that feels so off about the core tech world is that every convo is very transactional. Maybe true elsewhere too. "Oh you're an expert in RL, can you answer questions about my new startup?"
Every single (Bay) party. No I do not want to consult. I just wanna hang out.
16.06.2025 05:54
Of all the labeling startups out there to acquihire, this was... an interesting choice. Says a lot actually
10.06.2025 17:34
. @bosungkim.bsky.social will be at #CVPR2025 in Nashville this week to present this and just generally talk about scaling memory for embodied agents!
Catch her at the poster sessions and also the Foundation Models meets Embodied Agents Workshop on Wed
09.06.2025 16:56
Yes AI for edu is a thing but almost all vanilla LLMs just railroad students into answers. Complete cognitive offload is not useful for improving learning outcomes
09.06.2025 15:24
I've heard this personally from multiple PMs at AI companies. Students are one of the biggest demographics and they need to "break in" and have even more usage to improve their metrics. Classic corporate economic incentives
09.06.2025 15:24
Tis the era of bringing back every AI benchmark ever but this time by the LLM people and for the LLMs
06.06.2025 22:57
talks.cam : Language Technology Lab Seminars
Had a fun little visit to Cambridge LTL where I talked about a bunch of my lab's latest papers including some still not public with the key takeaway that "RL can absolutely learn new things and is not just resurfacing knowledge"
talks.cam.ac.uk/show/archive...
05.06.2025 17:51
That's fair, I guess I should rephrase to "regardless of a possible common prior, it's nearly impossible for different providers to have the same representations pop out of their post trained LLM"
03.06.2025 04:50
The moral of the story here is basically that who is making your LLM really matters. Internal use cases critical to their businesses will always influence data distributions and everything downstream of that. This is in contrast to things like Platonic Representation Hypothesis
03.06.2025 04:05
YouTube video by Dwarkesh Patel
Xi Jinping's paranoid approach to AGI, debt crisis, & Politburo politics - Victor Shih
Interesting tidbit from UCSD's Victor Shih on a podcast talking about Chinese AGI efforts is that DeepSeek is good at Chinese govt doc understanding cause that's what affects stock prices most and DS is a hedge fund.
www.youtube.com/watch?v=b1Te...
03.06.2025 04:05
Looks like Gemini gets AIR 6 in #JEE2025 with a score of 323
Only 5 high schoolers in all India do better than an LLM in the single most important exam of their lives to get into the IITs
The legacy edu selection systems are now worse than useless
02.06.2025 06:54
I get prepping for worst case scenarios but a lot of AI Safety debates I somehow end up in these days boil down to "assume you have Machine God in a box, now tell me how to align it"
I could rant for hours but seriously y'all this isn't productive
01.06.2025 01:47
Here for the afternoon shift!
24.05.2025 21:06