My former MS student Chris Cui (now PhD student with @rajammanabrolu.bsky.social) motivates Text Adventure Games as testbeds for reasoning. Provides a new benchmark suite of text games. Observes that Zork still kicks LLMs' butts despite training on walkthroughs arxiv.org/abs/2504.14128
Ziyi Zhang, Shengqi Li (on PhD app market!) for multi agent D&D sim creation + RL openreview.net/pdf?id=3Op7k...
Me mostly if you want boba and beach recs (or NVIDIA full time RS roles I guess, I'm hiring a few ppl there but not UCSD)
Jenny Shen for pluralistic alignment + human feedback arxiv.org/abs/2510.01167
Chris Cui for text sims + RL + scalable oversight of reasoning models arxiv.org/abs/2504.14128
Lucas (on industry market) for how reasoning emerges from mid training/SFT to RL lucasdino.github.io/assets/files...
Bosung Kim for all things VLA, embodied AI, and long context memory arxiv.org/abs/2505.16928
Ruiyi Wang for multi turn agentic RL and all the RL infra ins and outs arxiv.org/abs/2510.01132
My entire PEARLS Lab, and many NVIDIA colleagues, will be at #neurips2025 to chat about their latest. Some papers accepted to the conf are already outdated so just reach out. Thread 🧵
Yay congrats, Mark! Well deserved! It's def a required reading for all things comp storytelling (and creativity!)
I am extremely honored and humbled to have been awarded a Test-of-Time award for my 2005 paper "From Linear Story Generation to Branching Story Graphs" with R. Michael Young
I've done a few versions of this talk but this is the first that's been recorded publicly, thanks to IVADO Montreal
A good overview of things my lab has been up to in the last year or so at least in balancing safety/capabilities of (embodied) AI Agents
www.youtube.com/watch?v=S-kV...
🔥Excited to share our new work: "A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning"!
We study what actually works for agentic multi-turn RL with varying 🌎Environment, 🤖Policy, and ⭐Reward.
We conduct various ablations and empirical analysis on 🧩TextWorld, 🧙ALFWorld, and 🧑‍💻SWE-Gym.
My students will be presenting IRPO arxiv.org/abs/2504.15477 and a paper on Personalized RLHF arxiv.org/abs/2504.07070 on Wed onwards
My students will be presenting papers next Wed/Thursday so be sure to check those out too
I'll be at #CoLM2025 and the IVADO agents workshop right before in Montreal. My students will be presenting two papers in the main conf. I'll also do a ws keynote where I'll talk about some of our latest. Come by and say hi next week!
I'm probably mostly going to stop posting on this site. There's close to no engagement and it's not worth the effort to cross post for the amount of time that takes. Find me elsewhere / email me
I recently left Mosaic/Databricks Research. It's been a ride building out the RL team from <4 ppl to 20+ across two companies & acquisition + figuring out RL as a Service in prod. Mosaic had insane talent density
Some "relaxation" while I put out Prof fires for a smol bit then new adventures!
If you work in the intersection of NLP and games/narrative, then this workshop is for you! wordplay-workshop.github.io/cfp/
Organized by the amazing @laramartin.net and @rajammanabrolu.bsky.social (among others)
The thing that feels so off about the core tech world is that every convo is very transactional. Maybe true elsewhere too. "Oh you're an expert in RL, can you answer questions about my new startup?"
Every single (Bay) party. No I do not want to consult. I just wanna hang out.
Of all the labeling startups out there to acquihire, this was... an interesting choice. Says a lot actually
. @bosungkim.bsky.social will be at #CVPR2025 in Nashville this week to present this and just generally talk about scaling memory for embodied agents!
Catch her at the poster sessions and also the Foundation Models meets Embodied Agents Workshop on Wed
Yes AI for edu is a thing but almost all vanilla LLMs just railroad students into answers. Complete cognitive offload is not useful for improving learning outcomes
I've heard this personally from multiple PMs at AI companies. Students are one of the biggest demographics and they need to "break in" and have even more usage to improve their metrics. Classic corporate economic incentives
Tis the era of bringing back every AI benchmark ever but this time by the LLM people and for the LLMs
Had a fun little visit to Cambridge LTL where I talked about a bunch of my lab's latest papers including some still not public with the key takeaway that "RL can absolutely learn new things and is not just resurfacing knowledge"
talks.cam.ac.uk/show/archive...
That's fair, I guess I should rephrase to "regardless of a possible common prior, it's nearly impossible for different providers to have the same representations pop out of their post trained LLM"
The moral of the story here is basically that who is making your LLM really matters. Internal use cases critical to their businesses will always influence data distributions and everything downstream of that. This is in contrast to things like the Platonic Representation Hypothesis
Interesting tidbit from UCSD's Victor Shih on a podcast talking about Chinese AGI efforts is that DeepSeek is good at Chinese govt doc understanding cause that's what affects stock prices most and DS is a hedge fund.
www.youtube.com/watch?v=b1Te...
The top two scores are 332 not 322 but other than the typo the rest of this list seems legit and consistent across multiple sources
www.indiatvnews.com/education/hi...
x.com/RejaullahmdM...
Looks like Gemini gets AIR 6 in #JEE2025 with a score of 323
Only 5 highschoolers in all India do better than an LLM in the single most important exam of their lives to get into the IITs
The legacy edu selection systems are now worse than useless
I get prepping for worst case scenarios but a lot of AI Safety debates I somehow end up in these days boil down to "assume you have Machine God in a box, now tell me how to align it"
I could rant for hours but seriously y'all this isn't productive
Here for the afternoon shift!
Paper: arxiv.org/abs/2505.16928
Website/code/data:
pearls-lab.github.io/infini-thor/
Led by @bosungkim.bsky.social who has done a fantastic job on this in the last bit. Full stack from Unity gamedev to Big Model Scaler. Watch out for her in the embodied agent space!