Life update! Excited to announce that I’ll be starting as an assistant professor at Cornell Info Sci in August 2026! I’ll be recruiting students this upcoming cycle!
An abundance of thanks to all my mentors and friends who helped make this possible!!
New ICLR blogpost! 🎉 We argue that understanding the impact of anthropomorphic AI is critical to understanding the impact of AI.
The Model Context Protocol is cool because it gives external developers a way to add meaningful functionality on top of LLM platforms.
To stress-test this, I made a "Realtime Voice" MCP using free STT, VAD, and TTS systems. The result is janky, but it makes me excited about the ecosystem to come!
Look who we found hanging out in her new Stanford Gates Computer Science office!
We’re truly delighted to welcome @yejinchoinka.bsky.social as a new @stanfordnlp.bsky.social faculty member, starting full-time in September. ❤️
nlp.stanford.edu/people/
#WomensHistoryMonth: Honoring trailblazing #WomenOfAI whose research has made an impact on the current #AI/ML revolution incl. @anima-anandkumar.bsky.social @timnitgebru.bsky.social @mmitchell.bsky.social @deviparikh.bsky.social @ajlunited.bsky.social @yejinchoinka.bsky.social @drfeifei.bsky.social
Stanford scholars introduced an open-source AI agent that learns how to navigate websites by mimicking childhood learning – an approach that could lead to more efficient, transparent, and privacy-conscious AI: hai.stanford.edu/news/an-open...
@chrmanning.bsky.social @shikharmurty.bsky.social
Teaching for the first time: I finished the last official lecture of the class and got a surprise round of applause! Feels so great!
🎙️ Speech recognition is great - if you speak the right language.
Our new @stanfordnlp.bsky.social paper introduces CTC-DRO, a training method that reduces worst-language errors by up to 47.1%.
Work w/ Ananjan, Moussa, @jurafsky.bsky.social, Tatsu Hashimoto and Karen Livescu.
Here’s how it works 🧵
Our survey highlights the enduring influence of linguistics on #NLProc. We emphasize 6 facets: Resources, Evaluation, Low-resource settings, Interpretability, Explanation, and the Study of language.
Check it out for cool plots like this one about affinities between words in sentences, which can show how Green Day isn't like green paint or green tea. And congrats to @coryshain.bsky.social and the CLiMB lab! climblab.org
🚨 First preprint from the lab! 🚨 Josh Rozner (w/@weissweiler.bsky.social and @kmahowald.bsky.social) uses counterfactual experiments on LMs to show that word distributions can provide a learning signal for diverse syntactic constructions, including some hard cases.
I am concerned about AI, but late at night, alone working on a proposal, I was glad ChatGPT had my back as I hit submit 😀. Reminded me of @chrmanning.bsky.social’s mention in a talk of the 'Real World Utility Test': early adoption of tech moves forward when it’s genuinely useful, concerns and all.
Upcoming joint LCSR seminar featuring @stanfordnlp.bsky.social’s @siddkaramcheti.bsky.social! Learn more about it here: www.cs.jhu.edu/event/cs-lcs...
An introductory talk by @chrmanning.bsky.social on “Large Language Models in 2025 – How much understanding and intelligence?” at the Workshop on a Public AI Assistant to Worldwide Knowledge at Stanford, covering 3 eras of LLMs, RAG, Agents, DeepSeek-R1, using LLMs, ….
Video: youtu.be/5Aer7MUSuSU
How can we better think and talk about human-like qualities attributed to language technologies like LLMs? In our #CHI2025 paper, we taxonomize how text outputs from cases of user interactions with language technologies can contribute to anthropomorphism. arxiv.org/abs/2502.09870 1/n
We are getting closer to having agents operate in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩⚖️?
With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
EgoNormia (egonormia.org) exposes a major gap in Vision-Language Models' understanding of the social world: they don't know how to behave when norms about the physical world *conflict* ⚔️ (<45% acc.)
But humans are naturally quite good at this (>90% acc.)
Check it out!
➡️ arxiv.org/abs/2502.20490
1/13 New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive behaviors! Read our paper on how the right cognitive behaviors can make all the difference in a model's ability to improve with RL! 🧵
In 2013, at AKBC 2013 and other workshops, I gave a talk titled “Texts are Knowledge”. This was well before there were any transformer LLMs—indeed before the invention of attention—and my early neural NLP ideas were rudimentary.
🔮 Nevertheless, the talk was quite prophetic!
🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs!
We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.
Excited to have two papers at #NAACL2025!
The first reveals how human over-reliance can be exacerbated by LLM friendliness. The second presents a novel computational method for concept tracing. Check them out!
arxiv.org/pdf/2407.07950
arxiv.org/pdf/2502.05704
Real-world AI needs real-world work. Let’s make it happen 🔥🔥
Want to learn more?
Paper: arxiv.org/pdf/2410.03017v2
Code: github.com/rosewang2008/tutor-copilot
School visit: www.youtube.com/watch?v=IOd2...
Thank you @nssaccelerator.bsky.social @stanfordnlp.bsky.social for the support!
AI won’t reshape education without tackling real problems. Why are we not visiting schools or talking to teachers?
A year ago, I partnered with a district facing a major challenge. Instead of doing AI x Education research in isolation, I focused on their real needs.🧵
We’ve been thrilled by the positive reception to Gemini 2.0 Flash Thinking we discussed in December.
Today we’re sharing an experimental update w/improved performance on math, science, and multimodal reasoning benchmarks 📈:
• AIME: 73.3%
• GPQA: 74.2%
• MMMU: 75.4%
LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? 🤖➕👤
Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I've now gotten used to agents proactively seeking my confirmation or prompting my deeper thinking. (🧵 with video)
The lawyers are really bringing it this morning at the IMLS National Forum on Data Speculations and I love it. "There is no principled difference between the fair use argument for text and data mining and the fair use argument for AI. If AI is theft, so is your scholarship." 🔥
postdoc opportunity in @alexwoolgar.bsky.social and my lab, based in Cambridge UK! seeking someone with excellent analytical skills to join our project using time-resolved human neuroimaging to study receptive language processing in non-speaking autistic individuals 🧠✨
www.jobs.cam.ac.uk/job/48835/
"Mission: Impossible" was featured in Quanta Magazine! Big thank you to @benbenbrubaker.bsky.social for the wonderful article covering our work on impossible languages. Ben was so thoughtful and thorough in all our conversations, and it really shows in his writing!
We are excited to announce the 2nd ELLIS Winter School on Foundation Models in 2025, 18-21 March in Amsterdam. Secure your spot! ✨
🔗 Visit ivi.fnwi.uva.nl/ellis/events...
🚀 Apply forms.gle/bYbZi9J7NzCb...
#AI #ML #foundationmodels #ELLISforEurope #ELLISunitAmsterdam