I'm lecturing about the "History of NLP" this week. What should I include? Any favorite anecdotes, images, people, methods? Slides, books, papers, or talks for inspiration or grounding?
I've been maintaining a small collection here: www.are.na/maria-antoni...
Good post on how to think about honing your skills as an (academic) researcher by Carlini
nicholas.carlini.com/writing/2026...
You can find similar but more interesting experiments in Vauhini Vara's recent New Yorker piece, and/or @tuhinchakr.bsky.social's work, and lots of other places!
www.newyorker.com/culture/the-...
arxiv.org/abs/2601.18353
Only 0.1% of academic papers published since 2023 have explicitly disclosed the use of AI for writing assistance, yet textual analysis suggests that the actual rate of AI use is 40 times higher. The study’s authors call for a policy rethink. In PNAS: https://ow.ly/HPb250YrlyB
I agree that it’s silly to claim that LLMs Can’t Do Anything (obviously they can do many things). I also think it’s silly to claim LLMs Can Do Everything (obviously they can’t).
regardless of how one feels about that, this is a very scary time to need a job and people are reacting accordingly
“Faculty on CU Denver and Boulder campuses say the decision was reached without consulting campus experts in AI, ethics or education.”
When you collect data online, are the results from humans or AI? In a project led by Booth PhD student Grace Zhang, we estimate the prevalence of AI agents on commonly used survey platforms:
osf.io/preprints/ps...
🧵
When shaping your research agenda, your objective is to find the weirdest niche possible that still has the potential to change everything.
🚨 NLP4DH 2026 deadline has been extended to March 13! Submission link here: openreview.net/group?id=NLP...
Writing an HCI paper about an AI-powered system for a venue like UIST 2026 or CHI 2027? Wondering what reviewers expect you to report, and how to approach paper framing and writing? Check out our reporting guidelines: medium.com/p/7c3ae86341...
Without such eval, rushed integration of AI into classrooms may exacerbate existing academic achievement gaps.
See our paper for more (inc. a study where I redrew 300+ images by hand): arxiv.org/abs/2603.00925
@ai2.bsky.social @kylelo.bsky.social
We argue that eval around AI for education should be disaggregated in a manner that pinpoints whether models can discern when a student may need pedagogical support, and whether models equitably serve students across different levels of proficiency.
Models may mistakenly assume that math solutions are correct. Typically, models are trained on “high quality” math so that they can hill-climb on GSM8k, MATH, etc. However, dev pipelines that favor correct math are in tension w/ education, where math errors require extra attention.
We find that this gap is primarily driven by QA related to content description. In addition, VLMs struggle to identify cases when help is needed; the most challenging QA are those related to assessing students’ correctness and errors.
Models are now expert math solvers, and so AI for math education is receiving increasing attention.
Our new preprint evaluates 11 VLMs on our QA benchmark, DrawEduMath. We highlight a startling gap: models perform less well on inputs from K-12 students who need more help. 🧵
1/7 🧵 The GPT-4 technical report featured detailed calibration curves.
Since then, not a single major model release has reported calibration. The field quietly stopped measuring whether models know what they don't know.
Our new position paper argues this is a mistake. Here's why.
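The calibration curves the thread refers to are straightforward to compute from logged confidences. A minimal sketch (the function name, binning scheme, and example data are my own illustration, not from the report or paper): bin predictions by confidence, then compare each bin's mean confidence to its accuracy; the weighted gap is the expected calibration error (ECE).

```python
import numpy as np

def calibration_curve(confidences, correct, n_bins=10):
    """Bin predictions by confidence and compare confidence to accuracy.

    confidences: array of model confidences in (0, 1].
    correct: binary array, 1 if the prediction was right.
    Returns per-bin mean confidences, per-bin accuracies, and ECE.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    conf_means, acc_means, counts = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            conf_means.append(confidences[mask].mean())
            acc_means.append(correct[mask].mean())
            counts.append(mask.sum())
    # ECE: count-weighted mean |confidence - accuracy| gap across bins.
    gaps = np.abs(np.array(conf_means) - np.array(acc_means))
    weights = np.array(counts) / np.sum(counts)
    ece = float(np.sum(weights * gaps))
    return conf_means, acc_means, ece

# A perfectly calibrated toy case: 75% confident, right 3 times out of 4.
conf, acc, ece = calibration_curve(np.array([0.75] * 4),
                                   np.array([1, 1, 1, 0]))
```

Plotting `acc_means` against `conf_means` gives the reliability diagram; a well-calibrated model hugs the diagonal, and ECE summarizes the deviation in one number.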
Abstract submissions close on March 3rd!
We are also extending a ✨ call for mentored reviewers ✨: if you advise excellent graduate or postdoctoral researchers, you are welcome to recommend them to review for IC2S2 2026. Email IC2S2@uvm.edu to nominate mentored reviewers (or faculty colleagues)
CORRECTION: Claude Code launched in February 2025, suggesting a roughly 13% increase over expectations.
I remember that muttering from time to time!! 😮 Curious, Chinese-speaking culture in mainland China or the US or elsewhere??
Agents of Chaos -- what are autonomous OpenClaw agents up to? How do they interact with each other? Read our investigation of OpenClaw at
researchgate.net/publication/...
And an interactive website agentsofchaos.baulab.info
@davidbau.bsky.social @natalieshapira.bsky.social @openclaw-x.bsky.social
I'm hiring a postdoc at @cmu.edu (w/ far.ai & @dgrand.bsky.social + @gordpennycook.bsky.social)!
How do LLMs shape human beliefs — and what do we do about it? AI safety meets behavioral science.
Open to technical and social science backgrounds.
New research: The AI Fluency Index.
We tracked 11 behaviors across thousands of Claude.ai conversations—for example, how often people iterate and refine their work with Claude—to measure how well people collaborate with AI.
Read more: https://www.anthropic.com/research/AI-fluency-index
We've alllllmost gotten all the Jan26 ARR reviews in, but I'm still trying to track down new emergency reviewers for papers on the following topics:
1) agents
2) jailbreaking
3) coding
4) RL
5) reasoning
6) LLM for finance
7) AMR
8) alignment
If you can review any (in next 24-48h) please DM me 🙏🙏🙏
I was taught that to have a great job talk narrative, you really only need ~3 high quality papers
How horrible to be a CS grad student under pressure to submit multiple first-author papers to every conference deadline, whether they feel ready or not. This serves no one’s best interests in the long run (science included). But lots of students appear to be getting advice that it’s necessary to compete
“Humans across multiple languages spontaneously associate the nonwords kiki & bouba with spiky & round shapes, respectively...We tested the bouba-kiki effect in baby chickens. Similar to humans, they spontaneously chose a spiky shape when hearing a kiki sound & a round shape when hearing a bouba.”😲🧪
I have a small project that is taking me outside of academia to dip into industry, just ever so briefly.
I engage a lot with AI. I was not at all prepared for how industry is using it. Not. at. all.
This brief little window is definitely helping me better frame my teaching in this new world.
My contribution to the discourse, which I've said before and will say again: DH isn't over. DH has won. 1/
Postdoc positions at UC Berkeley, including with the fabulous Cultural Analytics group: aprecruit.berkeley.edu/JPF05222
I asked Gemini to "defend itself," and say what the big benefits of LLMs have been since 2020:
"Since 2020, the volume of digital noise has increased, and LLMs have provided the first reliable shield against it."