Data diversity in NLP has been receiving quite some attention... but how should we actually measure it?
Our paper reflects on conceptual and methodological challenges and explores what we can learn from other disciplines to develop better data diversity measures.
New opinion paper out with Esther Ploeger (Aalborg University): "We Need to Measure Data Diversity in NLP — Better and Broader", at #EMNLP2025 (main) aclanthology.org/2025.emnlp-m...
How do language models memorize noise while reasoning impressively well?
Our #EMNLP2025 paper (poster, Nov 5, 11:00-12:30, Hall C) shows that memorization reuses internal mechanisms of generalization, even when the two are unrelated!
arxiv.org/abs/2507.04782
Congrats Anna!! 🎉
Please share!
We have a number of fully funded PhD studentships in "Designing Responsible Natural Language Processing". I'm a possible supervisor & I'd be keen to support projects at the intersection of sociolinguistics and AI, e.g., accent bias in AI, language+gender/sexuality+AI.
www.responsiblenlp.org
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/sl...
Together with some Utrecht NLP people at ACL 2025! #acl2025 #acl2025NLP
Wanna do some authorship attribution? Chances are, the tokenizer you use matters.
"Tokenization is Sensitive to Language Variation" (probably; more investigation is necessary...)
📄 ACL Findings paper: arxiv.org/pdf/2502.15343
🧑🏫 @dongng.bsky.social @davidjurgens.bsky.social and myself
See you at ACL!
The worst happened. We were DOGE’d. Our NSF funding is gone.
So now there’s nothing stopping me from sharing Expert Voices Together, a crisis response system for US-based researchers and journalists facing harassment.
It's a true passion project. 🧵 1/
expertvoicestogether.org
I wrote down some thoughts about what sociolinguistics can contribute to LLMs and vice versa, now available dx.doi.org/10.1111/lnc3...
🚨BREAKING. From a program officer at the National Science Foundation, a list of keywords that can cause a grant to be pulled. I will be sharing screenshots of these keywords along with a decision tree. Please share widely. This is a crisis for academic freedom & science.
Are you a pre-doctoral student interested in language technologies, especially focusing on safe, fair and inclusive AI? Our Summer 2025 Language Technology for All Internship could be a great fit. See the link below for more info, and to apply:
lti.cs.cmu.edu/news-and-eve...
Congratulations to Dr. Qixiang Fang on successfully defending his impressive thesis, "Leveraging Measurement Theory for Natural Language Processing Research", the first PhD student I advised from start to finish. It was an honor to be part of the journey. research-portal.uu.nl/en/publicati...
I'm looking for a PhD student for a big new project on "Data Diversity for Fair and Robust NLP" (📅 Jan 5!) www.uu.nl/en/organisat... #nlproc #nlp