Data diversity in NLP has been receiving quite some attention... but how should we actually measure it?
Our paper reflects on conceptual and methodological challenges and explores what we can learn from other disciplines to develop better data diversity measures.
04.11.2025 15:43
How do language models memorize noise while reasoning impressively well?
Our #EMNLP2025 (poster, Nov 5, 11:00-12:30, Hall C) paper shows that memorization reuses internal mechanisms of generalization, even when they are not related to each other!
arxiv.org/abs/2507.04782
01.11.2025 17:24
Congrats Anna!!
24.10.2025 06:43
Our CDT is based in the Edinburgh Futures Institute, the University of Edinburgh's brand new hub for research, innovation and teaching focused on socially just artificial intelligence and data.
Please share!
We have a number of fully funded PhD studentships in "Designing Responsible Natural Language Processing". I'm a possible supervisor & I'd be keen to support projects on sociolinguistics-AI, e.g., accent bias in AI, language+gender/sexuality+AI.
www.responsiblenlp.org
10.10.2025 15:03
Speech and Language Processing
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/sl...
24.08.2025 19:28
together with some Utrecht NLP people at ACL 2025! #acl2025 #acl2025NLP
27.07.2025 19:48
Wanna do some authorship attribution? Chances are what tokenizer you use matters.
Tokenization is Sensitive to Language Variation, probably, more investigation necessary...
ACL Findings paper: arxiv.org/pdf/2502.15343
@dongng.bsky.social @davidjurgens.bsky.social and myself
See you at ACL!
17.07.2025 07:59
The worst happened. We were DOGE'd. Our NSF funding is gone.
So now there's nothing stopping me from sharing Expert Voices Together, a crisis response system for US-based researchers and journalists facing harassment.
It's a true passion project. 1/
expertvoicestogether.org
13.05.2025 16:22
list of banned keywords
BREAKING. From a program officer at the National Science Foundation, a list of keywords that can cause a grant to be pulled. I will be sharing screenshots of these keywords along with a decision tree. Please share widely. This is a crisis for academic freedom & science.
04.02.2025 01:26
CMU LTI Language Technology for All Internship 2025 - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
The LTI is currently seeking applicants for the summer 2025 Language Technology for All Internship
Are you a pre-doctoral student interested in language technologies, especially focusing on safe, fair and inclusive AI? Our Summer 2025 Language Technology for All Internship could be a great fit. See the link below for more info, and to apply:
lti.cs.cmu.edu/news-and-eve...
06.01.2025 21:24
Leveraging Measurement Theory for Natural Language Processing Research
Congratulations to dr. Qixiang Fang for successfully defending his impressive thesis on "Leveraging Measurement Theory for Natural Language Processing Research" -- the first PhD student I advised from start to finish. It was an honor to be part of the journey. research-portal.uu.nl/en/publicati...
06.12.2024 14:44
The NLP group in the CS department @ UU was founded in 2018. Our main research themes include:
- NLP and Society (led by Dong Nguyen)
- NLG & Vision and Language (led by Albert Gatt)
- Linguistic meaning variation (led by Massimo Poesio)
The AI Helpdesk is a platform where everyone can get scientifically reliable and clear answers to questions about Artificial Intelligence (AI).
Post-doctoral Researcher at BIFOLD / TU Berlin interested in interpretability and analysis of language models. Guest researcher at DFKI Berlin. https://nfelnlp.github.io/
assistant prof @utrechtuniversity.bsky.social ⬦ natural logic, natural language reasoning, semantics in #NLProc ⬦ LangPro, Parallel Meaning Bank, NALOMA
NLP / CSS PhD at Berkeley I School. I develop computational methods to study culture as a social language.
PhD Student in Social Data Science at University of Mannheim | LLMs and Surveys | georgahnert.de
PhD in Computational Linguistics at Bielefeld University, interested in Lexical Semantic Variation & Hate Speech
PhD candidate @ Institute for Logic, Language and Computation, University of Amsterdam https://www.ivoverhoeven.nl
Professor of Media, Gender and Postcolonial Studies at Utrecht University, The Netherlands. NWO project: VR as Empathy Machine: Media, Migration and the Humanitarian Predicament. Latest book: Doing Digital Migration Studies: https://tinyurl.com/5257448k
Co-CEO, Yutori. Join the waitlist at yutori.com
Research Scientist at GDM. Statistician. Mostly work on Responsible AI. Academia-industry flip-flopper.
Asst Prof at Cornell Info Sci and Cornell Tech. Responsible AI
https://angelina-wang.github.io/
Assistant professor of CS at UC Berkeley, core faculty in Computational Precision Health. Developing ML methods to study health and inequality. "On the whole, though, I take the side of amazement."
https://people.eecs.berkeley.edu/~emmapierson/
Prof (CS @Stanford), Co-Director @StanfordHAI, Cofounder/CEO @theworldlabs, CoFounder @ai4allorg #AI #computervision #robotics #AI-healthcare
assistant professor, @ltiatcmu.bsky.social. machine learning: LLMs and climate. they/them/dad (2 dogs).
pro-AI, anti-capitalist, anti-fascist.
Website: strubell.github.io
Speech • Language • Learning
https://grzegorz.chrupala.me
@ Tilburg University
The Milan Natural Language Processing Group #NLProc #AI
milanlproc.github.io
linguistics x artificial intelligence x cognitive science | computational linguistics, NLP | COLT Research Group @colt-upf.bsky.social, ICREA @icreacommunity.bsky.social, Universitat Pompeu Fabra @upf.edu, @traduccioupf.bsky.social
gboleda.github.io
Postdoc researcher for HUMANads at Utrecht University. Computational social sciences + NLP + creator economy.
he/any
Researcher, Microsoft Research; Faculty, Information Science, Cornell University