✨Meet OLMoASR✨ By pairing our curated 1M-hour dataset with a powerful architecture, we've built open ASR models whose performance is competitive with models like Whisper. We're open-sourcing the data, code, and models to help the community build more robust and transparent ASR.
29.08.2025 16:21 · 👍 12 🔁 1 💬 0 📌 0
Speech and Language Processing
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/sl...
24.08.2025 19:28 · 👍 151 🔁 59 💬 2 📌 5
Big THANK YOU to the amazing #Interspeech2025 Organizing Committee!
• Odette Scharenborg, Catharine Oertel, Khiet Truong
• Martijn Bartelds
• Dragoș Bălan
• Saskia Peters
• Ginny Ruiter, Marie Louise Verhagen, Natascha Voskuijl
14.07.2025 14:26 · 👍 10 🔁 3 💬 1 📌 0
Congratulations!! That's wonderful!!
02.07.2025 17:18 · 👍 1 🔁 0 💬 0 📌 0
Congrats!!!
29.04.2025 22:46 · 👍 1 🔁 0 💬 0 📌 0
CTC-DRO can be applied to ASR at minimal computational cost and offers the potential to reduce group disparities in other domains with similar challenges.
📄 Read our paper: arxiv.org/pdf/2502.017...
💻 Get the code: github.com/Bartelds/ctc...
12.03.2025 15:29 · 👍 0 🔁 0 💬 0 📌 0
The result:
• Worst-language error ↓ up to 47.1%
• Average error ↓ up to 32.9%
CTC-DRO works seamlessly with existing self-supervised speech models through ESPnet
12.03.2025 15:29 · 👍 0 🔁 0 💬 1 📌 0
We present CTC-DRO, which addresses the shortcomings of the group DRO objective by:
✅ Input length-matched batching to mitigate CTC's scaling issues
✅ Smoothing the group weight update to prevent overemphasis on consistently high-loss groups
(A minimal sketch of the smoothed update follows below.)
12.03.2025 15:29 · 👍 0 🔁 0 💬 1 📌 0
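To make the smoothed update concrete, here is a minimal, hypothetical PyTorch sketch of a group-DRO-style weight step with damping. The function name, the step size eta, and the exact damping form are illustrative assumptions for this thread, not the paper's precise update rule.

```python
import torch

def smoothed_group_weight_update(q, group_losses, eta=0.1, smoothing=1.0):
    # Plain group DRO: q_g <- q_g * exp(eta * loss_g), then renormalize.
    # A group with a persistently large CTC loss can then absorb almost
    # all of the weight. Dividing each loss by (smoothing + loss) is one
    # illustrative way to damp that effect; the paper gives the exact rule.
    scaled = group_losses / (smoothing + group_losses)
    q = q * torch.exp(eta * scaled)
    return q / q.sum()  # renormalize to a distribution over groups

# Toy usage: three language groups, one with a much larger average loss.
q = torch.full((3,), 1.0 / 3.0)
losses = torch.tensor([2.0, 3.0, 30.0])
for _ in range(10):
    q = smoothed_group_weight_update(q, losses)
print(q)                          # high-loss group gains weight, but not all of it
robust_loss = (q * losses).sum()  # weighted loss a robust trainer would minimize
```

As I read it, the length-matched batching in the first item complements this step: comparing per-group losses over batches of similar total input length keeps utterance duration from masquerading as difficulty.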
Why? Group DRO needs comparable training losses between languages. But in ASR, CTC-based losses vary with speech length, speakers, and acoustics, creating spurious loss differences across language groups.
Result? Worse performance.
We need a new approach (a toy sketch of the loss-scaling issue follows below).
12.03.2025 15:29 · 👍 0 🔁 0 💬 1 📌 0
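As a toy illustration of that scaling problem (my own sketch, not code from the paper): with PyTorch's built-in CTC loss and random, untrained emissions, the summed loss grows with the number of input frames, so a language whose utterances simply run longer can look "harder" to group DRO. The vocabulary size and lengths below are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB = 32  # arbitrary label set, with the CTC blank at index 0

def summed_ctc_loss(num_frames, target_len=10, batch=4):
    # Random, untrained emissions shaped (T, N, C), as F.ctc_loss expects.
    log_probs = F.log_softmax(torch.randn(num_frames, batch, VOCAB), dim=-1)
    targets = torch.randint(1, VOCAB, (batch, target_len))
    input_lengths = torch.full((batch,), num_frames, dtype=torch.long)
    target_lengths = torch.full((batch,), target_len, dtype=torch.long)
    return F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                      blank=0, reduction="sum")

# Same targets and model quality, different utterance lengths:
print(summed_ctc_loss(num_frames=100))  # smaller summed loss
print(summed_ctc_loss(num_frames=400))  # much larger loss, same "difficulty"
```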
CTC-based fine-tuning has been successful on multilingual ASR benchmarks, but it doesn't fix language performance gaps. Group DRO could help by focusing on the worst-performing languages, but it does not work ❌
12.03.2025 15:29 · 👍 1 🔁 0 💬 1 📌 0
🎙️ Speech recognition is great - if you speak the right language.
Our new @stanfordnlp.bsky.social paper introduces CTC-DRO, a training method that reduces worst-language errors by up to 47.1%.
Work w/ Ananjan, Moussa, @jurafsky.bsky.social, Tatsu Hashimoto and Karen Livescu.
Here's how it works 🧵
12.03.2025 15:29 · 👍 11 🔁 3 💬 1 📌 0
I am excited to announce that I will join the University of Zurich as an assistant professor in August this year! I am looking for PhD students and postdocs starting in the fall.
My research interests include optimization, federated learning, machine learning, privacy, and unlearning.
06.03.2025 02:17 · 👍 28 🔁 5 💬 1 📌 1
📢 Join us for the Conversational AI Reading Group meeting on Thursday, January 16th, 11 AM-12 PM EST.
Martijn Bartelds will present "Improving Universal Access to Modern Speech Technology".
Details here: poonehmousavi.github.io/rg
13.01.2025 16:19 · 👍 2 🔁 3 💬 0 📌 0
Speech and Language Processing
Happy New Year everyone! Jim and I just put up our January 2025 release of Speech and Language Processing! Check it out here: web.stanford.edu/~jurafsky/sl...
12.01.2025 20:44 · 👍 152 🔁 50 💬 1 📌 1
Group picture of the Stanford NLP Group gathered on the shores of Lake Tahoe.
Natural Language Processing (artificial intelligence that uses human language) has been on a roll lately. You've probably noticed! So the Stanford NLP Group has been growing, and diversifying into lots of new topics, including agents, language model programs, and socially aware #NLP.
nlp.stanford.edu
04.12.2024 17:14 · 👍 53 🔁 8 💬 1 📌 0
Excited to announce the launch of our ML-SUPERB 2.0 challenge @interspeech.bsky.social 2025! Join us in pushing the boundaries of multilingual ASR and LID!
💻 multilingual.superbbenchmark.org
04.12.2024 18:09 · 👍 8 🔁 3 💬 0 📌 0
Multimodal Information Based Speech Processing (MISP) 2025 Challenge
Hi speech people, super exciting news here!
We are running another "Multimodal information based speech (MISP)" Challenge at @interspeech.bsky.social
Participate!
Spread the word!
More info:
mispchallenge.github.io/mispchalleng...
25.11.2024 11:25 · 👍 15 🔁 7 💬 0 📌 0
made this thing, reply to be added
go.bsky.app/AKGJ82V
22.11.2024 00:26 · 👍 12 🔁 1 💬 6 📌 0
🙋
22.11.2024 00:27 · 👍 1 🔁 0 💬 0 📌 0
Mentioning this post from @cjziems.bsky.social, listing some starter packs: bsky.app/profile/cjzi...
20.11.2024 19:02 · 👍 2 🔁 0 💬 0 📌 0
I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA
(Self-)nominations welcome!
19.11.2024 11:13 · 👍 82 🔁 34 💬 44 📌 3
🙋
20.11.2024 15:28 · 👍 1 🔁 0 💬 0 📌 0
I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5
Here are some other great starter packs:
- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg
15.11.2024 19:20 · 👍 25 🔁 10 💬 2 📌 2
π
17.11.2024 18:30 · 👍 1 🔁 0 💬 0 📌 0
CS PhD Candidate @ Stanford NLP
tolulope.ai
Research Scientist at GDM. Statistician. Mostly work on Responsible AI. Academia-industry flip-flopper.
Welcome to the 26th Interspeech Conference, the premier global event on spoken language processing technology, held August 17-21, 2025, in Rotterdam, NL.
https://kyutai.org/ Open-Science AI Research Lab based in Paris
Computational Social Scientist (he/him). Fairness & Ethics in NLP/ML/AI. Staff Research Scientist @ Google. Research Affiliate @ Stanford SPARQ.
https://cs.stanford.edu/~vinod/
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://amzn.to/4fqvn0D) & a book on reasoning models (https://mng.bz/Nwr7).
Also blogging about AI research at magazine.sebastianraschka.com.
Princeton computer science prof. I write about the societal impact of AI, tech ethics, & social media platforms. https://www.cs.princeton.edu/~arvindn/
BOOK: AI Snake Oil. https://www.aisnakeoil.com/
Professor at Wharton, studying AI and its implications for education, entrepreneurship, and work. Author of Co-Intelligence.
Book: https://a.co/d/bC2kSj1
Substack: https://www.oneusefulthing.org/
Web: https://mgmt.wharton.upenn.edu/profile/emollick
Technology Policy at Stanford 👩🏼‍💻 column in FT 🇪🇺 Member of European Parliament 2009-2019. Author: The Tech Coup
Working towards the safe development of AI for the benefit of all at Université de Montréal, LawZero and Mila.
A.M. Turing Award Recipient and most-cited AI researcher.
https://lawzero.org/en
https://yoshuabengio.org/profile/
Working on #NLProc for social good.
Currently at LTI at CMU. 🏳️‍🌈
Researcher in NLP, ML, computer music. Prof @uwcse @uwnlp & helper @allen_ai @ai2_allennlp & familiar to two cats. Single reeds, tango, swim, run, cocktails, מאַמע־לשון, GenX. Opinions not your business.
Computer Science PhD student at Stanford University
https://cs.stanford.edu/~megha
Postdoc @ Stanford University
https://koloskova.github.io/
Associate Professor at GroNLP (@gronlp.bsky.social) #NLP | Multilingualism | Interpretability | Language Learning in Humans vs NeuralNets | Mum^2
Head of the InClow research group: https://inclow-lm.github.io/
PhD student @mainlp.bsky.social (@cislmu.bsky.social, LMU Munich). Interested in language variation & change, currently working on NLP for dialects and low-resource languages.
verenablaschke.github.io