Our papers to be presented at ICASSP in Hyderabad!
Target Speaker ASR with Whisper, ieeexplore.ieee.org/document/108...
Introduces a novel approach to training target-speaker ASR systems utilizing frame-level diarization outputs.
Apr 11: 2:00 pm - 3:30 pm, Poster 2E, presented by Alexander Polok
02.04.2025 13:20
Are you participating in the Interspeech 2025 Workshop on Multilingual Conversational Speech Language Models organised by Nexdata (formerly Datatang Co., Ltd.)?
We've released our baseline model for the community, ready for you to explore and build upon!
Try it here: pccnect.fit.vutbr.cz/gradio-demo/
[1/4]
24.03.2025 20:00
Speechers don't just do math, code, experiments, papers and research proposals - they also skate, or at least try to! One hour on a rented skating rink was enough to test the endurance of pros as well as beginners. Of course, followed by 'one' in the Microbrewery Lisen.
02.02.2025 11:16
Collaboration and Feedback Welcome
We're open to feedback, discussions, and collaborations. Let's work together to shape the future of ASR and diarization technology!
[14/14]
11.01.2025 19:30
Kudos to the CHiME-8 NOTSOFAR-1 Organizers
Thanks to Alon Vinnikov, Amir Ivry, Eyal Krupka (Microsoft) for organizing the CHiME-8 NOTSOFAR-1 Challenge, and to the CHiME-8 Steering Committee for their dedication to advancing speech recognition research!
[13/14]
11.01.2025 19:30
Gradio-powered Demo pccnect.fit.vutbr.cz/gradio-demo - Test our DiCoW model to transcribe your own meetings! The demo is live for 72 hours only, so don't miss this chance.
[12/14]
11.01.2025 19:30
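For readers curious what a Gradio-powered transcription demo looks like, here is a minimal, illustrative skeleton. It is not the code behind the hosted demo; the `transcribe` placeholder stands in for the actual DiCoW diarization + ASR pipeline.

```python
import gradio as gr

def transcribe(audio_path: str) -> str:
    # Placeholder: the real demo would run diarization and then the
    # diarization-conditioned ASR model on the uploaded recording.
    return f"Speaker-attributed transcript of {audio_path} would appear here."

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath", label="Meeting recording"),
    outputs=gr.Textbox(label="Transcript"),
    title="Multi-speaker transcription demo (illustrative skeleton)",
)

if __name__ == "__main__":
    demo.launch()
```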
Open-Source Tools and Demos
We're making our research accessible by open-sourcing training and inference codebases, and providing interactive demos:
DiariZen Source Code github.com/BUTSpeechFIT...
[9/14]
11.01.2025 19:30
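As a rough illustration of what a speaker diarization toolkit produces (who spoke when), here is a short sketch using pyannote.audio, a widely used diarization framework; DiariZen's own API, models, and checkpoints differ, so treat the identifiers below purely as assumptions for illustration.

```python
from pyannote.audio import Pipeline

# Illustration only, not DiariZen's API. A token may be required to
# download the pretrained pyannote pipeline.
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
diarization = pipeline("meeting.wav")

# Diarization yields speaker-labelled time segments ("who spoke when").
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```

Segments like these can be converted into the frame-level masks that a diarization-conditioned ASR model consumes downstream.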
3. BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge - The work earned the Jury Prize for being one of the most practical, efficient, and novel systems. Our robust diarization-ASR integration is capable of tackling overlapped speech.
isca-archive.org/chime_2024/p...
[7/14]
11.01.2025 19:30
2. Target Speaker ASR with Whisper arxiv.org/abs/2409.09543 - Accepted to ICASSP 2025. This work enhances the Whisper ASR model for target-speaker recognition, demonstrating its applicability in complex acoustic scenarios.
[6/14]
11.01.2025 19:30
By directly conditioning the ASR model on diarization outputs, we simplify the workflow for multi-speaker and target-speaker scenarios. Importantly, DiCoW maintains Whisper's performance on single-speaker transcription, ensuring robustness across diverse use cases.
[5/14]
11.01.2025 19:30
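To make the idea of frame-level diarization conditioning more concrete, here is a minimal PyTorch sketch: each frame carries diarization probabilities over a few classes (e.g. silence / target / non-target / overlap), and each class has its own learned affine transform of the encoder hidden state, mixed per frame by those probabilities. The class set, shapes, and identity initialisation are illustrative assumptions, not the released DiCoW implementation.

```python
import torch
import torch.nn as nn

class FrameDiarizationConditioning(nn.Module):
    """Illustrative frame-level diarization conditioning (not DiCoW's code)."""

    def __init__(self, d_model: int, num_classes: int = 4):
        super().__init__()
        # One affine transform per diarization class, initialised to identity
        # so the layer is a no-op before training (probabilities sum to one).
        self.weights = nn.Parameter(torch.eye(d_model).repeat(num_classes, 1, 1))
        self.biases = nn.Parameter(torch.zeros(num_classes, d_model))

    def forward(self, hidden: torch.Tensor, probs: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, frames, d_model) encoder activations
        # probs:  (batch, frames, num_classes) per-frame diarization probabilities
        transformed = torch.einsum("btd,cde->btce", hidden, self.weights) + self.biases
        # Mix the class-specific transforms, weighted by the per-frame probabilities.
        return torch.einsum("btc,btce->bte", probs, transformed)


# Tiny usage example with random tensors standing in for Whisper encoder states.
layer = FrameDiarizationConditioning(d_model=512)
states = torch.randn(2, 100, 512)
stno = torch.softmax(torch.randn(2, 100, 4), dim=-1)
conditioned = layer(states, stno)   # shape: (2, 100, 512)
```

Because the transforms start as the identity, such a layer leaves the single-speaker behaviour of the underlying model untouched until the diarization-dependent parameters are trained, which fits the robustness point made in the post above.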
- Versatile and Robust: Despite all improvements, our systems retain high performance on single-speaker transcription tasks, ensuring broad applicability across use cases.
[3/14]
11.01.2025 19:30
Key Innovations
- Simplifying Multi-Speaker ASR: Our models directly use diarization outputs as conditioning signals, bypassing the need for enrollment data or complex source separation techniques.
[2/14]
11.01.2025 19:30
[Image: scheme of the DiCoW target-speaker ASR pipeline]
Transcribing multiple speakers with OpenAI's Whisper? No problem.
Check out our recent work at BUT Speech@FIT in collaboration with CLSP JHU. It is fully open-sourced. Do not forget to try out our demo: pccnect.fit.vutbr.cz/gradio-demo
Read more in this thread.
[1/14]
11.01.2025 19:30
The Language Technologies Institute in Carnegie Mellon University's @scsatcmu.bsky.social
lti.cmu.edu
Entrepreneur
Costplusdrugs.com
Center for Language and Speech Processing at Johns Hopkins University
#NLProc #MachineLearning #AI http://tinyurl.com/clspy2ube
We do impactful research and raise new leading scientific personalities in the field of speech processing.
Speech and language processing researcher & engineer working at the STAR lab at SRI. @PDX
Founder Audiology En Español | Author Audiology Services in Diverse Communities | Big Sister @BBBSChi
Director of AI Research at Apple. Board Chair for Partnership on AI. Photographer. Musician.
PhD Student at UC San Diego | LLM Agents, Reinforcement Learning, Human-AI Collaboration, Multi-Agent Systems
Speech β’ Language β’ Learning
https://grzegorz.chrupala.me
@ Tilburg University
Linguist in AI & CogSci | PhD student @ ILLC, University of Amsterdam
https://mdhk.net/
https://scholar.social/@mdhk
https://twitter.com/mariannedhk
Full professor of inclusive speech communication at TU Delft, The Netherlands. Former president of the International Speech Communication Association (ISCA). General Chair of @interspeech.bsky.social Rotterdam, 2025. Mother of 3.
Linguist. Inclusive speech tech. Cats.
interspeech2026.org
27 September – 1 October, ICC, Sydney, Australia
'Speaking Together'
Proudly hosted by the Australasian Speech Science and Technology Association (ASSTA) and the International Speech Communication Association (ISCA).
Professor at Université de Lorraine/Loria/Mines Nancy. Doing research in speech and audio processing.
Assistant professor at Cornell Psychology Department. CoCoCo Lab (Cornell Computational Cognition Lab) @co3lab.bsky.social. I am recruiting!
I work on speech and language technologies at Google. I like languages, history, maps, traveling, cycling, and buying way too many books.
Speech and audio research scientist @MERL. saneworkshop.org co-founder. IguanaTex developer.
jonathanleroux.org
github.com/Jonathan-LeRoux/
scholar.google.com/citations?user=aUpxty8AAAAJ&hl=en
Lecturer in speech and language technology, CSTR, University of Edinburgh.
https://homepages.inf.ed.ac.uk/clai/
ml, audio, cv, nlp, speech, bioacoustics // Assoc. Prof. at Université de Toulon, researcher at LIS CNRS UMR 7020, director of http://www.master-mir.eu in marine robotics and AI