Dominik Klement

@dklement.bsky.social

Speech Researcher @ BUT Speech@FIT | Visiting student @ CLSP, Johns Hopkins University | GitHub: https://github.com/domklement | LinkedIn: https://www.linkedin.com/in/dominik-klement/

53 Followers  |  118 Following  |  14 Posts  |  Joined: 30.11.2024

Latest posts by dklement.bsky.social on Bluesky


Our papers to be presented at ICASSP in Hyderabad!

Target Speaker ASR with Whisper, ieeexplore.ieee.org/document/108...
Introduces a novel approach to training target-speaker ASR systems utilizing frame-level diarization outputs.
Apr 11: 2:00 pm - 3:30 pm, Poster 2E, presented by Alexander Polok
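The "frame-level diarization outputs" the paper builds on can be illustrated with a small sketch. This is our own reconstruction, not the authors' code: it maps diarization segments to per-frame silence / target / non-target / overlap labels, the kind of conditioning signal such a target-speaker system could consume (function and parameter names are hypothetical).

```python
# Illustrative sketch: derive frame-level conditioning labels from
# diarization segments for a chosen target speaker.

def frame_stno_labels(segments, target, num_frames, frame_shift=0.02):
    """segments: list of (speaker, start_sec, end_sec) from a diarizer.
    Returns one label per frame: 'silence', 'target', 'non-target',
    or 'overlap' (target speaking together with someone else)."""
    labels = []
    for i in range(num_frames):
        t = i * frame_shift
        active = {spk for spk, start, end in segments if start <= t < end}
        if not active:
            labels.append("silence")
        elif active == {target}:
            labels.append("target")
        elif target in active:
            labels.append("overlap")
        else:
            labels.append("non-target")
    return labels
```

For example, with speaker A in 0.0-0.1 s and speaker B in 0.06-0.16 s, the frames walk through target, overlap, non-target, and silence regions as the two segments start and end.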

02.04.2025 13:20 · 👍 0    🔁 1    💬 1    📌 0

πŸ—£οΈ Are you participating in the Interspeech 2025 Workshop on Multilingual Conversational Speech Language Models organised by Nexdata【旧Datatangζ ͺεΌδΌšη€Ύε…¬εΌγ€‘?

We've released our baseline model for the community, ready for you to explore and build upon!
🔗 Try it here: pccnect.fit.vutbr.cz/gradio-demo/
[1/4]

24.03.2025 20:00 · 👍 0    🔁 1    💬 1    📌 0

Speechers don't just do math, code, experiments, papers, and research proposals; they also skate, or at least try to! One hour on a rented skating rink was enough to test the endurance of pros and beginners alike. Of course, followed by "one" in Microbrewery Lisen ⛸️🍺

02.02.2025 11:16 · 👍 3    🔁 1    💬 0    📌 0

🤝 Collaboration and Feedback Welcome
We're open to feedback, discussions, and collaborations. Let's work together to shape the future of ASR and diarization technology!

[14/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 0    📌 0

🌟 Kudos to CHiME-8 NOTSOFAR-1 Organizers
Thanks to Alon Vinnikov, Amir Ivry, Eyal Krupka (Microsoft) for organizing the CHiME-8 NOTSOFAR-1 Challenge, and to the CHiME-8 Steering Committee for their dedication to advancing speech recognition research!

[13/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

💻 Gradio-powered Demo pccnect.fit.vutbr.cz/gradio-demo - Test our DiCoW model to transcribe your own meetings! The demo is live for 72 hours only, so don't miss this chance.
[12/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

🔗 DiCoW Inference Demo Pipeline github.com/BUTSpeechFIT...
[11/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

🔗 Target-Speaker Whisper Source Code github.com/BUTSpeechFIT...
[10/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

🌟 Open-Source Tools and Demos
We're making our research accessible by open-sourcing training and inference codebases, and providing interactive demos:
🔗 DiariZen Source Code github.com/BUTSpeechFIT...

[9/14]

11.01.2025 19:30 · 👍 1    🔁 0    💬 1    📌 0

🌟 4. Leveraging Self-Supervised Learning for Speaker Diarization - Accepted to ICASSP 2025. This paper introduces DiariZen, our state-of-the-art diarization model and toolkit.
arxiv.org/abs/2409.09408
[8/14]

11.01.2025 19:30 · 👍 2    🔁 0    💬 1    📌 0

🌟 3. BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge - The work earned the 🏆 Jury Prize for being one of the most practical, efficient, and novel systems. Our robust diarization-ASR integration is capable of tackling overlapped speech.
isca-archive.org/chime_2024/p...

[7/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

🌟 2. Target Speaker ASR with Whisper arxiv.org/abs/2409.09543 - Accepted to ICASSP 2025. This work enhances the Whisper ASR model for target-speaker recognition, demonstrating its applicability in complex acoustic scenarios.
[6/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

By directly conditioning the ASR model on diarization outputs, we simplify the workflow for multi-speaker and target-speaker scenarios. Importantly, DiCoW maintains Whisper's performance on single-speaker transcription, ensuring robustness across diverse use cases.
[5/14]
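The conditioning described above can be sketched in a few lines. This is a pure-Python illustration of the general idea, not DiCoW's actual implementation: each encoder frame is transformed by a probability-weighted combination of per-class affine maps, where the class probabilities come from diarization (all names and the toy transforms are hypothetical).

```python
# Illustrative sketch: diarization-conditioned transformation of
# encoder frame features. One affine map (scale, bias) per class;
# each frame gets the mixture weighted by its class probabilities.

CLASSES = ("silence", "target", "non-target", "overlap")

def condition_frames(features, class_probs, transforms):
    """features: list of frame vectors.
    class_probs: per-frame dict mapping class -> probability.
    transforms: class -> (scale_vector, bias_vector)."""
    out = []
    for x, probs in zip(features, class_probs):
        y = [0.0] * len(x)
        for c in CLASSES:
            scale, bias = transforms[c]
            p = probs.get(c, 0.0)
            for d, v in enumerate(x):
                y[d] += p * (scale[d] * v + bias[d])
        out.append(y)
    return out
```

If the target-class transform is initialized to the identity and the frame is confidently "target", the feature passes through unchanged, which is one way such a scheme can preserve single-speaker behavior.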

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

🌟 Recent Papers
1. DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition - Submitted to CSL. Our diarization-conditioned approach that eliminates the need for speaker enrollment or source separation.
arxiv.org/abs/2501.00114
[4/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

- Versatile and Robust: Despite all improvements, our systems retain high performance on single-speaker transcription tasks, ensuring broad applicability across use cases.
[3/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0

🌟 Key Innovations
- Simplifying Multi-Speaker ASR: Our models directly use diarization outputs as conditioning signals, bypassing the need for enrollment data or complex source separation techniques.

[2/14]

11.01.2025 19:30 · 👍 0    🔁 0    💬 1    📌 0
Scheme of DiCoW target speaker ASR pipeline


Transcribing multiple speakers with OpenAI's Whisper? No problem.

Check out our recent work at BUT Speech@FIT in collaboration with CLSP JHU. It is fully open-sourced. Do not forget to try out our demo: pccnect.fit.vutbr.cz/gradio-demo

Read more in this thread 👇

[1/14]

11.01.2025 19:30 · 👍 13    🔁 3    💬 2    📌 0
