arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

Unofficial bot by @vele.bsky.social w/ http://github.com/so-okada/bXiv https://arxiv.org/list/eess.AS/new List https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j ModList https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g

20 Followers  |  2 Following  |  2,658 Posts  |  Joined: 16.02.2025

Latest posts by eessas-bot.bsky.social on Bluesky

Grossman, Park, Dhawan, Titus, Zhi, Shchadilova, Wang, Balam, Ginsburg: SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription https://arxiv.org/abs/2508.05554 https://arxiv.org/pdf/2508.05554 https://arxiv.org/html/2508.05554

08.08.2025 06:34  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito: Embedding Alignment in Code Generation for Audio https://arxiv.org/abs/2508.05473 https://arxiv.org/pdf/2508.05473 https://arxiv.org/html/2508.05473

08.08.2025 06:33  |  👍 0  |  🔁 3  |  💬 0  |  📌 0

Wahida, Chamikara, Shanmugarasa, Chhetri, Ranbaduge, Khalil: From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutraliza... https://arxiv.org/abs/2508.05409 https://arxiv.org/pdf/2508.05409 https://arxiv.org/html/2508.05409

08.08.2025 06:31  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Runchuan Ye, Yixuan Zhou, Renjie Yu, Zijian Lin, Kehan Li, Xiang Li, Xin Liu, Guoyang Zeng, Zhiyong Wu: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding https://arxiv.org/abs/2508.05385 https://arxiv.org/pdf/2508.05385 https://arxiv.org/html/2508.05385

08.08.2025 06:34  |  👍 0  |  🔁 1  |  💬 0  |  📌 0

Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer: Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces https://arxiv.org/abs/2508.05306 https://arxiv.org/pdf/2508.05306 https://arxiv.org/html/2508.05306

08.08.2025 06:34  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Yunpeng Li, Kehang Han, Brian McWilliams, Zalan Borsos, Marco Tagliasacchi: SpectroStream: A Versatile Neural Codec for General Audio https://arxiv.org/abs/2508.05207 https://arxiv.org/pdf/2508.05207 https://arxiv.org/html/2508.05207

08.08.2025 06:34  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Du, Li, Zhang, Qiao, Yu, Zhen, Jia, Yang, Yin, Liu: RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer https://arxiv.org/abs/2508.05115 https://arxiv.org/pdf/2508.05115 https://arxiv.org/html/2508.05115

08.08.2025 06:31  |  👍 0  |  🔁 3  |  💬 0  |  📌 0

Zhang, Tan, Li, Zhang, Chen, Lei, Yang, Wu, Wang, Huang, Yu: Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation https://arxiv.org/abs/2508.05011 https://arxiv.org/pdf/2508.05011 https://arxiv.org/html/2508.05011

08.08.2025 06:34  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Nameer Hirschkind, Joseph Liu, Mahesh Kumar Nandwana, Xiao Yu: REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation https://arxiv.org/abs/2508.04946 https://arxiv.org/pdf/2508.04946 https://arxiv.org/html/2508.04946

08.08.2025 06:33  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

David Sasu, Natalie Schluter: Pitch Accent Detection improves Pretrained Automatic Speech Recognition https://arxiv.org/abs/2508.04814 https://arxiv.org/pdf/2508.04814 https://arxiv.org/html/2508.04814

08.08.2025 06:29  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Thomas Thebaud, Yen-Ju Lu, Matthew Wiesner, Peter Viechnicki, Najim Dehak: Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM https://arxiv.org/abs/2508.04795 https://arxiv.org/pdf/2508.04795 https://arxiv.org/html/2508.04795

08.08.2025 06:29  |  👍 0  |  🔁 3  |  💬 0  |  📌 0

Zhao, Yi, Zhou, Pan, Wang, Xia, Li, Dong, Pan: Wearable Music2Emotion : Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion https://arxiv.org/abs/2508.04723 https://arxiv.org/pdf/2508.04723 https://arxiv.org/html/2508.04723

08.08.2025 06:34  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Vignesh Ethiraj, Ashwath David, Sidhanth Menon, Divya Vijay: Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS https://arxiv.org/abs/2508.04721 https://arxiv.org/pdf/2508.04721 https://arxiv.org/html/2508.04721

08.08.2025 06:34  |  👍 1  |  🔁 2  |  💬 0  |  📌 0

Jiatong Li, Simon Doclo: Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement https://arxiv.org/abs/2508.05293 https://arxiv.org/pdf/2508.05293 https://arxiv.org/html/2508.05293

08.08.2025 06:35  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

Tom Bäckström, Mohammad Hassan Vali, My Nguyen, Silas Rech: Privacy Disclosure of Similarity in Speech and Language Processing https://arxiv.org/abs/2508.05250 https://arxiv.org/pdf/2508.05250 https://arxiv.org/html/2508.05250

08.08.2025 06:35  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

Seraphina Fong, Marco Matassoni, Alessio Brutti: Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages https://arxiv.org/abs/2508.05149 https://arxiv.org/pdf/2508.05149 https://arxiv.org/html/2508.05149

08.08.2025 06:35  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala: Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS https://arxiv.org/abs/2508.05102 https://arxiv.org/pdf/2508.05102 https://arxiv.org/html/2508.05102

08.08.2025 06:35  |  👍 0  |  🔁 1  |  💬 0  |  📌 0

Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani: MOVER: Combining Multiple Meeting Recognition Systems https://arxiv.org/abs/2508.05055 https://arxiv.org/pdf/2508.05055 https://arxiv.org/html/2508.05055

08.08.2025 06:35  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

Jiang, Ning, Wang, Wang, Bi, Zhu, Xie, Fu: REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers https://arxiv.org/abs/2508.04996 https://arxiv.org/pdf/2508.04996 https://arxiv.org/html/2508.04996

08.08.2025 06:35  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

Henri Gode, Simon Doclo: Closed-Form Successive Relative Transfer Function Vector Estimation based on Blind Oblique Projection Incorporating Noise Whitening https://arxiv.org/abs/2508.04887 https://arxiv.org/pdf/2508.04887 https://arxiv.org/html/2508.04887

08.08.2025 06:35  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

Yael Segal-Feldman, Ann R. Bradlow, Matthew Goldrick, Joseph Keshet: Keyword Spotting with Hyper-Matched Filters for Small Footprint Devices https://arxiv.org/abs/2508.04857 https://arxiv.org/pdf/2508.04857 https://arxiv.org/html/2508.04857

08.08.2025 06:35  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

[2025-08-08 Fri (UTC), 8 new articles found for eessAS Audio and Speech Processing]

08.08.2025 06:35  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

Bart van Merriënboer, Vincent Dumoulin, Jenny Hamer, Lauren Harrell, Andrea Burns, Tom Denton: Perch 2.0: The Bittern Lesson for Bioacoustics https://arxiv.org/abs/2508.04665 https://arxiv.org/pdf/2508.04665 https://arxiv.org/html/2508.04665

07.08.2025 06:34  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Praveen Srinivasa Varadhan, Sherry Thomas, Sai Teja M. S., Suvrat Bhooshan, Mitesh M. Khapra: The State Of TTS: A Case Study with Human Fooling Rates https://arxiv.org/abs/2508.04179 https://arxiv.org/pdf/2508.04179 https://arxiv.org/html/2508.04179

07.08.2025 06:30  |  👍 0  |  🔁 3  |  💬 0  |  📌 0

Bingshen Mu, Yiwen Shao, Kun Wei, Dong Yu, Lei Xie: Efficient Scaling for LLM-based ASR https://arxiv.org/abs/2508.04096 https://arxiv.org/pdf/2508.04096 https://arxiv.org/html/2508.04096

07.08.2025 06:34  |  👍 0  |  🔁 1  |  💬 0  |  📌 0

Dinkel, Li, Liu, Luan, Niu, Sun, Wang, Xiao, Zhang, Zhou: MiDashengLM: Efficient Audio Understanding with General Audio Captions https://arxiv.org/abs/2508.03983 https://arxiv.org/pdf/2508.03983 https://arxiv.org/html/2508.03983

07.08.2025 06:34  |  👍 0  |  🔁 1  |  💬 0  |  📌 0

Katharina Hoedt, Arthur Flexer, Gerhard Widmer: Are Inherently Interpretable Models More Robust? A Study In Music Emotion Recognition https://arxiv.org/abs/2508.03780 https://arxiv.org/pdf/2508.03780 https://arxiv.org/html/2508.03780

07.08.2025 06:34  |  👍 0  |  🔁 2  |  💬 0  |  📌 0

Yifan Hu, Rui Liu, Yi Ren, Xiang Yin, Haizhou Li: UniTalker: Conversational Speech-Visual Synthesis https://arxiv.org/abs/2508.04585 https://arxiv.org/pdf/2508.04585 https://arxiv.org/html/2508.04585

07.08.2025 06:36  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

Franziska Braun, Christopher Witzl, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer: Pitfalls and Limits in Automatic Dementia Assessment https://arxiv.org/abs/2508.04512 https://arxiv.org/pdf/2508.04512 https://arxiv.org/html/2508.04512

07.08.2025 06:36  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

Yash Bhake, Ankit Anand, Preeti Rao: Melodic and Metrical Elements of Expressiveness in Hindustani Vocal Music https://arxiv.org/abs/2508.04430 https://arxiv.org/pdf/2508.04430 https://arxiv.org/html/2508.04430

07.08.2025 06:35  |  👍 0  |  🔁 1  |  💬 0  |  📌 0
