Grossman, Park, Dhawan, Titus, Zhi, Shchadilova, Wang, Balam, Ginsburg: SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription https://arxiv.org/abs/2508.05554 https://arxiv.org/pdf/2508.05554 https://arxiv.org/html/2508.05554
08.08.2025 06:34
Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito: Embedding Alignment in Code Generation for Audio https://arxiv.org/abs/2508.05473 https://arxiv.org/pdf/2508.05473 https://arxiv.org/html/2508.05473
08.08.2025 06:33
Wahida, Chamikara, Shanmugarasa, Chhetri, Ranbaduge, Khalil: From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutraliza... https://arxiv.org/abs/2508.05409 https://arxiv.org/pdf/2508.05409 https://arxiv.org/html/2508.05409
08.08.2025 06:31
Runchuan Ye, Yixuan Zhou, Renjie Yu, Zijian Lin, Kehan Li, Xiang Li, Xin Liu, Guoyang Zeng, Zhiyong Wu: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding https://arxiv.org/abs/2508.05385 https://arxiv.org/pdf/2508.05385 https://arxiv.org/html/2508.05385
08.08.2025 06:34
Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer: Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces https://arxiv.org/abs/2508.05306 https://arxiv.org/pdf/2508.05306 https://arxiv.org/html/2508.05306
08.08.2025 06:34
Yunpeng Li, Kehang Han, Brian McWilliams, Zalan Borsos, Marco Tagliasacchi: SpectroStream: A Versatile Neural Codec for General Audio https://arxiv.org/abs/2508.05207 https://arxiv.org/pdf/2508.05207 https://arxiv.org/html/2508.05207
08.08.2025 06:34
Du, Li, Zhang, Qiao, Yu, Zhen, Jia, Yang, Yin, Liu: RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer https://arxiv.org/abs/2508.05115 https://arxiv.org/pdf/2508.05115 https://arxiv.org/html/2508.05115
08.08.2025 06:31
Zhang, Tan, Li, Zhang, Chen, Lei, Yang, Wu, Wang, Huang, Yu: Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation https://arxiv.org/abs/2508.05011 https://arxiv.org/pdf/2508.05011 https://arxiv.org/html/2508.05011
08.08.2025 06:34
Nameer Hirschkind, Joseph Liu, Mahesh Kumar Nandwana, Xiao Yu: REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation https://arxiv.org/abs/2508.04946 https://arxiv.org/pdf/2508.04946 https://arxiv.org/html/2508.04946
08.08.2025 06:33
David Sasu, Natalie Schluter: Pitch Accent Detection improves Pretrained Automatic Speech Recognition https://arxiv.org/abs/2508.04814 https://arxiv.org/pdf/2508.04814 https://arxiv.org/html/2508.04814
08.08.2025 06:29
Thomas Thebaud, Yen-Ju Lu, Matthew Wiesner, Peter Viechnicki, Najim Dehak: Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM https://arxiv.org/abs/2508.04795 https://arxiv.org/pdf/2508.04795 https://arxiv.org/html/2508.04795
08.08.2025 06:29
Zhao, Yi, Zhou, Pan, Wang, Xia, Li, Dong, Pan: Wearable Music2Emotion: Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion https://arxiv.org/abs/2508.04723 https://arxiv.org/pdf/2508.04723 https://arxiv.org/html/2508.04723
08.08.2025 06:34
Vignesh Ethiraj, Ashwath David, Sidhanth Menon, Divya Vijay: Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS https://arxiv.org/abs/2508.04721 https://arxiv.org/pdf/2508.04721 https://arxiv.org/html/2508.04721
08.08.2025 06:34
Jiatong Li, Simon Doclo: Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement https://arxiv.org/abs/2508.05293 https://arxiv.org/pdf/2508.05293 https://arxiv.org/html/2508.05293
08.08.2025 06:35
Tom Bäckström, Mohammad Hassan Vali, My Nguyen, Silas Rech: Privacy Disclosure of Similarity in Speech and Language Processing https://arxiv.org/abs/2508.05250 https://arxiv.org/pdf/2508.05250 https://arxiv.org/html/2508.05250
08.08.2025 06:35
Seraphina Fong, Marco Matassoni, Alessio Brutti: Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages https://arxiv.org/abs/2508.05149 https://arxiv.org/pdf/2508.05149 https://arxiv.org/html/2508.05149
08.08.2025 06:35
Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala: Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS https://arxiv.org/abs/2508.05102 https://arxiv.org/pdf/2508.05102 https://arxiv.org/html/2508.05102
08.08.2025 06:35
Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani: MOVER: Combining Multiple Meeting Recognition Systems https://arxiv.org/abs/2508.05055 https://arxiv.org/pdf/2508.05055 https://arxiv.org/html/2508.05055
08.08.2025 06:35
Jiang, Ning, Wang, Wang, Bi, Zhu, Xie, Fu: REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers https://arxiv.org/abs/2508.04996 https://arxiv.org/pdf/2508.04996 https://arxiv.org/html/2508.04996
08.08.2025 06:35
Henri Gode, Simon Doclo: Closed-Form Successive Relative Transfer Function Vector Estimation based on Blind Oblique Projection Incorporating Noise Whitening https://arxiv.org/abs/2508.04887 https://arxiv.org/pdf/2508.04887 https://arxiv.org/html/2508.04887
08.08.2025 06:35
Yael Segal-Feldman, Ann R. Bradlow, Matthew Goldrick, Joseph Keshet: Keyword Spotting with Hyper-Matched Filters for Small Footprint Devices https://arxiv.org/abs/2508.04857 https://arxiv.org/pdf/2508.04857 https://arxiv.org/html/2508.04857
08.08.2025 06:35
[2025-08-08 Fri (UTC), 8 new articles found for eessAS Audio and Speech Processing]
08.08.2025 06:35
Bart van Merriënboer, Vincent Dumoulin, Jenny Hamer, Lauren Harrell, Andrea Burns, Tom Denton: Perch 2.0: The Bittern Lesson for Bioacoustics https://arxiv.org/abs/2508.04665 https://arxiv.org/pdf/2508.04665 https://arxiv.org/html/2508.04665
07.08.2025 06:34
Praveen Srinivasa Varadhan, Sherry Thomas, Sai Teja M. S., Suvrat Bhooshan, Mitesh M. Khapra: The State Of TTS: A Case Study with Human Fooling Rates https://arxiv.org/abs/2508.04179 https://arxiv.org/pdf/2508.04179 https://arxiv.org/html/2508.04179
07.08.2025 06:30
Bingshen Mu, Yiwen Shao, Kun Wei, Dong Yu, Lei Xie: Efficient Scaling for LLM-based ASR https://arxiv.org/abs/2508.04096 https://arxiv.org/pdf/2508.04096 https://arxiv.org/html/2508.04096
07.08.2025 06:34
Dinkel, Li, Liu, Luan, Niu, Sun, Wang, Xiao, Zhang, Zhou: MiDashengLM: Efficient Audio Understanding with General Audio Captions https://arxiv.org/abs/2508.03983 https://arxiv.org/pdf/2508.03983 https://arxiv.org/html/2508.03983
07.08.2025 06:34
Katharina Hoedt, Arthur Flexer, Gerhard Widmer: Are Inherently Interpretable Models More Robust? A Study In Music Emotion Recognition https://arxiv.org/abs/2508.03780 https://arxiv.org/pdf/2508.03780 https://arxiv.org/html/2508.03780
07.08.2025 06:34
Yifan Hu, Rui Liu, Yi Ren, Xiang Yin, Haizhou Li: UniTalker: Conversational Speech-Visual Synthesis https://arxiv.org/abs/2508.04585 https://arxiv.org/pdf/2508.04585 https://arxiv.org/html/2508.04585
07.08.2025 06:36
Franziska Braun, Christopher Witzl, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer: Pitfalls and Limits in Automatic Dementia Assessment https://arxiv.org/abs/2508.04512 https://arxiv.org/pdf/2508.04512 https://arxiv.org/html/2508.04512
07.08.2025 06:36
Yash Bhake, Ankit Anand, Preeti Rao: Melodic and Metrical Elements of Expressiveness in Hindustani Vocal Music https://arxiv.org/abs/2508.04430 https://arxiv.org/pdf/2508.04430 https://arxiv.org/html/2508.04430
07.08.2025 06:35