Pablo M. Delgado, Sascha Dick, Christoph Thompson, Chih-Wei Wu, Phillip A. Williams: Exploring Perceptual Audio Quality Measurement on Stereo Processing Using the Open Dataset of Audio Quality https://arxiv.org/abs/2512.10689 https://arxiv.org/pdf/2512.10689 https://arxiv.org/html/2512.10689
12.12.2025 06:35
[2025-12-12 Fri (UTC), 1 new article found for eess.AS Audio and Speech Processing]
12.12.2025 06:35
Zhaolan Huang, Emmanuel Baccelli: TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers https://arxiv.org/abs/2512.09786 https://arxiv.org/pdf/2512.09786 https://arxiv.org/html/2512.09786
11.12.2025 06:33
Karamvir Singh: Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture https://arxiv.org/abs/2512.08973 https://arxiv.org/pdf/2512.08973 https://arxiv.org/html/2512.08973
11.12.2025 06:34
Philipp Grundhuber, Mhd Modar Halimeh, Martin Strauß, Emanuël A. P. Habets: Robust Speech Activity Detection in the Presence of Singing Voice https://arxiv.org/abs/2512.09713 https://arxiv.org/pdf/2512.09713 https://arxiv.org/html/2512.09713
11.12.2025 06:35
Eugenia San Segundo, Aurora López-Jareño, Xin Wang, Junichi Yamagishi: Human perception of audio deepfakes: the role of language and speaking style https://arxiv.org/abs/2512.09221 https://arxiv.org/pdf/2512.09221 https://arxiv.org/html/2512.09221
11.12.2025 06:35
Jinyoung Park, Won Jang, Jiwoong Park: LG Uplus System with Multi-Speaker IDs and Discriminator-based Sub-Judges for the WildSpoof Challenge https://arxiv.org/abs/2512.09000 https://arxiv.org/pdf/2512.09000 https://arxiv.org/html/2512.09000
11.12.2025 06:35
[2025-12-11 Thu (UTC), 3 new articles found for eess.AS Audio and Speech Processing]
11.12.2025 06:35
Fetrat, Navabi, Dehghanian, Abolghasemi, Rabiee: Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS https://arxiv.org/abs/2512.08006 https://arxiv.org/pdf/2512.08006 https://arxiv.org/html/2512.08006
10.12.2025 06:34
Ishaan Kunwar, Henry Cantor, Tyler Rizzo, Ayaan Qayyum: LocaGen: Sub-Sample Time-Delay Learning for Beam Localization https://arxiv.org/abs/2512.07872 https://arxiv.org/pdf/2512.07872 https://arxiv.org/html/2512.07872
10.12.2025 06:34
Shuaihang Yuan, Congcong Wen, Muhammad Shafique, Anthony Tzes, Yi Fang: AudioScene: Integrating Object-Event Audio into 3D Scenes https://arxiv.org/abs/2512.07845 https://arxiv.org/pdf/2512.07845 https://arxiv.org/html/2512.07845
10.12.2025 06:34
Junyi Peng, Lin Zhang, Jin Li, Oldrich Plchot, Jan Cernocky: BUT Systems for Environmental Sound Deepfake Detection in the ESDD 2026 Challenge https://arxiv.org/abs/2512.08319 https://arxiv.org/pdf/2512.08319 https://arxiv.org/html/2512.08319
10.12.2025 06:35
Gabriele Ravizza, Julián Villegas, Christer P. Volk, Tore Stegenborg-Andersen, Yan Pei: An Adaptive Method for Target Curve Selection https://arxiv.org/abs/2512.08313 https://arxiv.org/pdf/2512.08313 https://arxiv.org/html/2512.08313
10.12.2025 06:35
[2025-12-10 Wed (UTC), 2 new articles found for eess.AS Audio and Speech Processing]
10.12.2025 06:35
Srihari Bandarupalli, Bhavana Akkiraju, Charan Devarakonda, Vamsiraghusimha Narsinga, Anil Kumar Vuppala: Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data https://arxiv.org/abs/2512.07277 https://arxiv.org/pdf/2512.07277 https://arxiv.org/html/2512.07277
09.12.2025 06:30
Akkiraju, Bandarupalli, Sambangi, Ravuri, Saraswathi, Vuppala: TeluguST-46: A Benchmark Corpus and Comprehensive Evaluation for Telugu-English Speech Translation https://arxiv.org/abs/2512.07265 https://arxiv.org/pdf/2512.07265 https://arxiv.org/html/2512.07265
09.12.2025 06:30
Ioannides, Constantinou, Chadha, Elkins, Pang, Shwartz-Ziv, LeCun: JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention https://arxiv.org/abs/2512.07168 https://arxiv.org/pdf/2512.07168 https://arxiv.org/html/2512.07168
09.12.2025 06:34
Jisoo Park, Seonghak Lee, Guisik Kim, Taewoo Kim, Junseok Kwon: Lightweight Wasserstein Audio-Visual Model for Unified Speech Enhancement and Separation https://arxiv.org/abs/2512.06689 https://arxiv.org/pdf/2512.06689 https://arxiv.org/html/2512.06689
09.12.2025 06:31
Candy Olivia Mawalim, Haotian Zhang, Shogo Okada: Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026 https://arxiv.org/abs/2512.06041 https://arxiv.org/pdf/2512.06041 https://arxiv.org/html/2512.06041
09.12.2025 06:34
Alireza Mohammadi, Keshav Sood, Dhananjay Thiruvady, Asef Nazari: Physics-Guided Deepfake Detection for Voice Authentication Systems https://arxiv.org/abs/2512.06040 https://arxiv.org/pdf/2512.06040 https://arxiv.org/html/2512.06040
09.12.2025 06:34
Jens Ahrens: Introduction to Ambisonics, Part 1: The Part With No Math https://arxiv.org/abs/2512.07570 https://arxiv.org/pdf/2512.07570 https://arxiv.org/html/2512.07570
09.12.2025 06:35
Runwu Shi, Chang Li, Jiang Wang, Rui Zhang, Nabeela Khan, Benjamin Yen, Takeshi Ashizawa, Kazuhiro Nakadai: Unsupervised Single-Channel Audio Separation with Diffusion Source Priors https://arxiv.org/abs/2512.07226 https://arxiv.org/pdf/2512.07226 https://arxiv.org/html/2512.07226
09.12.2025 06:35
Xining Song, Zhihua Wei, Rui Wang, Haixiao Hu, Yanxiang Chen, Meng Han: Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation https://arxiv.org/abs/2512.06304 https://arxiv.org/pdf/2512.06304 https://arxiv.org/html/2512.06304
09.12.2025 06:35
Rohan Sharma, Dancheng Liu, Jingchen Sun, Shijie Zhou, Jiayu Qin, Jinjun Xiong, Changyou Chen: KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening https://arxiv.org/abs/2512.05994 https://arxiv.org/pdf/2512.05994 https://arxiv.org/html/2512.05994
09.12.2025 06:35
[2025-12-09 Tue (UTC), 4 new articles found for eess.AS Audio and Speech Processing]
09.12.2025 06:35
Katsuhiko Yamamoto, Koichi Miyazaki, Shogo Seki: The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models https://arxiv.org/abs/2512.05592 https://arxiv.org/pdf/2512.05592 https://arxiv.org/html/2512.05592
08.12.2025 06:34
Akama, Zhang, Nagashima, Yutaka, Minamikawa, Polouliakh: Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening https://arxiv.org/abs/2512.05528 https://arxiv.org/pdf/2512.05528 https://arxiv.org/html/2512.05528
08.12.2025 06:50
Obo, Fujita, Ishii, Moriyama, Tsuchiya, Ohashi, Seki: Noise Suppression for Time Difference of Arrival: Performance Evaluation of a Generalized Cross-Correlation Method Using Mean Signal and ... https://arxiv.org/abs/2512.05355 https://arxiv.org/pdf/2512.05355 https://arxiv.org/html/2512.05355
08.12.2025 06:35
Xuanru Zhou, Jiachen Lian, Henry Hong, Xinyi Yang, Gopala Anumanchipalli: Speech World Model: Causal State-Action Planning with Explicit Reasoning for Speech https://arxiv.org/abs/2512.05933 https://arxiv.org/pdf/2512.05933 https://arxiv.org/html/2512.05933
08.12.2025 06:35
Guo, Yadav, Foster, Stathopoulos, Chen, Prodromakis, Wang: A Multi-Channel Auditory Signal Encoder with Adaptive Resolution Using Volatile Memristors https://arxiv.org/abs/2512.05701 https://arxiv.org/pdf/2512.05701 https://arxiv.org/html/2512.05701
08.12.2025 06:35