Junnuo Wang: Audio Palette: A Diffusion Transformer with Multi-Signal Conditioning for Controllable Foley Synthesis https://arxiv.org/abs/2510.12175 https://arxiv.org/pdf/2510.12175 https://arxiv.org/html/2510.12175
15.10.2025 06:34
Guojian Li, Qijie Shao, Zhixian Zhao, Shuiyuan Wang, Zhonghua Fu, Lei Xie: Serial-Parallel Dual-Path Architecture for Speaking Style Recognition https://arxiv.org/abs/2510.11732 https://arxiv.org/pdf/2510.11732 https://arxiv.org/html/2510.11732
15.10.2025 06:34
Jiatong Li, Simon Doclo: I-DCCRN-VAE: An Improved Deep Representation Learning Framework for Complex VAE-based Single-channel Speech Enhancement https://arxiv.org/abs/2510.12485 https://arxiv.org/pdf/2510.12485 https://arxiv.org/html/2510.12485
15.10.2025 06:35
Klaus Linhard, Philipp Bulling: A Phase Synthesizer for Decorrelation to Improve Acoustic Feedback Cancellation https://arxiv.org/abs/2510.12377 https://arxiv.org/pdf/2510.12377 https://arxiv.org/html/2510.12377
15.10.2025 06:35
Guanxin Jiang, Andreas Brendel, Pablo M. Delgado, Jürgen Herre: DeePAQ: A Perceptual Audio Quality Metric Based On Foundational Models and Weakly Supervised Learning https://arxiv.org/abs/2510.12326 https://arxiv.org/pdf/2510.12326 https://arxiv.org/html/2510.12326
15.10.2025 06:35
Song, Zhuang, Chen, Niu, Yang, Du, Chen, Wang, Wang, Chen: DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation https://arxiv.org/abs/2510.12210 https://arxiv.org/pdf/2510.12210 https://arxiv.org/html/2510.12210
15.10.2025 06:35
Wanying Ge, Xin Wang, Junichi Yamagishi: FakeMark: Deepfake Speech Attribution With Watermarked Artifacts https://arxiv.org/abs/2510.12042 https://arxiv.org/pdf/2510.12042 https://arxiv.org/html/2510.12042
15.10.2025 06:35
[2025-10-15 Wed (UTC), 5 new articles found for eess.AS Audio and Speech Processing]
15.10.2025 06:35
Alain Riou, Joan Serrà, Yuki Mitsufuji: Automatic Music Sample Identification with Multi-Track Contrastive Learning https://arxiv.org/abs/2510.11507 https://arxiv.org/pdf/2510.11507 https://arxiv.org/html/2510.11507
14.10.2025 06:35
KiHyun Nam, Jongmin Choi, Hyeongkeun Lee, Jungwoo Heo, Joon Son Chung: Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap https://arxiv.org/abs/2510.11330 https://arxiv.org/pdf/2510.11330 https://arxiv.org/html/2510.11330
14.10.2025 06:34
Xinyu Luo, Jie Liu, Kecheng Chen, Junyi Yang, Bo Ding, Arindam Basu, Haoliang Li: Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction https://arxiv.org/abs/2510.11068 https://arxiv.org/pdf/2510.11068 https://arxiv.org/html/2510.11068
14.10.2025 06:35
Xuyao Deng, Yanjie Sun, Yong Dou, Kele Xu: Unify Variables in Neural Scaling Laws for General Audio Representations via Embedding Effective Rank https://arxiv.org/abs/2510.10948 https://arxiv.org/pdf/2510.10948 https://arxiv.org/html/2510.10948
14.10.2025 06:34
Yerin Hong, Juhwan Lim, Jinhong Min, Nishkarsh Agarwal, Robert Hovden, Ageeth A. Bol, Yiyang Li: Delayed 1T to 2H Phase Transition Upon Electrochemical Delithiation of LiMoS2 https://arxiv.org/abs/2510.10911 https://arxiv.org/pdf/2510.10911 https://arxiv.org/html/2510.10911
14.10.2025 06:43
Lokhande, Dewangan, Mansoori, Chaudhari, J., Lokhande, Teman, Vishvakarma: Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation https://arxiv.org/abs/2510.10676 https://arxiv.org/pdf/2510.10676 https://arxiv.org/html/2510.10676
14.10.2025 06:29
Stephen Ni-Hahn, Chao Péter Yang, Mingchen Ma, Cynthia Rudin, Simon Mak, Yue Jiang: ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis https://arxiv.org/abs/2510.10249 https://arxiv.org/pdf/2510.10249 https://arxiv.org/html/2510.10249
14.10.2025 06:34
Xian He, Wei Zeng, Ye Wang: Peransformer: Improving Low-informed Expressive Performance Rendering with Score-aware Discriminator https://arxiv.org/abs/2510.10175 https://arxiv.org/pdf/2510.10175 https://arxiv.org/html/2510.10175
14.10.2025 06:34
Paul Haimes: Chord Colourizer: A Near Real-Time System for Visualizing Musical Key https://arxiv.org/abs/2510.10173 https://arxiv.org/pdf/2510.10173 https://arxiv.org/html/2510.10173
14.10.2025 06:32
Wang, Zhao, Liu, Ge, Xu, Xiao, Gao, Yu, Zhu: MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction https://arxiv.org/abs/2510.10003 https://arxiv.org/pdf/2510.10003 https://arxiv.org/html/2510.10003
14.10.2025 06:30
Soubhagya Ranjan Hota, Arka Roy, Udit Satija: ILD-VIT: A Unified Vision Transformer Architecture for Detection of Interstitial Lung Disease from Respiratory Sounds https://arxiv.org/abs/2510.11458 https://arxiv.org/pdf/2510.11458 https://arxiv.org/html/2510.11458
14.10.2025 06:35
Haixin Zhao, Kaixuan Yang, Nilesh Madhu: Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training https://arxiv.org/abs/2510.11395 https://arxiv.org/pdf/2510.11395 https://arxiv.org/html/2510.11395
14.10.2025 06:35
Ruben Johnson Robert Jeremiah, Peyman Goli, Steven van de Par: Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation https://arxiv.org/abs/2510.11366 https://arxiv.org/pdf/2510.11366 https://arxiv.org/html/2510.11366
14.10.2025 06:35
Ali Fallah, Shun Nakamura, Steven van de Par: Perceptual Compensation of Ambisonics Recordings for Reproduction in Room https://arxiv.org/abs/2510.10883 https://arxiv.org/pdf/2510.10883 https://arxiv.org/html/2510.10883
14.10.2025 06:35
[2025-10-14 Tue (UTC), 4 new articles found for eess.AS Audio and Speech Processing]
14.10.2025 06:35
Mohammad Hossein Sameti, Sepehr Harfi Moridani, Ali Zarean, Hossein Sameti: Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking https://arxiv.org/abs/2510.09528 https://arxiv.org/pdf/2510.09528 https://arxiv.org/html/2510.09528
13.10.2025 06:31
Nizar El Ghazal, Antoine Caubrière, Valentin Vielzeuf: The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach https://arxiv.org/abs/2510.09424 https://arxiv.org/pdf/2510.09424 https://arxiv.org/html/2510.09424
13.10.2025 06:31
Atul Shree, Harshith Jupuru: FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms https://arxiv.org/abs/2510.09085 https://arxiv.org/pdf/2510.09085 https://arxiv.org/html/2510.09085
13.10.2025 06:33
Akira Takahashi, Shusuke Takahashi, Yuki Mitsufuji: MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation https://arxiv.org/abs/2510.09065 https://arxiv.org/pdf/2510.09065 https://arxiv.org/html/2510.09065
13.10.2025 06:34
Huu Tuong Tu, Huan Vu, Cuong Tien Nguyen, Dien Hy Ngo, Nguyen Thi Thu Trang: O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion https://arxiv.org/abs/2510.09061 https://arxiv.org/pdf/2510.09061 https://arxiv.org/html/2510.09061
13.10.2025 06:34
Louis Bahrman (IDS, S2A), Mathieu Fontaine (IDS, S2A), Gaël Richard (IDS, S2A): Déréverbération non-supervisée de la parole par modèle hybride https://arxiv.org/abs/2510.09025 https://arxiv.org/pdf/2510.09025 https://arxiv.org/html/2510.09025
13.10.2025 06:34
Du, Deng, Guo, Gao, Li, Cheng, Han, Yang, Liu, Zhong, Fu: DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment https://arxiv.org/abs/2510.09016 https://arxiv.org/pdf/2510.09016 https://arxiv.org/html/2510.09016
13.10.2025 06:34