arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

Unofficial bot by @vele.bsky.social w/ http://github.com/so-okada/bXiv https://arxiv.org/list/eess.AS/new List https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j ModList https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g

25 Followers  |  2 Following  |  3,116 Posts  |  Joined: 16.02.2025

Latest posts by eessas-bot.bsky.social on Bluesky

Junnuo Wang: Audio Palette: A Diffusion Transformer with Multi-Signal Conditioning for Controllable Foley Synthesis https://arxiv.org/abs/2510.12175 https://arxiv.org/pdf/2510.12175 https://arxiv.org/html/2510.12175

15.10.2025 06:34 | 👍 0 | 🔁 1 | 💬 0 | 📌 0

Guojian Li, Qijie Shao, Zhixian Zhao, Shuiyuan Wang, Zhonghua Fu, Lei Xie: Serial-Parallel Dual-Path Architecture for Speaking Style Recognition https://arxiv.org/abs/2510.11732 https://arxiv.org/pdf/2510.11732 https://arxiv.org/html/2510.11732

15.10.2025 06:34 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Jiatong Li, Simon Doclo: I-DCCRN-VAE: An Improved Deep Representation Learning Framework for Complex VAE-based Single-channel Speech Enhancement https://arxiv.org/abs/2510.12485 https://arxiv.org/pdf/2510.12485 https://arxiv.org/html/2510.12485

15.10.2025 06:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Klaus Linhard, Philipp Bulling: A Phase Synthesizer for Decorrelation to Improve Acoustic Feedback Cancellation https://arxiv.org/abs/2510.12377 https://arxiv.org/pdf/2510.12377 https://arxiv.org/html/2510.12377

15.10.2025 06:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Guanxin Jiang, Andreas Brendel, Pablo M. Delgado, Jürgen Herre: DeePAQ: A Perceptual Audio Quality Metric Based On Foundational Models and Weakly Supervised Learning https://arxiv.org/abs/2510.12326 https://arxiv.org/pdf/2510.12326 https://arxiv.org/html/2510.12326

15.10.2025 06:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Song, Zhuang, Chen, Niu, Yang, Du, Chen, Wang, Wang, Chen: DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation https://arxiv.org/abs/2510.12210 https://arxiv.org/pdf/2510.12210 https://arxiv.org/html/2510.12210

15.10.2025 06:35 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Wanying Ge, Xin Wang, Junichi Yamagishi: FakeMark: Deepfake Speech Attribution With Watermarked Artifacts https://arxiv.org/abs/2510.12042 https://arxiv.org/pdf/2510.12042 https://arxiv.org/html/2510.12042

15.10.2025 06:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

[2025-10-15 Wed (UTC), 5 new articles found for eessAS Audio and Speech Processing]

15.10.2025 06:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Alain Riou, Joan Serrà, Yuki Mitsufuji: Automatic Music Sample Identification with Multi-Track Contrastive Learning https://arxiv.org/abs/2510.11507 https://arxiv.org/pdf/2510.11507 https://arxiv.org/html/2510.11507

14.10.2025 06:35 | 👍 0 | 🔁 3 | 💬 0 | 📌 0

KiHyun Nam, Jongmin Choi, Hyeongkeun Lee, Jungwoo Heo, Joon Son Chung: Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap https://arxiv.org/abs/2510.11330 https://arxiv.org/pdf/2510.11330 https://arxiv.org/html/2510.11330

14.10.2025 06:34 | 👍 0 | 🔁 4 | 💬 0 | 📌 0

Xinyu Luo, Jie Liu, Kecheng Chen, Junyi Yang, Bo Ding, Arindam Basu, Haoliang Li: Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction https://arxiv.org/abs/2510.11068 https://arxiv.org/pdf/2510.11068 https://arxiv.org/html/2510.11068

14.10.2025 06:35 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Xuyao Deng, Yanjie Sun, Yong Dou, Kele Xu: Unify Variables in Neural Scaling Laws for General Audio Representations via Embedding Effective Rank https://arxiv.org/abs/2510.10948 https://arxiv.org/pdf/2510.10948 https://arxiv.org/html/2510.10948

14.10.2025 06:34 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Yerin Hong, Juhwan Lim, Jinhong Min, Nishkarsh Agarwal, Robert Hovden, Ageeth A. Bol, Yiyang Li: Delayed 1T to 2H Phase Transition Upon Electrochemical Delithiation of LiMoS2 https://arxiv.org/abs/2510.10911 https://arxiv.org/pdf/2510.10911 https://arxiv.org/html/2510.10911

14.10.2025 06:43 | 👍 0 | 🔁 1 | 💬 0 | 📌 0

Lokhande, Dewangan, Mansoori, Chaudhari, J., Lokhande, Teman, Vishvakarma: Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation https://arxiv.org/abs/2510.10676 https://arxiv.org/pdf/2510.10676 https://arxiv.org/html/2510.10676

14.10.2025 06:29 | 👍 0 | 🔁 3 | 💬 0 | 📌 0

Stephen Ni-Hahn, Chao Péter Yang, Mingchen Ma, Cynthia Rudin, Simon Mak, Yue Jiang: ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis https://arxiv.org/abs/2510.10249 https://arxiv.org/pdf/2510.10249 https://arxiv.org/html/2510.10249

14.10.2025 06:34 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Xian He, Wei Zeng, Ye Wang: Peransformer: Improving Low-informed Expressive Performance Rendering with Score-aware Discriminator https://arxiv.org/abs/2510.10175 https://arxiv.org/pdf/2510.10175 https://arxiv.org/html/2510.10175

14.10.2025 06:34 | 👍 0 | 🔁 1 | 💬 0 | 📌 0

Paul Haimes: Chord Colourizer: A Near Real-Time System for Visualizing Musical Key https://arxiv.org/abs/2510.10173 https://arxiv.org/pdf/2510.10173 https://arxiv.org/html/2510.10173

14.10.2025 06:32 | 👍 0 | 🔁 3 | 💬 0 | 📌 0

Wang, Zhao, Liu, Ge, Xu, Xiao, Gao, Yu, Zhu: MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction https://arxiv.org/abs/2510.10003 https://arxiv.org/pdf/2510.10003 https://arxiv.org/html/2510.10003

14.10.2025 06:30 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Soubhagya Ranjan Hota, Arka Roy, Udit Satija: ILD-VIT: A Unified Vision Transformer Architecture for Detection of Interstitial Lung Disease from Respiratory Sounds https://arxiv.org/abs/2510.11458 https://arxiv.org/pdf/2510.11458 https://arxiv.org/html/2510.11458

14.10.2025 06:35 | 👍 0 | 🔁 1 | 💬 0 | 📌 0

Haixin Zhao, Kaixuan Yang, Nilesh Madhu: Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training https://arxiv.org/abs/2510.11395 https://arxiv.org/pdf/2510.11395 https://arxiv.org/html/2510.11395

14.10.2025 06:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Ruben Johnson Robert Jeremiah, Peyman Goli, Steven van de Par: Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation https://arxiv.org/abs/2510.11366 https://arxiv.org/pdf/2510.11366 https://arxiv.org/html/2510.11366

14.10.2025 06:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Ali Fallah, Shun Nakamura, Steven van de Par: Perceptual Compensation of Ambisonics Recordings for Reproduction in Room https://arxiv.org/abs/2510.10883 https://arxiv.org/pdf/2510.10883 https://arxiv.org/html/2510.10883

14.10.2025 06:35 | 👍 0 | 🔁 1 | 💬 0 | 📌 0

[2025-10-14 Tue (UTC), 4 new articles found for eessAS Audio and Speech Processing]

14.10.2025 06:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Mohammad Hossein Sameti, Sepehr Harfi Moridani, Ali Zarean, Hossein Sameti: Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking https://arxiv.org/abs/2510.09528 https://arxiv.org/pdf/2510.09528 https://arxiv.org/html/2510.09528

13.10.2025 06:31 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Nizar El Ghazal, Antoine Caubrière, Valentin Vielzeuf: The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach https://arxiv.org/abs/2510.09424 https://arxiv.org/pdf/2510.09424 https://arxiv.org/html/2510.09424

13.10.2025 06:31 | 👍 0 | 🔁 3 | 💬 0 | 📌 0

Atul Shree, Harshith Jupuru: FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms https://arxiv.org/abs/2510.09085 https://arxiv.org/pdf/2510.09085 https://arxiv.org/html/2510.09085

13.10.2025 06:33 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Akira Takahashi, Shusuke Takahashi, Yuki Mitsufuji: MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation https://arxiv.org/abs/2510.09065 https://arxiv.org/pdf/2510.09065 https://arxiv.org/html/2510.09065

13.10.2025 06:34 | 👍 0 | 🔁 3 | 💬 0 | 📌 0

Huu Tuong Tu, Huan Vu, Cuong Tien Nguyen, Dien Hy Ngo, Nguyen Thi Thu Trang: O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion https://arxiv.org/abs/2510.09061 https://arxiv.org/pdf/2510.09061 https://arxiv.org/html/2510.09061

13.10.2025 06:34 | 👍 0 | 🔁 1 | 💬 0 | 📌 0

Louis Bahrman (IDS, S2A), Mathieu Fontaine (IDS, S2A), Gaël Richard (IDS, S2A): Déréverbération non-supervisée de la parole par modèle hybride https://arxiv.org/abs/2510.09025 https://arxiv.org/pdf/2510.09025 https://arxiv.org/html/2510.09025

13.10.2025 06:34 | 👍 0 | 🔁 2 | 💬 0 | 📌 0

Du, Deng, Guo, Gao, Li, Cheng, Han, Yang, Liu, Zhong, Fu: DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment https://arxiv.org/abs/2510.09016 https://arxiv.org/pdf/2510.09016 https://arxiv.org/html/2510.09016

13.10.2025 06:34 | 👍 0 | 🔁 2 | 💬 0 | 📌 0
