
arXiv cs.SD Sound

@cssd-bot.bsky.social

Unofficial bot by @vele.bsky.social w/ http://github.com/so-okada/bXiv https://arxiv.org/list/cs.SD/new List https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j ModList https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g

32 Followers | 1 Following | 5,185 Posts | Joined: 16.02.2025

Latest posts by cssd-bot.bsky.social on Bluesky


Yiming Yang, Guangyong Wang, Haixin Guan, Yanhua Long: Enroll-on-Wakeup: A First Comparative Study of Target Speech Extraction for Seamless Interaction in Real Noisy Human-Machine Dialogue Sc... https://arxiv.org/abs/2602.15519 https://arxiv.org/pdf/2602.15519 https://arxiv.org/html/2602.15519

18.02.2026 06:35 - 👍 0    🔁 1    💬 0    📌 0

Takao Kawamura, Daisuke Niizumi, Nobutaka Ono: What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model https://arxiv.org/abs/2602.15307 https://arxiv.org/pdf/2602.15307 https://arxiv.org/html/2602.15307

18.02.2026 06:35 - 👍 0    🔁 1    💬 0    📌 0

Sonal Kumar, Prem Seetharaman, Ke Chen, Oriol Nieto, Jiaqi Su, Zhepei Wang, Rithesh Kumar, Dinesh Manocha, Nicholas J. Bryan, Zeyu Jin, Justin Salamon: TAC: Timestamped Audio Captioning https://arxiv.org/abs/2602.15766 https://arxiv.org/pdf/2602.15766 https://arxiv.org/html/2602.15766

18.02.2026 06:34 - 👍 0    🔁 0    💬 0    📌 0

Jonah Casebeer, Ge Zhu, Zhepei Wang, Nicholas J. Bryan: A Generative-First Neural Audio Autoencoder https://arxiv.org/abs/2602.15749 https://arxiv.org/pdf/2602.15749 https://arxiv.org/html/2602.15749

18.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Qiangong Zhou, Nagasaka Tomohiro: UniTAF: A Modular Framework for Joint Text-to-Speech and Audio-to-Face Modeling https://arxiv.org/abs/2602.15651 https://arxiv.org/pdf/2602.15651 https://arxiv.org/html/2602.15651

18.02.2026 06:34 - 👍 0    🔁 2    💬 0    📌 0

Samir Sadok, Laurent Girin, Xavier Alameda-Pineda: The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs https://arxiv.org/abs/2602.15491 https://arxiv.org/pdf/2602.15491 https://arxiv.org/html/2602.15491

18.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Zineb Lahrichi, Gaëtan Hadjeres, Gaël Richard, Geoffroy Peeters: S-PRESSO: Ultra Low Bitrate Sound Effect Compression With Diffusion Autoencoders And Offline Quantization https://arxiv.org/abs/2602.15082 https://arxiv.org/pdf/2602.15082 https://arxiv.org/html/2602.15082

18.02.2026 06:34 - 👍 0    🔁 2    💬 0    📌 0

Wanyu Zang, Yang Yu, Meng Yu: Structure-Aware Piano Accompaniment via Style Planning and Dataset-Aligned Pattern Retrieval https://arxiv.org/abs/2602.15074 https://arxiv.org/pdf/2602.15074 https://arxiv.org/html/2602.15074

18.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

[2026-02-18 Wed (UTC), 6 new articles found for csSD Sound]

18.02.2026 06:34 - 👍 0    🔁 0    💬 0    📌 0

Yacouba Kaloga, Marina Laganaro, Ina Kodrasi: CLAP-Based Automatic Word Naming Recognition in Post-Stroke Aphasia https://arxiv.org/abs/2602.14584 https://arxiv.org/pdf/2602.14584 https://arxiv.org/html/2602.14584

17.02.2026 06:35 - 👍 0    🔁 1    💬 0    📌 0

Sandy H. S. Herho, Rusmawan Suwarman, Nurjanna J. Trilaksono, Iwan P. Anwar, Faiz R. Fajary: Preliminary sonification of ENSO using traditional Javanese gamelan scales https://arxiv.org/abs/2602.14560 https://arxiv.org/pdf/2602.14560 https://arxiv.org/html/2602.14560

17.02.2026 06:49 - 👍 0    🔁 2    💬 0    📌 0

Jandad Jahani, Mursal Dawodi, Jawid Ahmad Baktash: From Scarcity to Scale: A Release-Level Analysis of the Pashto Common Voice Dataset https://arxiv.org/abs/2602.14062 https://arxiv.org/pdf/2602.14062 https://arxiv.org/html/2602.14062

17.02.2026 06:30 - 👍 0    🔁 1    💬 0    📌 0

Ligong Lei, Wenwen Lu, Xudong Pang, Zaokere Kadeer, Aishan Wumaier: Multimodal Consistency-Guided Reference-Free Data Selection for ASR Accent Adaptation https://arxiv.org/abs/2602.13263 https://arxiv.org/pdf/2602.13263 https://arxiv.org/html/2602.13263

17.02.2026 06:29 - 👍 0    🔁 2    💬 0    📌 0

Parth Khadse, Sunil Kumar Kopparapu: Probing Human Articulatory Constraints in End-to-End TTS with Reverse and Mismatched Speech-Text Directions https://arxiv.org/abs/2602.14664 https://arxiv.org/pdf/2602.14664 https://arxiv.org/html/2602.14664

17.02.2026 06:34 - 👍 0    🔁 0    💬 0    📌 0

H. M. Shadman Tabib, et al.: Bengali-Loop: Community Benchmarks for Long-Form Bangla ASR and Speaker Diarization https://arxiv.org/abs/2602.14291 https://arxiv.org/pdf/2602.14291 https://arxiv.org/html/2602.14291

17.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Ma, Xu, Ma, Yang, Li, Kim, Xu, Li, Busso, Yu, Chng, Chen: The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents https://arxiv.org/abs/2602.14224 https://arxiv.org/pdf/2602.14224 https://arxiv.org/html/2602.14224

17.02.2026 06:34 - 👍 0    🔁 2    💬 0    📌 0

Keinichi Fujita, Yusuke Ijima: Investigation for Relative Voice Impression Estimation https://arxiv.org/abs/2602.14172 https://arxiv.org/pdf/2602.14172 https://arxiv.org/html/2602.14172

17.02.2026 06:34 - 👍 0    🔁 2    💬 0    📌 0

Reda Bensaid, Amine Ouasfi, Yassir Bendou, Ilyass Moummad, Vincent Gripon, François Leduc-Primeau, Adnane Boukhayma: MUKA: Multi Kernel Audio Adaptation Of Audio-Language Models https://arxiv.org/abs/2602.14127 https://arxiv.org/pdf/2602.14127 https://arxiv.org/html/2602.14127

17.02.2026 06:34 - 👍 0    🔁 0    💬 0    📌 0

Zhang, Lei, Hu, He, Deng, Luo, Zhu, Feng, Liu, He, Sun, Wu, Wang: Eureka-Audio: Triggering Audio Intelligence in Compact Language Models https://arxiv.org/abs/2602.13954 https://arxiv.org/pdf/2602.13954 https://arxiv.org/html/2602.13954

17.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Aju Ani Justus, Ruchit Agrawal, Sudarsana Reddy Kadiri, Shrikanth Narayanan: voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models https://arxiv.org/abs/2602.13928 https://arxiv.org/pdf/2602.13928 https://arxiv.org/html/2602.13928

17.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Shen, Jayashankar, Hanna, Kanda, Wang, Žmolíková, Xie, Moritz, Xu, Gaur, Wornell, He, Wu: GSRM: Generative Speech Reward Model for Speech RLHF https://arxiv.org/abs/2602.13891 https://arxiv.org/pdf/2602.13891 https://arxiv.org/html/2602.13891

17.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Sripathi Sridhar, Prem Seetharaman, Oriol Nieto, Mark Cartwright, Justin Salamon: Audiocards: Structured Metadata Improves Audio Language Models For Sound Design https://arxiv.org/abs/2602.13835 https://arxiv.org/pdf/2602.13835 https://arxiv.org/html/2602.13835

17.02.2026 06:34 - 👍 0    🔁 0    💬 0    📌 0

Minhui Lu, Joshua D. Reiss: Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model https://arxiv.org/abs/2602.13834 https://arxiv.org/pdf/2602.13834 https://arxiv.org/html/2602.13834

17.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Picinali, Baumgartner, Gaveau, Greco, Liebe, Oomen, Braun: Enhancing spatial hearing with cochlear implants: exploring the role of AI, multimodal interaction and perceptual training https://arxiv.org/abs/2602.13787 https://arxiv.org/pdf/2602.13787 https://arxiv.org/html/2602.13787

17.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Siqian Tong, Xuan Li, Yiwei Wang, Baolong Bi, Yujun Cai, Shenghua Liu, Yuchen He, Chengpeng Hao: AuTAgent: A Reinforcement Learning Framework for Tool-Augmented Audio Reasoning https://arxiv.org/abs/2602.13685 https://arxiv.org/pdf/2602.13685 https://arxiv.org/html/2602.13685

17.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Zhe Ye, Xiangui Kang, Jiayi He, Chengxin Chen, Wei Zhu, Kai Wu, Yin Yang, Jiwu Huang: BreathNet: Generalizable Audio Deepfake Detection via Breath-Cue-Guided Feature Refinement https://arxiv.org/abs/2602.13596 https://arxiv.org/pdf/2602.13596 https://arxiv.org/html/2602.13596

17.02.2026 06:34 - 👍 0    🔁 1    💬 0    📌 0

Xu Zhang, Longbing Cao, Runze Yang, Zhangkai Wu: Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition https://arxiv.org/abs/2602.13259 https://arxiv.org/pdf/2602.13259 https://arxiv.org/html/2602.13259

17.02.2026 06:34 - 👍 0    🔁 2    💬 0    📌 0

[2026-02-17 Tue (UTC), 14 new articles found for csSD Sound]

17.02.2026 06:34 - 👍 0    🔁 0    💬 0    📌 0

Giovanni Bologni, Nicolás Arrieta Larraza, Richard Heusdens, Richard C. Hendriks: A two-step approach for speech enhancement in low-SNR scenarios using cyclostationary beamforming and DNNs https://arxiv.org/abs/2602.12986 https://arxiv.org/pdf/2602.12986 https://arxiv.org/html/2602.12986

16.02.2026 06:35 - 👍 0    🔁 1    💬 0    📌 0

Louise Zhuang, Samuel Beuret, Ben Frey, Saachi Munot, Jeremy J. Dahl: A Wavefield Correlation Approach to Improve Sound Speed Estimation in Ultrasound Autofocusing https://arxiv.org/abs/2602.12805 https://arxiv.org/pdf/2602.12805 https://arxiv.org/html/2602.12805

16.02.2026 06:48 - 👍 0    🔁 2    💬 0    📌 0
