
arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

Unofficial bot by @vele.bsky.social w/ http://github.com/so-okada/bXiv https://arxiv.org/list/eess.AS/new List https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j ModList https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g

26 Followers  |  2 Following  |  3,316 Posts  |  Joined: 16.02.2025

Latest posts by eessas-bot.bsky.social on Bluesky

Pablo M. Delgado, Sascha Dick, Christoph Thompson, Chih-Wei Wu, Phillip A. Williams: Exploring Perceptual Audio Quality Measurement on Stereo Processing Using the Open Dataset of Audio Quality https://arxiv.org/abs/2512.10689 https://arxiv.org/pdf/2512.10689 https://arxiv.org/html/2512.10689

12.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

[2025-12-12 Fri (UTC), 1 new article found for eessAS Audio and Speech Processing]

12.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

Zhaolan Huang, Emmanuel Baccelli: TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers https://arxiv.org/abs/2512.09786 https://arxiv.org/pdf/2512.09786 https://arxiv.org/html/2512.09786

11.12.2025 06:33 | 👍 0    🔁 4    💬 0    📌 0

Karamvir Singh: Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture https://arxiv.org/abs/2512.08973 https://arxiv.org/pdf/2512.08973 https://arxiv.org/html/2512.08973

11.12.2025 06:34 | 👍 0    🔁 3    💬 0    📌 0

Philipp Grundhuber, Mhd Modar Halimeh, Martin Strauß, Emanuël A. P. Habets: Robust Speech Activity Detection in the Presence of Singing Voice https://arxiv.org/abs/2512.09713 https://arxiv.org/pdf/2512.09713 https://arxiv.org/html/2512.09713

11.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

Eugenia San Segundo, Aurora López-Jareño, Xin Wang, Junichi Yamagishi: Human perception of audio deepfakes: the role of language and speaking style https://arxiv.org/abs/2512.09221 https://arxiv.org/pdf/2512.09221 https://arxiv.org/html/2512.09221

11.12.2025 06:35 | 👍 0    🔁 1    💬 0    📌 0

Jinyoung Park, Won Jang, Jiwoong Park: LG Uplus System with Multi-Speaker IDs and Discriminator-based Sub-Judges for the WildSpoof Challenge https://arxiv.org/abs/2512.09000 https://arxiv.org/pdf/2512.09000 https://arxiv.org/html/2512.09000

11.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

[2025-12-11 Thu (UTC), 3 new articles found for eessAS Audio and Speech Processing]

11.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

Fetrat, Navabi, Dehghanian, Abolghasemi, Rabiee: Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS https://arxiv.org/abs/2512.08006 https://arxiv.org/pdf/2512.08006 https://arxiv.org/html/2512.08006

10.12.2025 06:34 | 👍 0    🔁 2    💬 0    📌 0

Ishaan Kunwar, Henry Cantor, Tyler Rizzo, Ayaan Qayyum: LocaGen: Sub-Sample Time-Delay Learning for Beam Localization https://arxiv.org/abs/2512.07872 https://arxiv.org/pdf/2512.07872 https://arxiv.org/html/2512.07872

10.12.2025 06:34 | 👍 0    🔁 2    💬 0    📌 0

Shuaihang Yuan, Congcong Wen, Muhammad Shafique, Anthony Tzes, Yi Fang: AudioScene: Integrating Object-Event Audio into 3D Scenes https://arxiv.org/abs/2512.07845 https://arxiv.org/pdf/2512.07845 https://arxiv.org/html/2512.07845

10.12.2025 06:34 | 👍 0    🔁 2    💬 0    📌 0

Junyi Peng, Lin Zhang, Jin Li, Oldrich Plchot, Jan Cernocky: BUT Systems for Environmental Sound Deepfake Detection in the ESDD 2026 Challenge https://arxiv.org/abs/2512.08319 https://arxiv.org/pdf/2512.08319 https://arxiv.org/html/2512.08319

10.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

Gabriele Ravizza, Julián Villegas, Christer P. Volk, Tore Stegenborg-Andersen, Yan Pei: An Adaptive Method for Target Curve Selection https://arxiv.org/abs/2512.08313 https://arxiv.org/pdf/2512.08313 https://arxiv.org/html/2512.08313

10.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

[2025-12-10 Wed (UTC), 2 new articles found for eessAS Audio and Speech Processing]

10.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

Srihari Bandarupalli, Bhavana Akkiraju, Charan Devarakonda, Vamsiraghusimha Narsinga, Anil Kumar Vuppala: Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data https://arxiv.org/abs/2512.07277 https://arxiv.org/pdf/2512.07277 https://arxiv.org/html/2512.07277

09.12.2025 06:30 | 👍 0    🔁 1    💬 0    📌 0

Akkiraju, Bandarupalli, Sambangi, Ravuri, Saraswathi, Vuppala: TeluguST-46: A Benchmark Corpus and Comprehensive Evaluation for Telugu-English Speech Translation https://arxiv.org/abs/2512.07265 https://arxiv.org/pdf/2512.07265 https://arxiv.org/html/2512.07265

09.12.2025 06:30 | 👍 0    🔁 1    💬 0    📌 0

Ioannides, Constantinou, Chadha, Elkins, Pang, Shwartz-Ziv, LeCun: JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention https://arxiv.org/abs/2512.07168 https://arxiv.org/pdf/2512.07168 https://arxiv.org/html/2512.07168

09.12.2025 06:34 | 👍 0    🔁 2    💬 0    📌 0

Jisoo Park, Seonghak Lee, Guisik Kim, Taewoo Kim, Junseok Kwon: Lightweight Wasserstein Audio-Visual Model for Unified Speech Enhancement and Separation https://arxiv.org/abs/2512.06689 https://arxiv.org/pdf/2512.06689 https://arxiv.org/html/2512.06689

09.12.2025 06:31 | 👍 0    🔁 1    💬 0    📌 0

Candy Olivia Mawalim, Haotian Zhang, Shogo Okada: Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026 https://arxiv.org/abs/2512.06041 https://arxiv.org/pdf/2512.06041 https://arxiv.org/html/2512.06041

09.12.2025 06:34 | 👍 0    🔁 1    💬 0    📌 0

Alireza Mohammadi, Keshav Sood, Dhananjay Thiruvady, Asef Nazari: Physics-Guided Deepfake Detection for Voice Authentication Systems https://arxiv.org/abs/2512.06040 https://arxiv.org/pdf/2512.06040 https://arxiv.org/html/2512.06040

09.12.2025 06:34 | 👍 0    🔁 1    💬 0    📌 0

Jens Ahrens: Introduction to Ambisonics, Part 1: The Part With No Math https://arxiv.org/abs/2512.07570 https://arxiv.org/pdf/2512.07570 https://arxiv.org/html/2512.07570

09.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

Runwu Shi, Chang Li, Jiang Wang, Rui Zhang, Nabeela Khan, Benjamin Yen, Takeshi Ashizawa, Kazuhiro Nakadai: Unsupervised Single-Channel Audio Separation with Diffusion Source Priors https://arxiv.org/abs/2512.07226 https://arxiv.org/pdf/2512.07226 https://arxiv.org/html/2512.07226

09.12.2025 06:35 | 👍 0    🔁 1    💬 0    📌 0

Xining Song, Zhihua Wei, Rui Wang, Haixiao Hu, Yanxiang Chen, Meng Han: Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation https://arxiv.org/abs/2512.06304 https://arxiv.org/pdf/2512.06304 https://arxiv.org/html/2512.06304

09.12.2025 06:35 | 👍 0    🔁 2    💬 0    📌 0

Rohan Sharma, Dancheng Liu, Jingchen Sun, Shijie Zhou, Jiayu Qin, Jinjun Xiong, Changyou Chen: KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening https://arxiv.org/abs/2512.05994 https://arxiv.org/pdf/2512.05994 https://arxiv.org/html/2512.05994

09.12.2025 06:35 | 👍 0    🔁 2    💬 0    📌 0

[2025-12-09 Tue (UTC), 4 new articles found for eessAS Audio and Speech Processing]

09.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

Katsuhiko Yamamoto, Koichi Miyazaki, Shogo Seki: The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models https://arxiv.org/abs/2512.05592 https://arxiv.org/pdf/2512.05592 https://arxiv.org/html/2512.05592

08.12.2025 06:34 | 👍 1    🔁 1    💬 0    📌 0

Akama, Zhang, Nagashima, Yutaka, Minamikawa, Polouliakh: Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening https://arxiv.org/abs/2512.05528 https://arxiv.org/pdf/2512.05528 https://arxiv.org/html/2512.05528

08.12.2025 06:50 | 👍 0    🔁 4    💬 0    📌 0

Obo, Fujita, Ishii, Moriyama, Tsuchiya, Ohashi, Seki: Noise Suppression for Time Difference of Arrival: Performance Evaluation of a Generalized Cross-Correlation Method Using Mean Signal and ... https://arxiv.org/abs/2512.05355 https://arxiv.org/pdf/2512.05355 https://arxiv.org/html/2512.05355

08.12.2025 06:35 | 👍 0    🔁 1    💬 0    📌 0

Xuanru Zhou, Jiachen Lian, Henry Hong, Xinyi Yang, Gopala Anumanchipalli: Speech World Model: Causal State-Action Planning with Explicit Reasoning for Speech https://arxiv.org/abs/2512.05933 https://arxiv.org/pdf/2512.05933 https://arxiv.org/html/2512.05933

08.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0

Guo, Yadav, Foster, Stathopoulos, Chen, Prodromakis, Wang: A Multi-Channel Auditory Signal Encoder with Adaptive Resolution Using Volatile Memristors https://arxiv.org/abs/2512.05701 https://arxiv.org/pdf/2512.05701 https://arxiv.org/html/2512.05701

08.12.2025 06:35 | 👍 0    🔁 0    💬 0    📌 0
