I was just notified that our E2 TTS paper received the Best Paper Award at IEEE #SLT2024! Many thanks to all the remarkable collaborators who made this happen!
Paper: arxiv.org/abs/2406.18009
Demo: aka.ms/e2tts
05.12.2024 03:38
Ah, no, TS3-Codec was trained with 10-second audio segments, while BigCodec-S was trained with 2.5-second audio segments (Section 4.5). This was a somewhat tricky (and perhaps debatable) part of the configuration, and we did our best to tune the hyperparameters within the constraints of GPU memory.
03.12.2024 06:18
Thanks! To the extent that we checked, yes. The important point is limiting the attention window.
03.12.2024 06:04
TS3-Codec: yet another audio codec from my former team. Simple, fast, and high-quality.
Simple: just a stack of Transformer and linear layers; no convolutions.
Faster and better: superior audio reconstruction quality with fewer MACs compared to strong convolution-based baselines.
03.12.2024 03:53
Research Scientist Intern, AI Research - Speech & Audio (PhD)
Meta's mission is to build the future of human connection and the technology that makes it possible.
Our GenAI-Speech team at Meta is hiring RS interns for summer 2025 to work on speech, LLMs, dialog generation, and other exciting stuff! Check out the job posting here: www.metacareers.com/jobs/3841154...
22.11.2024 03:41
Google DeepMind Research Engineer
Co-founder, and CER of Preferred Networks (PFN). CEO of PFCC. Interested in deep learning and AI, science, and business.
Shinji Watanabe's Audio and Voice Lab | WAVLab @LTIatCMU @SCSatCMU | Speech Recognition, Speech Enhancement, Spoken Language Understanding, and more.
I'm working at CMU (2021-). I was working at NTT (2001-2011), MERL (2012-2017), and JHU (2017-2020). Speech and Audio Processing is my main research topic.
PhD Student @ltiatcmu.bsky.social
I work in speech processing.
wanchichen.github.io
Studying language in biological brains and artificial ones at the Kempner Institute at Harvard University.
www.tuckute.com
AI scientist & consultant :: prev Amazon Alexa, Toshiba, Cam Uni :: voice & language tech :: powered by coffee :: photographer :: Cambridge UK
https://www.catherinebreslin.co.uk
Principal Research Scientist at IBM Research AI in New York. Speech, Formal/Natural Language Processing. Currently LLM post-training, structured SDG and RL. Opinions my own and non stationary.
ramon.astudillo.com
Full professor of inclusive speech communication at TU Delft, The Netherlands. Former president of the International Speech Communication Association (ISCA). General Chair of @interspeech.bsky.social Rotterdam, 2025. Mother of 3.
Professor/Admin @ Ohio State. All opinions expressed on this channel are my personal opinions and do not represent that of my employer.
Speech and audio research scientist @MERL. saneworkshop.org co-founder. IguanaTex developer.
jonathanleroux.org
github.com/Jonathan-LeRoux/
scholar.google.com/citations?user=aUpxty8AAAAJ&hl=en
Guitarist, Researcher Google DeepMind. Opinions are my own.
Researcher in computer audition, machine learning, and HCI. Sr. Research Scientist, @AdobeResearch. Previously @DescriptApp, @Northwestern.
https://pseeth.github.io/
AI for Music • Research Scientist @ Suno
I created pyannote open source toolkit.
Co-founder and CSO at pyannoteAI