@naoyukikandaslp.bsky.social

147 Followers  |  48 Following  |  4 Posts  |  Joined: 20.11.2024

Latest posts by naoyukikandaslp.bsky.social on Bluesky

I was just notified that our E2 TTS paper received the Best Paper Award at IEEE #SLT2024! Many thanks to all the remarkable collaborators who made this happen!

Paper: arxiv.org/abs/2406.18009
Demo: aka.ms/e2tts

05.12.2024 03:38  |  👍 5    🔁 2    💬 0    📌 0

Ah, no, TS3-Codec was trained with 10-second audio segments, while BigCodec-S was trained with 2.5-second audio segments (Section 4.5). This was a somewhat tricky (and perhaps debatable) part of the configuration, and we did our best to tune the hyperparameters within the constraints of GPU memory.

03.12.2024 06:18  |  👍 1    🔁 0    💬 0    📌 0

Thanks! To the extent that we checked, yes. The important point is limiting the attention window.

03.12.2024 06:04  |  👍 0    🔁 0    💬 1    📌 0

TS3-Codec is yet another audio codec from my former team: simple, fast, and high-quality.

Simple: just a stack of Transformer and linear layers; no convolutions.

Faster and better: superior audio reconstruction quality with fewer MACs compared to strong convolution-based baselines.

03.12.2024 03:53  |  👍 0    🔁 0    💬 1    📌 0
Research Scientist Intern, AI Research - Speech & Audio (PhD) Meta's mission is to build the future of human connection and the technology that makes it possible.

Our GenAI-Speech team at Meta is hiring RS interns for summer 2025 to work on speech, LLMs, dialog generation, and other exciting stuff! Check out the job posting here: www.metacareers.com/jobs/3841154...

22.11.2024 03:41  |  👍 10    🔁 1    💬 0    📌 0
