
@ritheshkumar.bsky.social

Researcher in audio and speech generative models (SampleRNN, MelGAN, DAC, …). Research Scientist @AdobeResearch. Ex @DescriptApp, @Mila_Quebec. https://ritheshkumar.com

90 Followers  |  183 Following  |  1 Posts  |  Joined: 20.11.2024

Latest posts by ritheshkumar.bsky.social on Bluesky

Thanks for this! Would love to be added

26.12.2024 18:10 — 👍 1    🔁 0    💬 0    📌 0

The code for "Simplified and Generalized Masked Diffusion for Discrete Data" (Jiaxin Shi et al.) has been released, and a lecture by @arnauddoucet.bsky.social on this topic is also available!

🐍 Code: github.com/google-deepm...
📄 Article: arxiv.org/abs/2406.04329
📼 Video: www.youtube.com/watch?v=qj9B...

14.12.2024 12:47 — 👍 26    🔁 6    💬 0    📌 0

new paper! 🗣️Sketch2Sound💥

Sketch2Sound can create sounds from sonic imitations (i.e., a vocal imitation or a reference sound) via interpretable, time-varying control signals.

paper: arxiv.org/abs/2412.08550
web: hugofloresgarcia.art/sketch2sound

12.12.2024 14:43 — 👍 23    🔁 9    💬 2    📌 5

🎥 Introducing MultiFoley, a video-aware audio generation method with multimodal controls! 🔊
We can:
⌨️Make a typewriter sound like a piano 🎹
🐱Make a cat meow like a lion roars! 🦁
⏱️Perfectly time existing SFX 💥 to a video.

arXiv: arxiv.org/abs/2411.17698
website: ificl.github.io/MultiFoley/

27.11.2024 02:58 — 👍 41    🔁 12    💬 1    📌 6

I initiated a starter pack for Audio ML. Let me know if you'd like to be added/removed.
go.bsky.app/LGmct4z

18.11.2024 04:46 — 👍 67    🔁 22    💬 46    📌 1

Made a feed that tries to index paper threads only: bsky.app/profile/psee.... To get into the feed, make a post with "arxiv.org" in the post somewhere + don't be a bot. My tiny contribution to the recent migration! Built w/ @skyfeed.app. Planning on some paper threads of my own soon...

24.11.2024 04:01 — 👍 7    🔁 2    💬 0    📌 1
