This work was done during my internship at Adobe Research. Big thanks to all my collaborators @pseeth.bsky.social, Bryan Russell, @urinieto.bsky.social, David Bourgin, @andrewowens.bsky.social, and @justinsalamon.bsky.social!
27.11.2024 02:58
We jointly train our model on high-quality text-audio pairs as well as on videos, enabling it to generate full-bandwidth, professional-quality audio with fine-grained creative control and synchronization.
27.11.2024 02:58
MultiFoley is a unified framework for video-guided audio generation that leverages text, audio, and video conditioning within a single model. As a result, we can perform text-guided foley, audio-guided foley (e.g., syncing your favorite sample to the video), and foley audio extension.
27.11.2024 02:58
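The "single model, any subset of conditions" idea above can be sketched roughly as follows. This is a hypothetical illustration, not the MultiFoley code: the embedding dimension, the `build_condition` helper, and the learned null tokens are all assumptions, in the spirit of classifier-free guidance training where absent modalities are replaced by a null embedding.

```python
# Hypothetical sketch (not the released MultiFoley code): one model can accept
# any subset of {text, audio, video} conditions by substituting a "null"
# embedding (learned in practice; zeros here) for each missing modality.

EMB_DIM = 4  # assumed toy embedding size
NULL = {m: [0.0] * EMB_DIM for m in ("text", "audio", "video")}

def build_condition(text=None, audio=None, video=None):
    """Concatenate per-modality embeddings, using the null token when a
    modality is absent, so every conditioning combination shares one shape."""
    cond = []
    for name, emb in (("text", text), ("audio", audio), ("video", video)):
        cond.extend(emb if emb is not None else NULL[name])
    return cond  # length 3 * EMB_DIM, regardless of which inputs are given

# Text-guided foley and audio+video-guided foley use the same interface:
text_only = build_condition(text=[1.0] * EMB_DIM)
audio_video = build_condition(audio=[1.0] * EMB_DIM, video=[1.0] * EMB_DIM)
```

Because every combination yields the same conditioning shape, one diffusion-style backbone can be trained across all tasks; at inference, dropping a modality simply falls back to its null token.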
Introducing MultiFoley, a video-aware audio generation method with multimodal controls!
We can:
- Make a typewriter sound like a piano
- Make a cat's meow sound like a lion's roar
- Perfectly time existing SFX to a video
arXiv: arxiv.org/abs/2411.17698
website: ificl.github.io/MultiFoley/
27.11.2024 02:58