Ziyang Chen

@czyang.bsky.social

Ph.D. student @ UMich EECS. Multimodal learning, audio-visual learning, and computer vision. Previously research intern @Adobe and @Meta. https://ificl.github.io/

60 Followers  |  22 Following  |  4 Posts  |  Joined: 26.11.2024

Latest posts by czyang.bsky.social on Bluesky

This work was done during my internship at Adobe Research. Big thanks to all my collaborators @pseeth.bsky.social, Bryan Russell, @urinieto.bsky.social, David Bourgin, @andrewowens.bsky.social, and @justinsalamon.bsky.social!

27.11.2024 02:58  |  👍 5  |  🔁 0  |  💬 0  |  📌 0

We jointly train our model on high-quality text-audio pairs as well as videos, enabling our model to generate full-bandwidth professional audio with fine-grained creative control and synchronization.

27.11.2024 02:58  |  👍 3  |  🔁 0  |  💬 1  |  📌 0

MultiFoley is a unified framework for video-guided audio generation leveraging text, audio, and video conditioning within a single model. As a result, we can do text-guided foley, audio-guided foley (e.g. sync your favorite sample with the video), and foley audio extension.

27.11.2024 02:58  |  👍 2  |  🔁 0  |  💬 1  |  📌 0

🎥 Introducing MultiFoley, a video-aware audio generation method with multimodal controls! 🔊
We can
⌨️ Make a typewriter sound like a piano 🎹
🐱 Make a cat meow like a lion roars! 🦁
⏱️ Perfectly time existing SFX 💥 to a video.

arXiv: arxiv.org/abs/2411.17698
website: ificl.github.io/MultiFoley/

27.11.2024 02:58  |  👍 41  |  🔁 12  |  💬 1  |  📌 6