ICLR rejections go brrrr
22.01.2025 16:47
@moayedha.bsky.social
PhD @RiceUniversity | Research Intern @Snap
Check out this recent work by my PhD student Moayed. He has been doing amazing work on generative AI for images, video, and audio. We introduce AV-Link ♾️, a unified approach for audio-video generation. Our generated audio is the best in terms of synchronization with video actions. Check the thread below.
14.01.2025 18:23
A great collaboration with
W. Menapace, A. Siarohin, I. Skorokhodov, A. Canberk, K.S Lee, V. Ordonez, and S. Tulyakov.
Please repost to support our work and check out our
Arxiv preprint: arxiv.org/abs/2412.15191
Webpage: snap-research.github.io/AVLink/
While current approaches use external pretrained features (e.g. Meta CLIP, BEATs), we found that diffusion activations hold rich, semantically and temporally aware features, making them well suited for cross-modal generation in a self-contained framework.
Audio-to-video example:
Besides video-to-audio, we also support audio-to-video generation under the same unified framework.
14.01.2025 18:13
Compared to Meta Movie Gen Video to Audio, we achieve significantly better temporal synchronization with a model that is 90% smaller.
14.01.2025 18:13
Precise temporal synchronization remains a significant challenge for current video-to-audio models. AV-Link addresses this by leveraging diffusion features to accurately capture both local and global temporal events, such as hand slides on a guitar and fretboard pitch changes.
14.01.2025 18:13
Can pretrained diffusion models be connected for cross-modal generation?
📢 Introducing AV-Link ♾️
Bridging unimodal diffusion models in one self-contained framework to enable:
Video-to-Audio generation.
Audio-to-Video generation.
Webpage: snap-research.github.io/AVLink/
⤵️ Results
After X (aka good old Twitter) kept shadow-banning me for no apparent reason, I decided to give Bluesky a try. Posting this tweet to test my reach.
10.01.2025 12:32