
Moayed Haji Ali

@moayedha.bsky.social

PhD @RiceUniversity | Research Intern @Snap

5 Followers  |  9 Following  |  8 Posts  |  Joined: 10.01.2025

Latest posts by moayedha.bsky.social on Bluesky

ICLR rejections go brrrr

22.01.2025 16:47 — 👍 0   🔁 0   💬 0   📌 0

Check out this recent work by my PhD student Moayed. He has been doing amazing work on generative AI for images, video, and audio. We introduce AV-Link ♾️, a unified approach for audio-video generation. Our generated audio achieves the best synchronization with the actions in the video. See the thread below.

14.01.2025 18:23 — 👍 6   🔁 1   💬 1   📌 0
Preview
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation We propose AV-Link, a unified framework for Video-to-Audio and Audio-to-Video generation that leverages the activations of frozen video and audio diffusion models for temporally-aligned cross-modal co...

A great collaboration with W. Menapace, A. Siarohin, I. Skorokhodov, A. Canberk, K. S. Lee, V. Ordonez, and S. Tulyakov.

Please repost to support our work, and check out our:
arXiv preprint: arxiv.org/abs/2412.15191
Webpage: snap-research.github.io/AVLink/

14.01.2025 18:13 — 👍 1   🔁 0   💬 0   📌 0
Video thumbnail

While current approaches use external pretrained features (e.g., MetaCLIP, BEATs), we found that diffusion activations hold rich, semantically and temporally aware features, making them well suited for cross-modal generation in a self-contained framework.

🔊➡️📽️ Example:

14.01.2025 18:13 — 👍 0   🔁 0   💬 1   📌 0
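For readers curious how "diffusion activations as conditioning features" can look in practice, here is a minimal, hypothetical sketch. It is not the AV-Link code: TinyBackbone, save_activation, and all shapes are illustrative stand-ins. The idea it demonstrates is simply reading intermediate activations out of a frozen "diffusion backbone" with forward hooks and attending to them from another generator, with no external feature extractor involved.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for a frozen video diffusion denoiser (hypothetical)."""
    def __init__(self, dim: int = 64, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, x):
        for block in self.blocks:
            x = torch.relu(block(x))
        return x

# Freeze the backbone: we only read its activations, we never train it.
backbone = TinyBackbone().eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# Collect each block's output with forward hooks.
activations = []
def save_activation(_module, _inputs, output):
    activations.append(output)

for block in backbone.blocks:
    block.register_forward_hook(save_activation)

# A noisy "video latent" at some diffusion timestep: (batch, frames, dim).
video_latent = torch.randn(1, 16, 64)
with torch.no_grad():
    backbone(video_latent)

# Stack the hooked features; the per-frame axis survives, which is what
# makes them usable as temporally aligned conditioning for another modality.
features = torch.stack(activations, dim=1)      # (1, depth, frames, dim)
context = features.flatten(1, 2)                # (1, depth * frames, dim)

# Condition an "audio branch" on those features via cross-attention.
cross_attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
audio_tokens = torch.randn(1, 128, 64)          # queries from the audio generator
conditioned, _ = cross_attn(audio_tokens, context, context)
print(conditioned.shape)                        # torch.Size([1, 128, 64])
```

The paper's actual feature selection and conditioning mechanism are more involved; the sketch only illustrates that frozen diffusion activations can be extracted and used as temporally aligned context.
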
Video thumbnail

Besides Video-to-Audio (📽️➡️🔊), we also support Audio-to-Video (🔊➡️📽️) generation under the same unified framework.

14.01.2025 18:13 — 👍 0   🔁 0   💬 1   📌 0
Video thumbnail

Compared to Meta's Movie Gen video-to-audio model, we achieve significantly better temporal synchronization with a model that is roughly 90% smaller.

14.01.2025 18:13 — 👍 0   🔁 0   💬 1   📌 0
Video thumbnail

Precise temporal synchronization remains a significant challenge for current video-to-audio models. AV-Link addresses this by leveraging diffusion features to accurately capture both local and global temporal events, such as hand slides on a guitar and fretboard pitch changes.

14.01.2025 18:13 — 👍 0   🔁 0   💬 1   📌 0
Video thumbnail

Can pretrained diffusion models be connected for cross-modal generation?

📢 Introducing AV-Link ♾️

Bridging unimodal diffusion models in one self-contained framework to enable:
📽️ ➡️ 🔊 Video-to-Audio generation.
🔊 ➡️ 📽️ Audio-to-Video generation.

🌐: snap-research.github.io/AVLink/

⬇️ Results

14.01.2025 18:13 — 👍 7   🔁 3   💬 1   📌 1

After X (aka good old Twitter) kept shadow-banning me for no apparent reason, I decided to give Bluesky a try. Posting this to test my reach.

10.01.2025 12:32 — 👍 0   🔁 0   💬 0   📌 0
