
Ashkan Mirzaei

@ashmrz.bsky.social

Research Scientist @Snap. Previously @UofT, @NVIDIAAI, @samsungresearch. Opinions are mine. http://ashmrz.github.io

91 Followers  |  58 Following  |  16 Posts  |  Joined: 15.11.2024

Latest posts by ashmrz.bsky.social on Bluesky

Super cool work, Masha, congrats!

07.08.2025 22:14 — 👍 1    🔁 0    💬 0    📌 0

I'll be at SIGGRAPH 2025 in Vancouver (Aug 9 - 15)! If you're around and up for some good coffee and/or chats about all things content creation, hit me up ☕🎨 #SIGGRAPH2025

05.08.2025 16:01 — 👍 0    🔁 0    💬 0    📌 0

Congrats, Kosta! That sounds incredible. Wishing you an amazing year ahead full of great people, new ideas, and exciting experiences.

Have you already taken off or still around for a bit?

01.07.2025 22:31 — 👍 1    🔁 0    💬 1    📌 0

Super insightful!

25.06.2025 13:43 — 👍 1    🔁 0    💬 0    📌 0

[9/9] 4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

🌐 Project page: snap-research.github.io/4Real-Video-V2
📜 Abstract: arxiv.org/abs/2506.18839

24.06.2025 14:13 — 👍 0    🔁 0    💬 0    📌 0

[8/9] Authors: Chaoyang Wang*, Ashkan Mirzaei*, Vidit Goel, Willi Menapace, Aliaksandr Siarohin, Avalon Vinella, Michael Vasilkovsky, Ivan Skorokhodov, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Peter Wonka.
*equal contribution

24.06.2025 14:13 — 👍 0    🔁 0    💬 1    📌 0

[7/9] 🧠 We use a camera-token replacement trick to keep camera poses temporally consistent, temporal attention layers to share information over time, and a "Gaussian head" to predict shape, scale, opacity, and color offsets.

24.06.2025 14:13 — 👍 0    🔁 0    💬 1    📌 0
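
For the curious, here is a minimal sketch of what such a "Gaussian head" could look like: a small MLP mapping per-token features to offsets for the Gaussian parameters named in the post. The feature dimension and the 3+3+1+3 output split are illustrative assumptions, not the 4Real-Video-V2 implementation.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Maps per-token features to offsets for Gaussian parameters.
    The 3 (shape) + 3 (scale) + 1 (opacity) + 3 (color) split is assumed."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.GELU(),
            nn.Linear(feat_dim, 10),
        )

    def forward(self, feats: torch.Tensor):
        out = self.mlp(feats)  # (..., 10)
        shape, scale, opacity, color = out.split([3, 3, 1, 3], dim=-1)
        return {"shape_offset": shape, "scale_offset": scale,
                "opacity_offset": opacity, "color_offset": color}

feats = torch.randn(1024, 256)        # 1024 Gaussians, 256-dim features
offsets = GaussianHead()(feats)
print(offsets["shape_offset"].shape)  # torch.Size([1024, 3])
```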

[6/9] 🔍 How it works – Stage 2 (Reconstruction):
Our feedforward model takes RGB frames and predicts camera poses and dynamic 3D Gaussians. No optimization loops. No ground-truth poses. Just fast, clean reconstruction.

24.06.2025 14:13 — 👍 0    🔁 0    💬 1    📌 0
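
As a rough sketch of that Stage-2 interface, assuming nothing beyond the post: RGB frames go in, and per-frame camera poses plus a fixed-size set of Gaussian parameters come out of a single forward pass. The encoder, dimensions, and 14-channel Gaussian layout (xyz, scale, quaternion, opacity, RGB) are placeholders, not the released model.

```python
import torch
import torch.nn as nn

class FeedforwardReconstructor(nn.Module):
    def __init__(self, dim: int = 128, num_gaussians: int = 4096):
        super().__init__()
        # Placeholder encoder; the real model is a transformer over frames.
        self.encoder = nn.Sequential(nn.Flatten(1), nn.LazyLinear(dim), nn.GELU())
        self.pose_head = nn.Linear(dim, 7)                       # quaternion + translation
        self.gaussian_head = nn.Linear(dim, num_gaussians * 14)  # xyz, scale, quat, opacity, rgb

    def forward(self, frames: torch.Tensor):
        # frames: (num_frames, 3, H, W); one forward pass, no optimization loop
        feats = self.encoder(frames)    # (num_frames, dim)
        poses = self.pose_head(feats)   # one camera pose per input frame
        gaussians = self.gaussian_head(feats.mean(0)).view(-1, 14)
        return poses, gaussians

poses, gaussians = FeedforwardReconstructor()(torch.rand(8, 3, 64, 64))
print(poses.shape, gaussians.shape)  # torch.Size([8, 7]) torch.Size([4096, 14])
```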

[5/9] ⚡ The architecture runs on a DiT backbone. Thanks to sparse attention and temporal compression, we keep things efficient. Only the self-attention layers are fine-tuned; everything else stays frozen.

24.06.2025 14:13 — 👍 0    🔁 0    💬 1    📌 0
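
The "fine-tune only self-attention" recipe is easy to reproduce in spirit: freeze every parameter, then re-enable gradients on the self-attention modules. The substring filter below is an assumption; match it to your backbone's parameter names.

```python
import torch.nn as nn

def freeze_all_but_self_attention(model: nn.Module, marker: str = "self_attn") -> None:
    # Freeze everything, then unfreeze parameters whose names mark them
    # as belonging to a self-attention layer.
    for name, param in model.named_parameters():
        param.requires_grad = marker in name

# Toy example on a single transformer layer
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
freeze_all_but_self_attention(layer)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")
```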

[4/9] 🧠 How it works – Stage 1 (Generation):
We fuse spatial and temporal attention into a single transformer layer. This view-time attention lets our diffusion model reason across viewpoints and frames jointly, without extra parameters. Parameter efficiency also leads to more stability.

24.06.2025 14:13 — 👍 0    🔁 0    💬 1    📌 0
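
A minimal sketch of the fused view-time attention idea, assuming a token grid of shape (batch, views, frames, patches, dim); names and shapes are illustrative, not the released code. The key point is that flattening views and frames into one sequence lets a single pretrained self-attention layer attend across both axes without adding parameters.

```python
import torch
import torch.nn as nn

class FusedViewTimeAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # One self-attention layer (e.g., pretrained) is reused as-is.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, v, t, p, d = x.shape
        # Flatten views, frames, and patches into a single token sequence
        # so attention spans viewpoints and time jointly.
        tokens = x.reshape(b, v * t * p, d)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(b, v, t, p, d)

x = torch.randn(1, 4, 8, 16, 64)  # 4 views, 8 frames, 16 patches, dim 64
print(FusedViewTimeAttention(64)(x).shape)  # torch.Size([1, 4, 8, 16, 64])
```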

[3/9] High-quality 4D training data is scarce, and large video models are expensive to fine-tune. So we focus on parameter efficiency. Our fused attention design reuses pretrained weights with minimal changes. It trains fast, generalizes well, and scales to full 4D scenes.

24.06.2025 14:13 — 👍 0    🔁 0    💬 1    📌 0

[2/9] We generate synchronized multi-view video grids, then lift them into 4D geometry using a fast feedforward network. The result is a set of Gaussian particles, ready for rendering, exploration, and editing.

24.06.2025 14:13 — 👍 0    🔁 0    💬 1    📌 0
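
End to end, the two stages compose roughly as below; both functions are hypothetical stand-ins for the thread's generation and reconstruction models, returning random placeholders of the right shapes.

```python
import torch

def generate_multiview_grid(prompt: str, views: int = 4, frames: int = 8) -> torch.Tensor:
    # Stage 1 stand-in: a multi-view video diffusion model would return a
    # synchronized (views, frames, 3, H, W) grid for the prompt.
    return torch.rand(views, frames, 3, 64, 64)

def lift_to_4d_gaussians(grid: torch.Tensor) -> torch.Tensor:
    # Stage 2 stand-in: a feedforward network maps the grid to dynamic
    # Gaussian particles, here an (N, 14) placeholder.
    return torch.rand(4096, 14)

grid = generate_multiview_grid("a koi pond at sunset")  # hypothetical prompt
gaussians = lift_to_4d_gaussians(grid)
print(grid.shape, gaussians.shape)
```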

[1/9] 🚀 We introduce 4Real-Video-V2, a method that can generate 4D scenes from a simple text prompt, viewable from any angle at any moment in time. It's fast, photorealistic, and works on full scenes. Here's how it works and why it matters. 👇

snap-research.github.io/4Real-Video-...

24.06.2025 14:13 — 👍 1    🔁 0    💬 1    📌 0

In Germany, there is a tradition of creating funny hats for doctoral graduates. 🎓 @cvoelcker.bsky.social brought this tradition to my group and, together with Umangi Jain, spearheaded the construction of a masterpiece for our first PhD graduate, @ashmrz.bsky.social. 1/2

28.05.2025 17:40 — 👍 12    🔁 2    💬 1    📌 0

Congrats on your election victory, @liberalca.bsky.social and @mark-carney.bsky.social! If I had one wish for Canada's new government, it would be this: stop delaying visas and make it easier for the world's top scientists, including research-focused graduate students, to come to Canada. 🍁

30.04.2025 19:20 — 👍 2    🔁 2    💬 0    📌 0

🚀 Have a strong background in 3D/4D Generative Models? Consider applying for an internship with us at Snap's Creative Vision team! 🎨✨

snap.submittable.com/submit

10.03.2025 13:21 — 👍 1    🔁 0    💬 0    📌 0

Calling all PhD students in vision/ML interested in working with a great team (+me!) at GDM doing cutting-edge research in 3D computer vision and generative models!

05.03.2025 22:12 — 👍 26    🔁 6    💬 0    📌 0

📹 EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering
Toshiya Yura, @ashmrz.bsky.social, Igor Gilitschenski 5/🧵
arxiv.org/abs/2412.07293

03.03.2025 19:47 — 👍 4    🔁 1    💬 1    📌 0

🚀 Tired of waiting for your Gaussian-based scenes to fit dynamic inputs? ⏳ Wait no more! Check out our new paper and discover an instant, feed-forward approach! 🎯✨

07.12.2024 17:56 — 👍 2    🔁 0    💬 0    📌 0

Huge congrats, Kosta! 🎉

25.11.2024 19:09 — 👍 1    🔁 0    💬 1    📌 0
