
Yash Bhalgat

@ysbhalgat.bsky.social

PhD at VGG, Oxford w/ Andrew Zisserman, Andrea Vedaldi, Joao Henriques, Iro Laina. Past: Senior RS Qualcomm #AI #Research, UMich, IIT Bombay. I occasionally post AI memes. yashbhalgat.github.io

344 Followers  |  214 Following  |  56 Posts  |  Joined: 03.11.2023

Posts by Yash Bhalgat (@ysbhalgat.bsky.social)


Excited to announce the 1st Workshop on 3D-LLM/VLA at #CVPR2025! 🚀 @cvprconference.bsky.social

Topics: 3D-VLA models, LLM agents for 3D scene understanding, Robotic control with language.

📢 Call for papers: Deadline – April 20, 2025

🌐 Details: 3d-llm-vla.github.io

#llm #3d #Robotics #ai

23.03.2025 21:35 — 👍 6    🔁 1    💬 0    📌 0

Our accessible, beginner-oriented introduction to modern deep RL is now published in Foundations and Trends in Optimization. It is a great entry point to the field if you want to jump-start into RL!
@bernhard-jaeger.bsky.social
www.nowpublishers.com/article/Deta...
arxiv.org/abs/2312.08365

22.02.2025 19:32 — 👍 62    🔁 14    💬 2    📌 0

I think a few things will happen soon:
🚀 Scale beyond 8B
🎯 Multi-modal capabilities
⚡️ Faster inference
🔄 Reinforcement learning integration

Exciting to see alternatives to autoregressive models succeeding at scale!

Paper: ml-gsai.github.io/LLaDA-demo/

(8/8)

18.02.2025 15:08 — 👍 0    🔁 0    💬 0    📌 0

Results vs LLaMA3 8B:

- Matches/exceeds on most tasks
- Better at math & Chinese tasks
- Strong in-context learning
- Improved dialogue capabilities

(7/8) 🧡

18.02.2025 15:07 — 👍 0    🔁 0    💬 1    📌 0

A major result: LLaDA breaks the "reversal curse" that plagues autoregressive models. 🔄

On tasks requiring bidirectional reasoning, it outperforms GPT-4 and maintains consistent performance in both forward/reverse directions.

(6/8) 🧡

18.02.2025 15:07 — 👍 0    🔁 0    💬 1    📌 0

For generation, they introduce clever remasking strategies:

- Low-confidence remasking: Remask tokens the model is least sure about

- Semi-autoregressive: Generate in blocks left-to-right while maintaining bidirectional context
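The low-confidence strategy is easy to sketch. A hypothetical illustration (the function name, tensor shapes, and `mask_id` handling are my assumptions, not the paper's code):

```python
import torch

def low_confidence_remask(logits, tokens, mask_id, num_to_remask):
    # After predicting all masked positions at once, keep the most
    # confident predictions and remask the rest for the next step.
    probs = torch.softmax(logits, dim=-1)                      # (seq_len, vocab)
    conf = probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)  # confidence of each predicted token
    remask_idx = torch.argsort(conf)[:num_to_remask]           # least confident positions
    out = tokens.clone()
    out[remask_idx] = mask_id
    return out
```

Repeating this over several denoising steps commits to high-confidence tokens first and revisits the uncertain ones.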

(5/8) 🧡

18.02.2025 15:07 — 👍 0    🔁 0    💬 1    📌 0

Training uses a random masking ratio t ∈ [0,1] for each sequence.

The model learns to predict the original tokens given the partially masked sequence, with no causal masking.

The same approach also enables instruction-conditioned generation, with no modifications needed.
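The corruption step can be sketched in a few lines of PyTorch (a minimal illustration with names of my choosing, not the authors' code):

```python
import torch

def mask_for_training(tokens, mask_id):
    # Draw one masking ratio t ~ U[0, 1] for the whole sequence,
    # then mask each token independently with probability t.
    t = torch.rand(()).item()
    mask = torch.rand(tokens.shape) < t
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    # The model is then trained to recover `tokens` at the masked
    # positions, attending bidirectionally (no causal mask).
    return corrupted, mask, t
```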

(4/8) 🧡

18.02.2025 15:06 — 👍 0    🔁 0    💬 1    📌 0

💡 Core insight: Generative modeling principles, not autoregression, give LLMs their power.

LLaDA's forward process gradually masks tokens while the reverse process predicts them all simultaneously. This enables bidirectional modeling.

(3/8) 🧡

18.02.2025 15:06 — 👍 1    🔁 0    💬 1    📌 0

Key highlights:
- Successful scaling of masked diffusion to LLM scale (8B params)
- Masking with variable ratios for forward/reverse process
- Smart remasking strategies for generation, incl. semi-autoregressive
- SOTA on reversal tasks, matching Llama 3 on others

(2/8) 🧡

18.02.2025 15:05 — 👍 0    🔁 0    💬 1    📌 0

"LLaDA: Large Language Diffusion Models" Nie et al.

Just read this fascinating paper.

They scale Masked Diffusion Language Models to 8B params and show the model can match #LLMs (including Llama 3) while solving some key limitations!

Let's dive in... 🧡

(1/8)

#genai

18.02.2025 15:05 — 👍 1    🔁 1    💬 1    📌 0
Light-A-Video

Project page: bujiazi.github.io/light-a-vide...
Code: github.com/bcmi/Light-A...

Could be a game-changer for quick video mood/lighting adjustments without complicated VFX pipelines! 🎬

16.02.2025 16:28 — 👍 0    🔁 0    💬 0    📌 0

The results are pretty good ✨
They can transform regular videos into moody noir scenes, add sunlight streaming through windows, or create cyberpunk neon vibes -- works on everything from portrait videos to car commercials! 🚗

16.02.2025 16:28 — 👍 0    🔁 0    💬 1    📌 0

Technical highlights 🔍:
- Consistent Light Attention (CLA) module for stable lighting across frames
- Progressive Light Fusion for smooth temporal transitions
- Works with ANY video diffusion model (AnimateDiff, CogVideoX)
- Zero-shot - no fine-tuning needed!

16.02.2025 16:27 — 👍 0    🔁 0    💬 1    📌 0

New work introduces a training-free method to relight entire videos while maintaining temporal consistency! 📽️🌅

"Light-A-Video: Training-free Video Relighting via Progressive Light Fusion" Zhou et al.

(1/n) 🧡

#genai #ai #research #video

16.02.2025 16:26 — 👍 10    🔁 2    💬 1    📌 0
RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets

Project page: liuisabella.com/RigAnything/
Code: not available yet

Really excited to try this out once the code is available!

15.02.2025 13:06 — 👍 0    🔁 0    💬 0    📌 0

The authors claim the model generalizes well across diverse shapes - from humanoids to marine creatures! It also works with real-world images & arbitrary poses. 🤩

15.02.2025 13:06 — 👍 0    🔁 0    💬 1    📌 0

Technical highlights:
- BFS-ordered skeleton sequence representation
- Autoregressive joint prediction with diffusion sampling
- Hybrid attention masking: full self-attention for shape tokens, causal attention for skeleton
- e2e trainable pipeline without clustering/MST ops
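The BFS ordering in the first bullet is straightforward to picture; a toy sketch (my own illustration -- the actual representation also carries joint positions and parent links):

```python
from collections import deque

def bfs_skeleton_sequence(children, root=0):
    # Flatten a skeleton tree into a breadth-first joint sequence,
    # giving a canonical, template-free order an autoregressive
    # model can predict joint by joint.
    order, queue = [], deque([root])
    while queue:
        joint = queue.popleft()
        order.append(joint)
        queue.extend(children.get(joint, []))
    return order
```

For a skeleton where joint 0 has children 1 and 2, and joint 1 has child 3, this yields [0, 1, 2, 3].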

15.02.2025 13:05 — 👍 2    🔁 0    💬 1    📌 0

Need to rig 3D models? 🦖

New work from UCSD and Adobe:
"RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets" Liu et al.

tl;dr: reduces rigging time from 2 mins to 2 secs, works on any shape category & doesn't need predefined templates! 🚀

15.02.2025 13:05 — 👍 5    🔁 0    💬 1    📌 0
Self attention: Merge Query matrix and Key matrix into a single covariance matrix? · rasbt LLMs-from-scratch · Discussion #517 When computing the context vector in the attention algorithm, three weight matrices are introduced. It was discussed in #454 that the value matrix W_V is not necessary. For the remaining two, the query matri...

@sebastianraschka.com this is such an interesting discussion! I haven't tried this myself, but I think this can be analyzed theoretically by looking at the rank of the attention matrix in both cases.

I have posted my thoughts on the discussion here: github.com/rasbt/LLMs-f...
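One way to see the rank angle numerically: the attention scores only ever use the product W_Q W_K^T, so a merged d×d matrix reproduces them exactly but has rank at most d_k. A quick sketch (my own illustration, not code from the discussion):

```python
import numpy as np

# Toy sizes: model dim d, head dim d_k << d, n tokens.
rng = np.random.default_rng(0)
d, d_k, n = 16, 4, 5
X = rng.normal(size=(n, d))
W_Q = rng.normal(size=(d, d_k))
W_K = rng.normal(size=(d, d_k))

# Separate projections: scores = (X W_Q)(X W_K)^T
scores_qk = (X @ W_Q) @ (X @ W_K).T

# Merged: a single d x d matrix W_C = W_Q W_K^T gives identical
# scores, but its rank is capped at d_k -- it is not a free d x d matrix.
W_C = W_Q @ W_K.T
scores_merged = X @ W_C @ X.T

assert np.allclose(scores_qk, scores_merged)
assert np.linalg.matrix_rank(W_C) <= d_k
```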

14.02.2025 14:42 — 👍 1    🔁 0    💬 0    📌 0
Latent Radiance Fields with 3D-aware 2D Representations

Interesting how they handle the domain gap between 2D latent space and 3D representations through their three-stage pipeline. The correspondence-aware encoding significantly reduces high-frequency noise while preserving geometry.

Project: latent-radiance-field.github.io/LRF/

14.02.2025 10:29 — 👍 1    🔁 0    💬 0    📌 0

Technical approach:
- Correspondence-aware autoencoding to enhance 3D consistency in VAE latent space
- Builds 3D representations from 3D-aware 2D features
- VAE-Radiance Field alignment to bridge domain gap between latent and image space

#nerf #ai #research

14.02.2025 10:28 — 👍 3    🔁 0    💬 1    📌 0

"Latent Radiance Fields with 3D-aware 2D Representations" Zhou et al., #ICLR2025

tl;dr: A novel framework that integrates 3D awareness into the VAE latent space using correspondence-aware encoding, enabling high-quality rendered images with ~50% memory savings.

(1/n) 🧡

14.02.2025 10:28 — 👍 2    🔁 0    💬 1    📌 0

Project: research.nvidia.com/labs/dir/edg...
Training and inference code available here: github.com/NVlabs/EdgeR...

13.02.2025 22:36 — 👍 0    🔁 0    💬 0    📌 0

The architecture uses a lightweight encoder and auto-regressive decoder to compress variable-length meshes into fixed-length codes, enabling point cloud and single-image conditioning.

Their ArAE model controls face count for varying detail while preserving mesh topology.

13.02.2025 22:36 — 👍 0    🔁 0    💬 1    📌 0

"EdgeRunner" (#ICLR2025) from #Nvidia & PKU introduces an auto-regressive auto-encoder for mesh generation, supporting up to 4000 faces at 512³ resolution. 🤩

Their mesh tokenization algorithm (adapted from EdgeBreaker) achieves ~50% compression (4-5 tokens per face vs 9), making training efficient.

13.02.2025 22:34 — 👍 0    🔁 0    💬 1    📌 0

Technical highlight: They combine 3D latent diffusion with multi-view conditioning for the base shape, then use 2D normal maps for refinement. The results look way cleaner than previous methods.

12.02.2025 22:11 — 👍 0    🔁 0    💬 0    📌 0

Their two-stage approach: first generate coarse geometry (5s), then add fine details (20s) using normal-map-based refinement. A smart way to balance speed and quality.

12.02.2025 22:10 — 👍 0    🔁 0    💬 1    📌 0

Just came across this fascinating paper "CraftsMan3D" - a practical approach to text/image-to-3D generation that mimics how artists actually work!

Code available (pretrained models too) 🤩: github.com/wyysf-98/Cra...

(1/n) 🧡

12.02.2025 22:10 — 👍 1    🔁 0    💬 1    📌 0

Got me excited for a second here 🫠

10.02.2025 12:12 — 👍 0    🔁 0    💬 0    📌 0

So, what happened this week in #AI?

29.01.2025 12:57 — 👍 1    🔁 0    💬 0    📌 0