Excited to announce the 1st Workshop on 3D-LLM/VLA at #CVPR2025! @cvprconference.bsky.social
Topics: 3D-VLA models, LLM agents for 3D scene understanding, Robotic control with language.
Call for papers: deadline April 20, 2025
Details: 3d-llm-vla.github.io
#llm #3d #Robotics #ai
23.03.2025 21:35
Our accessible, beginner-oriented introduction to modern deep RL is now published in Foundations and Trends in Optimization. It is a great entry point if you want to jump-start your way into RL!
@bernhard-jaeger.bsky.social
www.nowpublishers.com/article/Deta...
arxiv.org/abs/2312.08365
22.02.2025 19:32
I think a few things will happen soon:
- Scale beyond 8B
- Multi-modal capabilities
- Faster inference
- Reinforcement learning integration
Exciting to see alternatives to autoregressive models succeeding at scale!
Paper: ml-gsai.github.io/LLaDA-demo/
(8/8)
18.02.2025 15:08
Results vs LLaMA3 8B:
- Matches/exceeds on most tasks
- Better at math & Chinese tasks
- Strong in-context learning
- Improved dialogue capabilities
(7/8)
18.02.2025 15:07
A major result: LLaDA breaks the "reversal curse" that plagues autoregressive models.
On tasks requiring bidirectional reasoning, it outperforms GPT-4 and maintains consistent performance in both forward/reverse directions.
(6/8)
18.02.2025 15:07
For generation, they introduce clever remasking strategies:
- Low-confidence remasking: Remask tokens the model is least sure about
- Semi-autoregressive: Generate in blocks left-to-right while maintaining bidirectional context
(5/8)
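A minimal numpy sketch of the low-confidence idea (function and variable names are mine, not the paper's code): predict all masked tokens in parallel, accept only the most confident predictions, and remask the rest for the next denoising step.

```python
import numpy as np

def low_confidence_remask(probs, masked_idx, n_keep):
    """Toy low-confidence remasking: given per-token probability
    distributions `probs` and the currently masked positions,
    accept the n_keep most confident predictions and remask the rest."""
    conf = probs[masked_idx].max(axis=-1)      # model confidence per masked slot
    preds = probs[masked_idx].argmax(axis=-1)  # predicted token ids
    order = np.argsort(-conf)                  # most confident first
    accepted = {masked_idx[i]: int(preds[i]) for i in order[:n_keep]}
    still_masked = [masked_idx[i] for i in order[n_keep:]]
    return accepted, still_masked
```

Iterating this until `still_masked` is empty gives a confidence-ordered decoding schedule instead of a fixed left-to-right one.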
18.02.2025 15:07
Training uses a random masking ratio t ∈ [0,1] for each sequence.
The model learns to predict the original tokens given the partially masked sequence. No causal masking is used.
The same technique also enables instruction-conditioned generation, with no modifications.
(4/8)
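The forward process described above fits in a few lines of numpy (a sketch of the general idea; `MASK_ID` and the function name are my own placeholders):

```python
import numpy as np

MASK_ID = 0  # hypothetical id of the special [MASK] token

def forward_mask(tokens, rng):
    """LLaDA-style forward process sketch: draw one masking ratio
    t ~ Uniform[0, 1] per sequence, then mask each token independently
    with probability t. Training predicts the originals at masked spots."""
    t = rng.uniform(0.0, 1.0)
    mask = rng.random(len(tokens)) < t
    corrupted = np.where(mask, MASK_ID, tokens)
    return corrupted, mask, t
```

Because t varies per sequence, the model sees everything from lightly to almost fully masked inputs during training.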
18.02.2025 15:06
Core insight: generative modeling principles, not autoregression, give LLMs their power.
LLaDA's forward process gradually masks tokens, while the reverse process predicts them all simultaneously. This enables bidirectional modeling.
(3/8)
18.02.2025 15:06
Key highlights:
- Successful scaling of masked diffusion to LLM scale (8B params)
- Masking with variable ratios for forward/reverse process
- Smart remasking strategies for generation, incl. semi-autoregressive
- SOTA on reversal tasks, matching Llama 3 on others
(2/8)
18.02.2025 15:05
"LLaDA: Large Language Diffusion Models" Nie et al.
Just read this fascinating paper.
They scale masked diffusion language models up to 8B params and show they can match #LLMs (including Llama 3) while solving some key limitations!
Let's dive in...
(1/8)
#genai
18.02.2025 15:05
The results are pretty good.
They can transform regular videos into moody noir scenes, add sunlight streaming through windows, or create cyberpunk neon vibes -- works on everything from portrait videos to car commercials!
16.02.2025 16:28
Technical highlights:
- Consistent Light Attention (CLA) module for stable lighting across frames
- Progressive Light Fusion for smooth temporal transitions
- Works with ANY video diffusion model (AnimateDiff, CogVideoX)
- Zero-shot - no fine-tuning needed!
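A deliberately simplified sketch of the progressive-fusion idea (the linear blending schedule here is my illustration, not the paper's formula): the relit appearance is eased into the source over steps so lighting changes don't flicker frame to frame.

```python
def progressive_fuse(source_frame, relit_frame, step, total_steps):
    """Toy progressive light fusion: blend the relit appearance into
    the source with a weight that grows over denoising steps.
    (The linear schedule is my simplification, not the paper's.)"""
    w = step / total_steps  # fusion weight in [0, 1]
    return [(1 - w) * s + w * r for s, r in zip(source_frame, relit_frame)]
```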
16.02.2025 16:27
New work introduces a training-free method to relight entire videos while maintaining temporal consistency!
"Light-A-Video: Training-free Video Relighting via Progressive Light Fusion" Zhou et al.
(1/n)
#genai #ai #research #video
16.02.2025 16:26
RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets
Project page: liuisabella.com/RigAnything/
Code: not available yet
Really excited to try this out once the code is available!
15.02.2025 13:06
The authors claim the model generalizes well across diverse shapes - from humanoids to marine creatures - and works with real-world images & arbitrary poses.
15.02.2025 13:06
Technical highlights:
- BFS-ordered skeleton sequence representation
- Autoregressive joint prediction with diffusion sampling
- Hybrid attention masking: full self-attention for shape tokens, causal attention for skeleton
- e2e trainable pipeline without clustering/MST ops
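The BFS-ordered representation can be sketched like this (the data layout is my guess at the general idea, not RigAnything's actual tokenization):

```python
from collections import deque

def bfs_skeleton_sequence(children, root=0):
    """Serialize a skeleton tree into a BFS-ordered list of
    (joint, parent) pairs - a flat sequence an autoregressive model
    can predict one joint at a time, with parents always emitted
    before their children."""
    seq, queue = [], deque([(root, -1)])
    while queue:
        joint, parent = queue.popleft()
        seq.append((joint, parent))
        for child in children.get(joint, []):
            queue.append((child, joint))
    return seq
```

BFS ordering means each predicted joint can attend to its already-generated parent, which is what makes the causal attention over skeleton tokens workable.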
15.02.2025 13:05
Need to rig 3D models?
New work from UCSD and Adobe:
"RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets" Liu et al.
tl;dr: reduces rigging time from 2 mins to 2 secs, works on any shape category & doesn't need predefined templates!
15.02.2025 13:05
Latent Radiance Fields with 3D-aware 2D Representations
Interesting how they handle the domain gap between 2D latent space and 3D representations through their three-stage pipeline. The correspondence-aware encoding significantly reduces high-frequency noise while preserving geometry.
Project: latent-radiance-field.github.io/LRF/
14.02.2025 10:29
Technical approach:
- Correspondence-aware autoencoding to enhance 3D consistency in VAE latent space
- Builds 3D representations from 3D-aware 2D features
- VAE-Radiance Field alignment to bridge domain gap between latent and image space
#nerf #ai #research
14.02.2025 10:28
"Latent Radiance Fields with 3D-aware 2D Representations" Zhou et al., #ICLR2025
tl;dr: Novel framework that integrates 3D awareness into VAE latent space using correspondence-aware encoding, enabling high-quality rendered images with ~50% memory savings.
(1/n)
14.02.2025 10:28
Project: research.nvidia.com/labs/dir/edg...
Training and inference code available here: github.com/NVlabs/EdgeR...
13.02.2025 22:36
The architecture uses a lightweight encoder and auto-regressive decoder to compress variable-length meshes into fixed-length codes, enabling point cloud and single-image conditioning.
Their ArAE model controls face count for varying detail while preserving mesh topology.
13.02.2025 22:36
"EdgeRunner" (#ICLR2025) from #Nvidia & PKU introduces an auto-regressive auto-encoder for mesh generation, supporting up to 4000 faces at 512Β³ resolution. π€©
Their mesh tokenization algorithm (adapted from EdgeBreaker) achieves ~50% compression (4-5 tokens per face vs 9), making training efficient.
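A quick back-of-envelope using the post's own figures (the helper function is mine, purely for illustration): at ~4.5 tokens per face instead of 9, a 4,000-face mesh drops from ~36k to ~18k tokens.

```python
def mesh_token_budget(n_faces, tokens_per_face):
    """Sequence length for tokenizing a mesh at a given per-face cost.
    Naive schemes spend ~9 tokens/face (3 vertices x 3 coordinates);
    the EdgeBreaker-style scheme spends ~4-5 (figures from the post)."""
    return n_faces * tokens_per_face

naive = mesh_token_budget(4000, 9)         # 36000 tokens
compressed = mesh_token_budget(4000, 4.5)  # 18000.0 tokens, ~50% shorter
```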
13.02.2025 22:34
Technical highlight: They combine 3D latent diffusion with multi-view conditioning for the base shape, then use 2D normal maps for refinement. The results look way cleaner than previous methods.
12.02.2025 22:11
Their two-stage approach: first generate coarse geometry (5s), then add fine details (20s) via normal-map-based refinement. A smart way to balance speed and quality.
12.02.2025 22:10
Just came across this fascinating paper "CraftsMan3D" - a practical approach to text/image-to-3D generation that mimics how artists actually work!
Code available (pretrained models too): github.com/wyysf-98/Cra...
(1/n)
12.02.2025 22:10
Got me excited for a second here.
10.02.2025 12:12
So, what happened this week in #AI?
29.01.2025 12:57