
Vineeth Yeevani

@vyeevani.bsky.social

Working on robotics. Prev @ Apple on Vision Pro. vyeevani.github.io

144 Followers  |  1,137 Following  |  37 Posts  |  Joined: 17.11.2024

Latest posts by vyeevani.bsky.social on Bluesky

Preview
"This Robot Sucks" (YouTube video by Tesla)

Is anyone else baffled by this? Why build this if you have the Optimus robot? Based on the teleoperation + current behavior cloning paradigm, it should be exceedingly easy to learn the repetitive sequence of motions needed to clean this car. What am I missing?

m.youtube.com/watch?v=vVFX...

12.02.2025 16:33 - 👍 1    🔁 0    💬 0    📌 0

FSD is not great in SF:
1. Doesn't follow taxi + bus lanes
2. Doesn't understand streets that have a lane that's sometimes parking, sometimes driving
3. Always messes up left turns onto Van Ness
4. Always have to intervene at the Valencia/Market intersection

10.02.2025 22:03 - 👍 1    🔁 0    💬 0    📌 0

Also, the technique makes robotics more accessible while trading away interpretability. People find that trade-off acceptable in other tools (LLMs, generative image models). This makes me think it will be useful here too.

10.02.2025 21:25 - 👍 1    🔁 0    💬 0    📌 0
DexGen

I was wrong about this. My change of heart was inspired by DexGen (zhaohengyin.github.io/dexteritygen/): they use BC for coarse motion, RL for fine-grained control. Instant Policy could just do both in a single shot. Instant Policy would be great for teleop.

10.02.2025 21:18 - 👍 0    🔁 0    💬 1    📌 0

DeepSeek-R1 proves Yann LeCun's thesis that models need physical grounding.
DeepSeek: good prior + easy verification = 🔥
Good prior = video
Easy verification = robotics in the physical world

09.02.2025 23:33 - 👍 0    🔁 0    💬 0    📌 0

If time travel existed, wouldn't we have met future us by now? Maybe it's a beacon: you can only travel back to when it was first turned on. No paradox, just a switch we haven't flipped yet.

09.02.2025 18:43 - 👍 0    🔁 0    💬 0    📌 0

Goal: a VLM capable of planning + action
Problem: no data
Solution: bootstrap intelligence from a VLM (sketched below)
1. Start with an off-the-shelf VLM
2. Collect rollouts from code-as-policy generated by the VLM for a set of tasks
3. GRPO over the rollouts
4. Go to 2
5. Offline RL over the VLM for direct obs -> act
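
A minimal sketch of that loop in Python, assuming placeholder components (DummyVLM, run_in_env, and grpo_update are hypothetical stand-ins, not any particular library):

```python
import random

class DummyVLM:
    """Stand-in for a real vision-language model (hypothetical)."""
    def generate(self, prompt):
        return "def policy(obs): return 'noop'"

def generate_policy_code(vlm, task):
    # Step 2a: ask the VLM for code-as-policy for this task.
    return vlm.generate(f"Write a policy for: {task}")

def run_in_env(policy_code, task):
    # Step 2b: execute the generated policy; reward is a placeholder here.
    return {"task": task, "code": policy_code, "reward": random.random()}

def grpo_update(vlm, rollouts):
    # Step 3: one GRPO step over grouped rollouts (placeholder).
    return vlm

def bootstrap(vlm, tasks, rounds=5, rollouts_per_task=8):
    for _ in range(rounds):                      # step 4: go to 2
        rollouts = [run_in_env(generate_policy_code(vlm, task), task)
                    for task in tasks
                    for _ in range(rollouts_per_task)]
        vlm = grpo_update(vlm, rollouts)
    return vlm                                   # step 5 (offline RL) would follow

vlm = bootstrap(DummyVLM(), tasks=["stack blocks", "open drawer"])
```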

01.02.2025 22:06 - 👍 1    🔁 0    💬 0    📌 0

Let's revisit code-as-policy in the DeepSeek-R1 era.

01.02.2025 21:55 - 👍 1    🔁 0    💬 1    📌 0

Why is Instant Policy better than using point cloud matching/alignment approaches? Can we not just sprinkle genAI on everything without making a clear-cut case for why scaling a method with DL is better than the classic approach?

30.01.2025 20:36 - 👍 1    🔁 0    💬 1    📌 0

This, btw, is just regular old RL + curriculum learning. However, unlike control RL, where we learn on the test set, here the environments differ between train and test time.

30.01.2025 18:32 - 👍 0    🔁 0    💬 0    📌 0

Recipe (sketched below):
1. Start with a weak base model + problems that range from really simple to really hard
2. Sort problems by how well the model does on them
3. Pick 75% problems the model can do, 25% it can't. RL with GRPO. Use 10x rollouts + high temperature on the 25%
4. Repeat from step 2 until all problems are solved
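
A minimal sketch of the selection step, assuming a success_rate table measured from prior evaluation rollouts (all names here are illustrative, not any specific library):

```python
import random

def build_curriculum(success_rate, batch_size=100, solved_thresh=0.5):
    """Split one GRPO batch into 75% problems the model can do and
    25% it can't. success_rate: {problem_id: fraction of rollouts solved}."""
    doable = [p for p, r in success_rate.items() if r >= solved_thresh]
    hard = [p for p, r in success_rate.items() if r < solved_thresh]
    n_hard = batch_size // 4
    batch = random.sample(doable, min(batch_size - n_hard, len(doable)))
    batch += random.sample(hard, min(n_hard, len(hard)))
    # Hard problems get 10x the rollouts and a higher sampling temperature
    # so rare successes show up and GRPO has a gradient to follow.
    return {p: {"rollouts": 80 if p in hard else 8,
                "temperature": 1.2 if p in hard else 0.7}
            for p in batch}

# Example: success rates from a previous evaluation pass.
rates = {f"prob_{i}": i / 100 for i in range(100)}
plan = build_curriculum(rates)
```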

30.01.2025 18:30 - 👍 0    🔁 0    💬 1    📌 0

Been seeing "competence threshold" take hold as an idea. High level: for a model to benefit from RL, it has to have some minimal competence. My view is that the competence threshold is a function of the number of rollouts per question: the larger the number of rollouts, the less competent the base model has to be.
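
A quick back-of-the-envelope for why this should hold: if the base model solves a question with probability p per rollout, RL only gets signal when at least one of k rollouts succeeds, which happens with probability 1 - (1 - p)^k. A sketch:

```python
def min_competence(k, target=0.5):
    """Smallest per-rollout success probability p such that at least one
    of k rollouts succeeds with probability >= target:
    1 - (1 - p)**k >= target  =>  p >= 1 - (1 - target)**(1/k)."""
    return 1 - (1 - target) ** (1 / k)

for k in (1, 8, 64, 512):
    print(k, round(min_competence(k), 4))
# 1 -> 0.5, 8 -> 0.083, 64 -> 0.0108, 512 -> 0.0014:
# more rollouts per question means a far less competent base model suffices.
```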

30.01.2025 18:18 - 👍 1    🔁 0    💬 1    📌 0

Rerun is the best ML logger I've used. Miles better than wandb and tensorboard.

29.12.2024 19:05 - 👍 0    🔁 0    💬 0    📌 0

Turns out the problem was too low a resolution. Going higher res fixed it.

26.12.2024 22:43 - 👍 1    🔁 0    💬 0    📌 0

On second thought, rectified flow doesn't help with this problem. Need to look for something different.

26.12.2024 04:25 - 👍 0    🔁 0    💬 1    📌 0

The samples look like the mean of the input. My suspicion is that similar examples produce a nullcline in the velocity field as t -> 1. Rectified flow should fix this. But I hate the multi-stage process of reflow.
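
For intuition, a generic 1-D illustration of the suspected nullcline (my sketch, not the author's setup): under linear-path flow matching, the MSE-optimal velocity at x_t is E[x1 - x0 | x_t], and between two similar targets that expectation crosses zero, so a network that smooths across it parks nearby samples at the average. The demo below evaluates at t = 0.5 for numerical stability; the zero crossing persists for all t.

```python
import numpy as np

rng = np.random.default_rng(0)
x1s = np.array([-1.0, 1.0])   # two near-duplicate targets, 1-D for clarity

def optimal_velocity(xt, t, n=200_000, bw=0.05):
    """Monte Carlo estimate of the MSE-optimal velocity E[x1 - x0 | x_t]
    for linear-path flow matching x_t = (1-t) x0 + t x1, x0 ~ N(0,1)."""
    x0 = rng.standard_normal(n)
    x1 = rng.choice(x1s, size=n)
    xts = (1 - t) * x0 + t * x1
    w = np.exp(-0.5 * ((xts - xt) / bw) ** 2)   # kernel posterior weights
    return float(np.sum(w * (x1 - x0)) / np.sum(w))

for xt in (-0.2, 0.0, 0.2):
    print(xt, round(optimal_velocity(xt, t=0.5), 3))
# The velocity crosses zero at xt = 0: a nullcline between the two targets.
```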

25.12.2024 21:36 - 👍 1    🔁 0    💬 1    📌 0

Anyone know how to improve sample diversity with flow matching when examples are really similar?

25.12.2024 20:37 - 👍 1    🔁 0    💬 1    📌 0

Why write clean code for ML? It took me ten minutes to convert a video diffusion training library to flow matching.
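
For a sense of why the conversion is small, here's a generic sketch (not the author's library) of the two pieces that change in a denoising-style trainer: the interpolant and the regression target.

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_batch(x1, abar):
    """Eps-prediction diffusion step: x_t mixes data and noise via the
    noise schedule abar(t); the model regresses the noise eps."""
    eps = rng.standard_normal(x1.shape)
    xt = np.sqrt(abar) * x1 + np.sqrt(1 - abar) * eps
    return xt, eps                      # (model input, regression target)

def flow_matching_batch(x1, t):
    """Linear-path flow matching step: x_t interpolates noise -> data;
    the model regresses the constant velocity x1 - x0."""
    x0 = rng.standard_normal(x1.shape)
    xt = (1 - t) * x0 + t * x1
    return xt, x1 - x0                  # (model input, regression target)

# Everything else in the training loop (model, optimizer, MSE loss on the
# target) stays the same, which is why the swap can be this quick.
```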

23.12.2024 18:39 - 👍 2    🔁 0    💬 0    📌 0

Pre-LN transformer layers don't work for perceivers with fixed encodings.

22.12.2024 16:57 - 👍 1    🔁 0    💬 0    📌 0
Declaring the death of model scaling is premature.

Regardless of whether model scaling will continue, industry leaders' flip-flopping on this issue shows the folly of trusting their forecasts. They are not significantly better informed than the rest of us, and their narratives are heavily influenced by their vested interests.

Inference scaling is real, and there is a lot of low-hanging fruit, which could lead to rapid capability increases in the short term. But in general, capability improvements from inference scaling will likely be both unpredictable and unevenly distributed among domains.

The connection between capability improvements and AI's social or economic impacts is extremely weak. The bottlenecks for impact are the pace of product development and the rate of adoption, not AI capabilities.


New AI Snake Oil essay: Last month the AI industry's narrative suddenly flipped: model scaling is dead, but "inference scaling" is taking over. This has left people outside AI confused. What changed? Is AI capability progress slowing? We look at the evidence. 🧵 www.aisnakeoil.com/p/is-ai-prog...

19.12.2024 12:16 - 👍 119    🔁 37    💬 3    📌 9

Keeping functionality in modular libraries helps: no one has to rewrite everything. Minimal coupling between modules is a great rule of thumb.

18.12.2024 21:11 - 👍 1    🔁 0    💬 0    📌 0

Next-scale prediction demonstrates how to explicitly simulate CNNs using transformers.

16.12.2024 03:03 - 👍 0    🔁 0    💬 0    📌 0

That's the best board I've ever seen.

14.12.2024 02:11 - 👍 1    🔁 0    💬 1    📌 0

A self-driving car dataset has examples going left around a tree and right around a tree. The algorithm averages the two and goes straight into the tree. Hallucination.
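
A tiny illustration of the failure mode, assuming a plain MSE behavior-cloning objective (a generic sketch, not any particular system): the loss-optimal prediction for conflicting steering labels is their average.

```python
import numpy as np

steering = np.array([-1.0, 1.0])   # left around the tree, right around it
# For MSE regression, the optimal constant prediction is the label mean:
print(steering.mean())             # 0.0 -> steer straight into the tree
```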

05.12.2024 01:01 - 👍 1    🔁 0    💬 0    📌 0

The reward signal here would be whether the user accepts some task completion or not. You're fine-tuning to improve the success rate of the agents, taking into account the multi-turn decision-making nature of agents.

30.11.2024 01:06 - 👍 3    🔁 0    💬 0    📌 0

I agree that you can do a lot with the foundation, but you could always do more. Offline RL techniques can make this happen; a good number of them make the MDP assumption.

30.11.2024 01:04 - 👍 1    🔁 0    💬 1    📌 0

My assumption here is that most people working on this are fine-tuning to make agents. They have some reward dataset. Under a POMDP formulation, they could use a PPO library that works well for sequential decision making. With this, you avoid writing any algorithm code.

29.11.2024 19:17 - 👍 0    🔁 0    💬 1    📌 0

Engineering/product driven. Many existing decision-making algorithms are based on POMDPs. It's faster to use existing libraries to sketch promising agent use cases.

29.11.2024 18:36 - 👍 1    🔁 0    💬 1    📌 0
Preview
GitHub - vyeevani/easy-transformer-masks: Really flexible masks for transformers

It's hard to make transformer masks for nested sequences, which is essential when working with perceivers or robotics. Wrote a library to help: github.com/vyeevani/eas...
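
To make the problem concrete, here's a generic sketch of one nested-sequence masking pattern (my illustration, not the library's API): tokens attend only within their episode, causally across timesteps, and freely within a timestep.

```python
import numpy as np

def nested_mask(episode_ids, step_ids):
    """Boolean attention mask for tokens grouped into episodes and steps.

    episode_ids[i], step_ids[i]: which episode / timestep token i belongs to.
    Query i may attend key j iff both are in the same episode and j's step
    is not in the future (causal across steps, full within a step).
    """
    ep = np.asarray(episode_ids)
    st = np.asarray(step_ids)
    same_episode = ep[:, None] == ep[None, :]
    not_future = st[None, :] <= st[:, None]
    return same_episode & not_future

# Two episodes, each with steps [0, 0, 1]: e.g. (obs, obs, action) tokens.
mask = nested_mask([0, 0, 0, 1, 1, 1], [0, 0, 1, 0, 0, 1])
print(mask.astype(int))
```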

29.11.2024 18:06 - 👍 1    🔁 0    💬 0    📌 0

They still have a decision-making causal structure: the future can't influence the past.

29.11.2024 17:25 - 👍 2    🔁 0    💬 1    📌 0
