Dhruv Batra's Avatar

Dhruv Batra

@dhruvbatra.bsky.social

Co-founder & Chief Scientist at Yutori. Prev: Senior Director leading FAIR Embodied AI at Meta, and Professor at Georgia Tech.

395 Followers  |  93 Following  |  37 Posts  |  Joined: 22.11.2024

Latest posts by dhruvbatra.bsky.social on Bluesky

Post image Post image

Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.

Open: spatial reasoning, data-efficiency, learning compatible representations.

23.10.2025 17:17 — 👍 0    🔁 0    💬 0    📌 0

As part of the award ceremony, the VQA team presented a recap of vision-and-language research over the last decade — solved problems, progress, and open challenges for multimodal LLMs.

23.10.2025 17:17 — 👍 0    🔁 0    💬 1    📌 0

Lots to be done. Thank you to all our collaborators and the research community for this recognition!

21.10.2025 19:27 — 👍 0    🔁 0    💬 0    📌 0
Post image

Fun-fact: the T-shirt I'm wearing is an inside joke about the quality of 2015 models.

However, every few years we rediscover the lesson that on difficult tasks, VLMs silently regress to being nearly blind.

x.com/DhruvBatra_/...

21.10.2025 19:27 — 👍 1    🔁 0    💬 1    📌 0
Post image Post image Post image

VQA challenge series won the Mark Everingham prize at #ICCV2025 for stimulating a new strand of vision-and-language research.

It's extra special because ICCV25 marks the 10-year anniversary of the VQA paper.

When we started, the idea of answering any question about any image seemed outlandish.

21.10.2025 19:27 — 👍 12    🔁 2    💬 1    📌 0

Anything by Ted Chiang

20.10.2025 03:52 — 👍 6    🔁 0    💬 1    📌 0

I dunno man, Dagger is cool.

20.10.2025 03:51 — 👍 2    🔁 0    💬 1    📌 0

The problem with “AI slop” isn’t the AI — it’s the slop.

People act like AI is the issue, when it’s actually part of the fix.

If we're honest: most of what we make, most of the time, is slop by our own standards.

That’s the generator–discriminator gap in creative work that Ira Glass talks about.

15.10.2025 16:22 — 👍 1    🔁 0    💬 0    📌 0

Somebody is a fan of Abundance

10.06.2025 05:33 — 👍 1    🔁 0    💬 1    📌 0

It is so refreshing to see conferences innovate on the reviewing model and run actual experiments (!) as opposed to fighting change.

16.04.2025 04:43 — 👍 3    🔁 0    💬 0    📌 0

Good. Autonomous interface locomotion is the fundamental robotics problem of our time. The more the merrier.

01.04.2025 17:12 — 👍 0    🔁 0    💬 0    📌 0

My entire robotics career has led to this.

01.04.2025 16:05 — 👍 5    🔁 1    💬 1    📌 0

The answer to many "why X?" questions:

Because the laws of physics do not prohibit X and the forces of biology gave us curiosity.

28.03.2025 15:43 — 👍 1    🔁 0    💬 0    📌 0
Preview
Yutori — We’re building AI agents that can reliably do everyday digital tasks for you on the web, towards an AI chief-of-staff for everyone.

The web is the ultimate boss-level for agents — dynamic, non-deterministic, and noisy; some mistakes are inevitable and so far, every agent fails eventually.

Yutori is building superhuman agents for this ultimate digital environment.

Join our waitlist for early access to our product!

yutori.com

27.03.2025 14:31 — 👍 2    🔁 1    💬 2    📌 0

๐ˆ๐ฆ๐š๐ ๐ข๐ง๐ž ๐š ๐ฐ๐จ๐ซ๐ฅ๐ ๐ฐ๐ก๐ž๐ซ๐ž ๐ง๐จ ๐ก๐ฎ๐ฆ๐š๐ง ๐ก๐š๐ฌ ๐ญ๐จ ๐๐ข๐ซ๐ž๐œ๐ญ๐ฅ๐ฒ ๐ข๐ง๐ญ๐ž๐ซ๐š๐œ๐ญ ๐ฐ๐ข๐ญ๐ก ๐ญ๐ก๐ž ๐ฐ๐ž๐› ๐š๐ ๐š๐ข๐ง.

Where teams of AI assistants coordinate to book flights, manage budgets, or file paperwork — proactively surfacing insights and correcting errors.

Only problem — no one knows how to build AI agents that actually work.

27.03.2025 14:31 — 👍 0    🔁 0    💬 1    📌 0
Post image

I started something new last year with a wonderful group of people. We showed a demo in Jan.

Today, we’re telling our story — show before you talk!

๐˜ž๐˜ฆ ๐˜ข๐˜ณ๐˜ฆ ๐˜ณ๐˜ฆ-๐˜ช๐˜ฎ๐˜ข๐˜จ๐˜ช๐˜ฏ๐˜ช๐˜ฏ๐˜จ ๐˜ฉ๐˜ฐ๐˜ธ ๐˜ฑ๐˜ฆ๐˜ฐ๐˜ฑ๐˜ญ๐˜ฆ ๐˜ช๐˜ฏ๐˜ต๐˜ฆ๐˜ณ๐˜ข๐˜ค๐˜ต ๐˜ธ๐˜ช๐˜ต๐˜ฉ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ธ๐˜ฆ๐˜ฃ โ€” one of humanityโ€™s greatest inventions and a a mess overdue for an overhaul.

yutori.com

27.03.2025 14:31 — 👍 10    🔁 1    💬 1    📌 0

Ah, understood. No idea where that meme traces back to.

23.03.2025 15:27 — 👍 0    🔁 0    💬 0    📌 0

Seems like the ultimate thing to rally around, no? To the extent there is any purpose, what's the alternative?

23.03.2025 02:28 — 👍 1    🔁 0    💬 1    📌 0

I'm already there for low-stakes queries.

23.03.2025 01:02 — 👍 2    🔁 0    💬 1    📌 0

Where's the skepticism coming from? Now that web search and citations are in there, isn't it easy to verify and thus become more confident?

23.03.2025 00:59 — 👍 2    🔁 0    💬 1    📌 0
Post image

📢 Excited to announce our upcoming workshop — Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models (VLMs-4-All) @CVPR 2025!
🌐 sites.google.com/view/vlms4all

14.03.2025 15:55 — 👍 17    🔁 11    💬 1    📌 4
Post image

Using a locally-running LLM to translate a review is explicitly prohibited by @iccv.bsky.social

Why? Whom does this possibly harm?

06.03.2025 18:10 — 👍 0    🔁 0    💬 0    📌 0

The way it's always been done isn't handling the current scale well (as evidenced by the feedback from authors). Yes — outsource to a company, pay for the creation of new tools, start new companies: all the standard ways of addressing a growing market.

26.02.2025 15:52 — 👍 0    🔁 0    💬 3    📌 0

Why is it volunteer work? Why doesn't an organization that takes in millions in sponsorship professionalize?

26.02.2025 15:46 — 👍 2    🔁 0    💬 1    📌 0

Some of us did :)

21.12.2024 16:51 — 👍 9    🔁 0    💬 1    📌 0

It's not just about how accurate the laws are, but also how robust their predictions are under uncertainty.

Intuitive physics operates directly from pixels without knowing precise masses, coefficients of friction, restitution, etc. Physics engines make heavy demands and "explode" when those quantities are off.

15.12.2024 17:39 — 👍 7    🔁 0    💬 0    📌 0

Agreed on that comparison.

But one likely learns more about intuitive physics from watching billiard balls collide than by reading the wiki page.

Text is likely more information-rich on average. My point is that we are not running out of other sources of information for learning about the world.

15.12.2024 04:30 — 👍 9    🔁 0    💬 1    📌 0

Fair, but text is not all of intelligence.

14.12.2024 22:35 — 👍 1    🔁 0    💬 1    📌 0

If it works, it's a good solution :)

14.12.2024 22:34 — 👍 1    🔁 0    💬 0    📌 0
Post image

Brilliant talk by Ilya, but he's wrong on one point.

We are NOT running out of data. We are running out of human-written text.

We have more videos than we know what to do with. We just haven't solved pre-training in vision.

Just go out and sense the world. Data is easy.

14.12.2024 19:15 — 👍 99    🔁 15    💬 4    📌 3
