
Anand Bhattad

@anandbhattad.bsky.social

Incoming Assistant Professor at Johns Hopkins University | RAP at Toyota Technological Institute at Chicago | web: https://anandbhattad.github.io/ | Knowledge in Generative Image Models, Intrinsic Images, Image-based Relighting, Inverse Graphics

196 Followers  |  194 Following  |  56 Posts  |  Joined: 04.12.2024

Latest posts by anandbhattad.bsky.social on Bluesky

Thanks Andreas and the Scholar Inbox team! This is by far the best paper recommendation system I’ve come across. No more digging through overwhelming volumes; as the blog says, the right papers just show up in my inbox.

30.06.2025 14:47 | 👍 2    🔁 0    💬 0    📌 0
Scholar Inbox: Daily Research Recommendations just for You
Science is moving fast. How can we keep up? Scholar Inbox helps researchers stay ahead by making the discovery of open access papers more personal.

On our blog: Science is moving fast. How do we keep up? #ScholarInbox, developed by the Autonomous Vision Group led by @andreasgeiger.bsky.social, helps researchers stay ahead - by making the discovery of #openaccess papers smarter and more personal: www.machinelearningforscience.de/en/scholar-i...

30.06.2025 12:40 | 👍 27    🔁 14    💬 1    📌 6

All slides from the #cvpr2025 (@cvprconference.bsky.social) workshop "How to Stand Out in the Crowd?" are now available on our website:
sites.google.com/view/standou...

30.06.2025 03:19 | 👍 0    🔁 0    💬 0    📌 0

This is probably one of the best talks and slides I have ever seen. I was lucky to see this live! Great talk again :)

23.06.2025 19:24 | 👍 3    🔁 0    💬 1    📌 0

A special shout-out to all the job-market candidates this year: it’s been tough with interviews canceled and hiring freezes 🙏

After UIUC's blue and @tticconnect.bsky.social blue, I’m delighted to add another shade of blue to my journey at Hopkins @jhucompsci.bsky.social. Super excited!!

02.06.2025 19:46 | 👍 2    🔁 0    💬 0    📌 0
Anand Bhattad - Research Assistant Professor

We will be recruiting PhD students, postdocs, and interns. Updates soon on my website: anandbhattad.github.io

Also, feel free to chat with me @cvprconference.bsky.social #CVPR2025

I’m immensely grateful to my mentors, friends, colleagues, and family for their unwavering support. 🙏

02.06.2025 19:46 | 👍 0    🔁 0    💬 1    📌 0

At JHU, I'll be starting a new lab: 3P Vision Group. The "3Ps" are Pixels, Perception & Physics.

The lab will focus on 3 broad themes:

1) GLOW: Generative Learning Of Worlds
2) LUMA: Learning, Understanding, & Modeling of Appearances
3) PULSE: Physical Understanding and Learning of Scene Events

02.06.2025 19:46 | 👍 0    🔁 0    💬 1    📌 0

I’m thrilled to share that I will be joining Johns Hopkins University’s Department of Computer Science (@jhucompsci.bsky.social, @hopkinsdsai.bsky.social) as an Assistant Professor this fall.

02.06.2025 19:46 | 👍 7    🔁 2    💬 1    📌 1

FastMap: Revisiting Dense and Scalable Structure from Motion

Jiahao Li, Haochen Wang, @zubair-irshad.bsky.social, @ivasl.bsky.social, Matthew R. Walter, Vitor Campagnolo Guizilini, Greg Shakhnarovich

tl;dr: replace bundle adjustment (BA) with an epipolar error minimized by IRLS (iteratively reweighted least squares); fully implemented in PyTorch

arxiv.org/abs/2505.04612
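(Aside: a minimal sketch of what an "epipolar error + IRLS" objective can look like in PyTorch, fitting a fundamental matrix from matches by iteratively reweighting an algebraic epipolar residual. This is only an illustration of the general idea, not the FastMap code; all names are made up.)

```python
# Illustrative only: robust fundamental-matrix fitting via IRLS on the
# algebraic epipolar error with Huber-style weights. Not the FastMap code;
# point normalization and degeneracy checks are omitted for brevity.
import torch

def fit_fundamental_irls(x1, x2, iters=10, delta=1e-2):
    """x1, x2: (N, 2) matched point coordinates in two images."""
    ones = torch.ones(x1.shape[0], 1, dtype=x1.dtype)
    p1 = torch.cat([x1, ones], dim=1)   # homogeneous points, (N, 3)
    p2 = torch.cat([x2, ones], dim=1)
    # Each row encodes the bilinear constraint p2^T F p1 = 0 (8-point style).
    A = torch.stack([p2[:, i] * p1[:, j] for i in range(3) for j in range(3)], dim=1)
    w = torch.ones(A.shape[0], dtype=A.dtype)
    for _ in range(iters):
        # Weighted homogeneous least squares: smallest right singular vector.
        _, _, Vt = torch.linalg.svd(w[:, None] * A)
        f = Vt[-1]
        r = (A @ f).abs()                                           # epipolar residuals
        w = torch.where(r < delta, torch.ones_like(r), delta / r)   # robust reweighting
    return f.reshape(3, 3)
```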

08.05.2025 12:51 | 👍 10    🔁 2    💬 0    📌 0

[2/2] However, if we treat 3D as a real task, such as building a usable environment, then these projective geometry details matter. It also ties nicely to Ross Girshick’s talk at our RetroCV CVPR workshop last year, which you highlighted.

29.04.2025 16:56 | 👍 1    🔁 0    💬 0    📌 0

[1/2] Thanks for the great talk and for sharing it online for those who couldn't attend 3DV. I liked your points on our "Shadows Don't Lie" paper. I agree that if the goal is simply to render 3D pixels, then subtle projective geometry errors that are imperceptible to humans are not a major concern.

29.04.2025 16:56 | 👍 1    🔁 0    💬 1    📌 0

Congratulations and welcome to TTIC! 🥳🎉

15.04.2025 13:03 | 👍 1    🔁 0    💬 0    📌 0

By β€œremove,” I meant masking the object and using inpainting to hallucinate what could be there instead.

02.04.2025 05:08 | 👍 0    🔁 0    💬 0    📌 0

This is really cool work!

30.03.2025 00:14 | 👍 7    🔁 1    💬 1    📌 0

Thanks Noah! Glad you liked it :)

02.04.2025 04:51 | 👍 0    🔁 0    💬 0    📌 0

[2/2] We also re-run the full pipeline *after each removal*. This matters: new objects can appear, occluded ones can become visible, etc., making the process adaptive and less ambiguous.

The figure above shows a single pass. Once the top bowl is gone, the next "top" bowl gets its own diverse semantics too.
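A rough sketch of that adaptive loop, with hypothetical stand-ins (segment, support_score, inpaint_remove) for a segmenter, an inpainting-based dependency score, and an object-removal inpainter; this is not the released pipeline:

```python
# Sketch of the adaptive removal loop described above (hypothetical helpers;
# not the released Visual Jenga pipeline). The scene is re-analysed after
# every removal, so newly visible objects get scored too.
def removal_sequence(image, segment, support_score, inpaint_remove, max_steps=20):
    sequence = []
    for _ in range(max_steps):
        objects = segment(image)                  # re-detect objects each step
        if not objects:
            break                                 # nothing left but background
        # Greedily remove the object the rest of the scene depends on least.
        target = min(objects, key=lambda obj: support_score(image, obj))
        image = inpaint_remove(image, target)     # counterfactual removal
        sequence.append(target)
    return sequence, image
```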

02.04.2025 04:49 | 👍 0    🔁 0    💬 0    📌 0

[1/2] Not really... there's quite a bit of variation.

When we remove the top bowl, we get diverse semantics: fruits, plants, and other objects that just happen to fit the shape. As we go down, it becomes less diverse: occasional flowers, new bowls in the middle, & finally just bowls at the bottom.

02.04.2025 04:49 | 👍 1    🔁 0    💬 2    📌 0
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Visual Jenga is a new scene understanding task where the goal is to remove objects one by one from a single image while keeping the rest of the scene stable. We introduce a simple baseline that uses a...

[10/10] This project began while I was visiting Berkeley last summer. Huge thanks to Alyosha for the mentorship and to my amazing co-author Konpat Preechakul. We hope this inspires you to think differently about what it means to understand a scene.

🔗 visualjenga.github.io
📄 arxiv.org/abs/2503.21770

29.03.2025 19:36 | 👍 1    🔁 0    💬 0    📌 0

[9/10] Visual Jenga is a call to rethink what scene understanding should mean in 2025 and beyond.

We’re just getting started. There’s still a long way to go before models understand scenes like humans do. Our task is a small, playful, and rigorous step in that direction.

29.03.2025 19:36 | 👍 0    🔁 0    💬 1    📌 0

[8/10] This simple idea surprisingly scales to a wide range of scenes: from clean setups like a cat on a table or a stack of bowls... to messy, real-world scenes (yes, even Alyosha’s office).

29.03.2025 19:36 | 👍 1    🔁 0    💬 2    📌 0

[7/10] Why does this work? Because generative models have internalized asymmetries in the visual world.

Search for "cups" → You’ll almost always see a table.
Search for "tables" → You rarely see cups.

So: P(table | cup) ≫ P(cup | table)

We exploit this asymmetry to guide counterfactual inpainting
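To make the asymmetry concrete, here is a toy estimate of those conditional probabilities from made-up image tag sets (illustration only, not data from the paper):

```python
# Toy illustration of the co-occurrence asymmetry (made-up tags, not real data).
images = [
    {"cup", "table"}, {"cup", "table"}, {"cup", "table", "plant"},
    {"table"}, {"table", "chair"},
]
p_table_given_cup = sum("table" in s for s in images if "cup" in s) / sum("cup" in s for s in images)
p_cup_given_table = sum("cup" in s for s in images if "table" in s) / sum("table" in s for s in images)
print(p_table_given_cup, p_cup_given_table)  # 1.0 vs 0.6: P(table|cup) >> P(cup|table)
```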

29.03.2025 19:36 | 👍 2    🔁 0    💬 1    📌 0

[6/10] We measure dependencies by masking each object, then using a large inpainting model to hallucinate what should be there. If the replacements are diverse, the object likely isn't critical. If it consistently reappears, like the table under the cat, it’s probably a support.
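In pseudocode, that reappearance test could look roughly like this (inpaint and same_category are hypothetical stand-ins for a large inpainting model and a semantic-similarity check, e.g. something CLIP-like; the paper’s actual metric may differ):

```python
# Hedged sketch of the masking-and-inpainting dependency measure described
# above. `inpaint` and `same_category` are hypothetical stand-ins; this is
# not the paper's exact metric.
def reappearance_score(image, obj_mask, obj_crop, inpaint, same_category, samples=8):
    hits = 0
    for _ in range(samples):
        filled = inpaint(image, obj_mask)        # hallucinate the masked region
        hits += same_category(filled, obj_mask, obj_crop)
    # Near 1.0: the same kind of object keeps reappearing (likely a support).
    # Near 0.0: replacements are diverse (safe to remove early).
    return hits / samples
```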

29.03.2025 19:36 | 👍 1    🔁 0    💬 1    📌 0

[5/10] To solve Visual Jenga, we start with a surprisingly simple baseline: no explicit physical reasoning, no 3D, no simulation, no dynamics. Instead, we propose a training-free, generative approach that infers the object removal order by exploiting statistical co-occurrences learned by generative models.

29.03.2025 19:36 | 👍 0    🔁 0    💬 1    📌 0

[4/10] The goal of Visual Jenga is simple:
1) Remove one object at a time
2) Generate a sequence down to the background
3) Keep every intermediate scene physically & geometrically stable

29.03.2025 19:36 | 👍 0    🔁 0    💬 1    📌 0

[3/10] Probing this understanding motivates our new task: Visual Jenga, a challenge beyond passive observation.

Like in the game of Jenga, success demands understanding structural dependencies. Which objects can you remove without collapsing the scene? That’s where true understanding begins.

29.03.2025 19:36 | 👍 0    🔁 0    💬 1    📌 0

[2/10] Today’s models can name everything in an image.
But do they understand how a scene holds together?

Inspired by Biederman’s classic work on scene perception + influential efforts by Hoiem et al, Bottou et al, & others, we ask: Can a model understand support structure and object dependencies?

29.03.2025 19:36 | 👍 1    🔁 0    💬 1    📌 0

[1/10] Is scene understanding solved?

Models today can label pixels and detect objects with high accuracy. But does that mean they truly understand scenes?

Super excited to share our new paper and a new task in computer vision: Visual Jenga!

📄 arxiv.org/abs/2503.21770
🔗 visualjenga.github.io

29.03.2025 19:36 | 👍 58    🔁 14    💬 7    📌 1

I can’t believe this! Mind-blowing! There are small errors (a flipped logo, rotated chairs), but still, this is incredible!!

Xiaoyan, who’s been working with me on relighting, sent this over. It’s one of the hardest examples we’ve consistently used to stress-test LumiNet: luminet-relight.github.io

27.03.2025 20:00 | 👍 3    🔁 0    💬 0    📌 0

Check out UrbanIR - Inverse rendering of unbounded scenes from a single video!

It’s a super cool project led by the amazing Chih-Hao!

@chih-hao.bsky.social is a rising star in 3DV! Follow him!

Learn more here 👇

15.03.2025 13:49 | 👍 10    🔁 2    💬 0    📌 0

Can we create realistic renderings of urban scenes from a single video while enabling controllable editing: relighting, object compositing, and nighttime simulation?

Check out our #3DV2025 UrbanIR paper, led by @chih-hao.bsky.social, which does exactly this.

🔗: urbaninverserendering.github.io

16.03.2025 03:39 | 👍 2    🔁 1    💬 0    📌 0
