Gabriele Goletto gabrigole - Bluesky Statics

Preprint now on ArXiv 📢
The N-Body Problem: Parallel Execution from Single-Person Egocentric Video
Input: Single-person egocentric video 👤
Output: imagine how these tasks could be performed faster, and correctly, by N > 1 people, e.g. N=2 👥
📎 arxiv.org/abs/2512.11393
👀 zhifanzhu.github.io/ego-nbody/
1/4

15.12.2025 14:31 — 👍 7 🔁 6 💬 1 📌 1

Yes please! The animations look really clear to me, so it would be a great learning resource with a voiceover 🙏

09.05.2025 09:12 — 👍 2 🔁 0 💬 1 📌 0

Now on ArXiv: our @cvprconference.bsky.social #CVPR2025 paper
Learning from Streaming Video with Orthogonal Gradients
Instead of shuffling clips, can we learn from videos fed sequentially, where each clip is seen only once, in order?
How do we deal with the correlation of gradients over the course of training?
1/3

10.04.2025 15:04 — 👍 17 🔁 2 💬 1 📌 0

But I like the (almost) bot-free conversations and there are some really good active accounts!

08.04.2025 05:32 — 👍 1 🔁 0 💬 0 📌 0

Check out Kosta’s starter packs (go.bsky.app/M7HGC3Y); that’s the fastest route. That said, the CV community here has unfortunately become less active than it was a few months ago.

08.04.2025 05:27 — 👍 1 🔁 0 💬 1 📌 0

Image segmentation doesn’t have to be rocket science. 🚀
Why build a rocket engine full of bolted-on subsystems when one elegant unit does the job? 💡
That’s what we did for segmentation.
✅ Meet the Encoder-only Mask Transformer (EoMT): tue-mps.github.io/eomt (CVPR 2025)
(1/6)

31.03.2025 20:35 — 👍 8 🔁 4 💬 1 📌 1

Excited to release the first worldwide aerial image localization method (and demo!)
Take an aerial or satellite image from anywhere in the world, and AstroLoc can (probably) find its location, and provide a precise footprint!
Links to paper, demo and full-length (5 min) video ⬇️

14.02.2025 10:32 — 👍 9 🔁 1 💬 1 📌 0

🛑📢
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
hd-epic.github.io
arxiv.org/abs/2502.04144
Newly collected videos
263 annotations/min: recipe, nutrition, actions, sounds, 3D object movement & fixture associations, masks.
26K VQA benchmark to challenge current VLMs
1/N

07.02.2025 11:45 — 👍 33 🔁 6 💬 2 📌 4

Now on ArXiv
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
arxiv.org/abs/2412.01987
soczech.github.io/showhowto/
Given one real image & a variable-length sequence of text instructions, ShowHowTo generates a multi-step sequence of images *conditioned on the scene in the REAL image*
🧵

05.12.2024 15:01 — 👍 18 🔁 3 💬 1 📌 1

Hi Kosta, I would love to be on this list as well 😊 I work on egocentric video understanding.

21.11.2024 10:21 — 👍 1 🔁 0 💬 0 📌 0

Posts by Gabriele Goletto (@gabrigole.bsky.social)