Yes please! The animations look really clear to me so it would be a great learning resource with voiceover
09.05.2025 09:12
@gabrigole.bsky.social
Research Scientist @ Microsoft. 👨‍💻 https://gabrielegoletto.github.io
Now on ArXiv our @cvprconference.bsky.social #CVPR2025 paper
Learning from Streaming Video with Orthogonal Gradients
Instead of shuffling clips, can we learn from videos fed sequentially, where you see a clip once, in order?
How to deal with the correlation of gradients over training?
1/3
But I like the (almost) bot-free conversations and there are some really good active accounts!
08.04.2025 05:32
Check out Kosta’s starter packs (go.bsky.app/M7HGC3Y), that’s the fastest route. That said, unfortunately, the CV community here has become less active compared to a few months ago.
08.04.2025 05:27
Image segmentation doesn’t have to be rocket science.
Why build a rocket engine full of bolted-on subsystems when one elegant unit does the job? 💡
That’s what we did for segmentation.
Meet the Encoder-only Mask Transformer (EoMT): tue-mps.github.io/eomt (CVPR 2025)
(1/6)
Excited to release the first worldwide aerial image localization method (and demo!)
Take an aerial or satellite image from anywhere in the world, and AstroLoc can (probably) find its location, and provide a precise footprint!
Links to paper, demo and full-length (5 min) video ⬇️
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
hd-epic.github.io
arxiv.org/abs/2502.04144
Newly collected videos
263 annotations/min: recipe, nutrition, actions, sounds, 3D object movement & fixture associations, masks.
26K VQA benchmark to challenge current VLMs
1/N
Now on ArXiv
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
arxiv.org/abs/2412.01987
soczech.github.io/showhowto/
Given one real image & a variable sequence of text instructions, ShowHowTo generates a multi-step sequence of images *conditioned on the scene in the REAL image*
🧵
Hi Kosta, would love to be on this list as well. I am working on egocentric video understanding
21.11.2024 10:21