@gabtriv.bsky.social
PhD in Computer Vision
Read the full paper:
SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation
Now on arXiv: arxiv.org/abs/2505.21795
4/4: Promptable segmentation in action.
SANSA reduces reliance on costly pixel-level masks by supporting point, box, and scribble prompts, enabling fast, scalable annotation with minimal supervision (rough prompting sketch below).
See the qualitative results below.
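As a rough illustration of what point/box prompting looks like with the underlying SAM2 image predictor (this is the base SAM2 API, not SANSA's own code; the checkpoint name, file path, and coordinates are just placeholders):

```python
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Base SAM2 image predictor; SANSA builds on top of this (see the repo linked below).
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")
predictor.set_image(np.array(Image.open("query.jpg").convert("RGB")))

# One positive click (label 1) plus a rough box as prompts; coordinates are illustrative.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    box=np.array([100, 80, 540, 400]),  # x1, y1, x2, y2
    multimask_output=False,
)
```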
3/4: SANSA achieves state of the art in few-shot segmentation. We outperform specialist and foundation-based methods across various benchmarks:
- +9.3% mIoU on LVIS-92i
- 3x faster than prior works
- Only 234M parameters (4-5x smaller than competitors)
2/4: Unlocking semantic structure
SAM2 features are rich, but optimized for tracking.
- Insert bottleneck adapters into frozen SAM2
- These restructure the feature space to disentangle semantics
- Result: features cluster semantically, even for unseen classes (see the PCA below; a minimal adapter sketch follows this post)
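For intuition, a minimal PyTorch sketch of the general bottleneck-adapter idea (down-project, non-linearity, up-project, residual) wrapped around a frozen block. This is a generic sketch, not SANSA's exact adapter design; `dim` and `reduction` are placeholder hyperparameters.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        hidden = dim // reduction
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)
        # Zero-init the up-projection so training starts from the frozen features.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    """Wrap a frozen encoder block with a trainable adapter (hypothetical wrapper;
    `block` stands in for a SAM2 encoder block)."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # backbone stays frozen
        self.adapter = BottleneckAdapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))
```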
As #CVPR2025 week kicks off, meet SANSA: Semantically AligNed Segment Anything 2.
We turn SAM2 into a semantic few-shot segmenter:
- Unlocks latent semantics in frozen SAM2
- Supports any prompt: fast and scalable annotation
- No extra encoders
Code: github.com/ClaudiaCutta...
#ICCV2025
11.05.2025 15:05
I guess merging the events could also work. I wonder whether cricket players would be better at Computer Vision than CV researchers are at cricket, or vice versa.
11.05.2025 15:47
To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition
Davide Sferrazza, @berton-gabri.bsky.social, Gabriele Trivigno, Carlo Masone
tl;dr: on simple datasets, global descriptors are nowadays often better than local feature matching methods.
arxiv.org/abs/2504.06116
SAMWISE achieves state-of-the-art performance across multiple #RVOS benchmarks, while being the smallest model in RVOS! It also sets a new #SOTA in image-level referring #segmentation. With only 4.9M trainable parameters, it runs #online and requires no fine-tuning of SAM2.
10.04.2025 18:11
Contributions:
- Textual Prompts for SAM2: early fusion of visual-text cues via a novel adapter (rough fusion sketch below)
- Temporal Modeling: essential for video understanding, beyond frame-by-frame object tracking
- Tracking Bias: correcting tracking bias in SAM2 for text-aligned object discovery
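Purely as a hypothetical illustration of early visual-text fusion via a small cross-attention adapter: module name, dimensions, and placement below are made up, not SAMWISE's actual design (see the paper and repo for that).

```python
import torch
import torch.nn as nn

class TextFusionAdapter(nn.Module):
    """Hypothetical early-fusion adapter: visual tokens cross-attend to text tokens
    inside a frozen visual encoder. Illustrative only."""
    def __init__(self, vis_dim: int, txt_dim: int, n_heads: int = 8):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, vis_dim)
        self.attn = nn.MultiheadAttention(vis_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: (B, N, vis_dim), txt_tokens: (B, T, txt_dim)
        txt = self.txt_proj(txt_tokens)
        fused, _ = self.attn(query=self.norm(vis_tokens), key=txt, value=txt)
        return vis_tokens + fused  # residual keeps the frozen features intact

# Usage sketch: insert between frozen encoder stages so text conditions the
# visual features early, before mask decoding.
adapter = TextFusionAdapter(vis_dim=256, txt_dim=512)
vis = torch.randn(2, 1024, 256)
txt = torch.randn(2, 12, 512)
out = adapter(vis, txt)  # (2, 1024, 256)
```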
Our paper SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation is accepted as a #Highlight at #CVPR2025!
We make #SegmentAnything wiser, enabling it to understand textual prompts, training only 4.9M parameters!
Code, models & demo: github.com/ClaudiaCutta...
Why SAMWISE? See below.
To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition
Davide Sferrazza, @berton-gabri.bsky.social, @gabtriv.bsky.social, Carlo Masone
tl;dr: VPR datasets are saturating and re-ranking brings little benefit; image matching instead provides uncertainty: inlier counts act as a confidence measure (rough sketch below).
arxiv.org/abs/2504.06116
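A rough sketch of the inlier-counts-as-confidence idea, assuming off-the-shelf OpenCV SIFT matching and a RANSAC homography between the query and its top-retrieved image. This is not the paper's exact pipeline, and the threshold is illustrative.

```python
import cv2
import numpy as np

def matching_confidence(query_path: str, retrieved_path: str) -> int:
    """Count RANSAC inliers between the query and a retrieved image; a higher
    count suggests the retrieval is correct. Sketch only."""
    sift = cv2.SIFT_create()
    img_q = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    img_r = cv2.imread(retrieved_path, cv2.IMREAD_GRAYSCALE)
    kq, dq = sift.detectAndCompute(img_q, None)
    kr, dr = sift.detectAndCompute(img_r, None)
    if dq is None or dr is None:
        return 0

    # Lowe's ratio test to keep only distinctive matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = []
    for pair in matcher.knnMatch(dq, dr, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            matches.append(pair[0])
    if len(matches) < 4:
        return 0

    pts_q = np.float32([kq[m.queryIdx].pt for m in matches])
    pts_r = np.float32([kr[m.trainIdx].pt for m in matches])
    _, inlier_mask = cv2.findHomography(pts_q, pts_r, cv2.RANSAC, 5.0)
    return int(inlier_mask.sum()) if inlier_mask is not None else 0

# Accept the place-recognition result only if it is geometrically supported.
if matching_confidence("query.jpg", "top1_retrieved.jpg") > 30:  # threshold is illustrative
    print("confident match")
```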
Paper Release!
Curious about image retrieval and contrastive learning? We present:
"All You Need to Know About Training Image Retrieval Models"
The most comprehensive retrieval benchmark: thousands of experiments across 4 datasets, dozens of losses, batch sizes, LRs, data labeling, and more!
Trying to convince my bluesky feed to put me in the Computer Vision community. Right now I only see posts about the orange-haired president. @berton-gabri.bsky.social @gabrigole.bsky.social how did you do it?
07.04.2025 09:43
Image segmentation doesn't have to be rocket science.
Why build a rocket engine full of bolted-on subsystems when one elegant unit does the job?
That's what we did for segmentation.
Meet the Encoder-only Mask Transformer (EoMT): tue-mps.github.io/eomt (CVPR 2025)
(1/6)
Went outside today and thought this would be perfect for my first #bluesky post
05.04.2025 08:39