
Bill Psomas

@billpsomas.bsky.social

MSCA Postdoctoral Fellow @ Visual Recognition Group, CTU in Prague. Deep Learning for Computer Vision. Former IARAI, Inria, Athena RC intern. Photographer. Crossfit freak. πŸ“Prague, CZ. πŸ”— http://users.ntua.gr/psomasbill/

537 Followers  |  207 Following  |  49 Posts  |  Joined: 21.11.2024

Latest posts by billpsomas.bsky.social on Bluesky



Sleeping while waiting on an β€œanywhere in the world” paper decision release. #CVPR2026

20.02.2026 21:21 β€” πŸ‘ 17    πŸ” 1    πŸ’¬ 2    πŸ“Œ 0
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency As fine-tuning becomes impractical at scale, probing is emerging as the preferred evaluation protocol. However, standard linear probing can understate the capability of models whose pre-training optim...

8/8 Resources πŸ“„

Paper: arxiv.org/abs/2506.10178
Code: github.com/billpsomas/e...

Joint work with: Dionysis Christopoulos, @eirinibaltzi.bsky.social, @ikakogeorgiou.bsky.social, @tim-arav.bsky.social, Nikos Komodakis, Konstantinos Karantzalos, Yannis Avrithis, @gtolias.bsky.social.

See you @ ICLR 2026πŸ‡§πŸ‡·

20.02.2026 15:03 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

7/n Take-home messages πŸ’‘

EP:
- Plug-and-play.
- Compatible with all pre-training families.
- Unlocks the potential of encoders optimized for local representations.
- Complementary to PEFT.
- Better to have it than not to have it. πŸ‘€

20.02.2026 15:03 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

6/n EP + PEFT = πŸ”₯

- EP captures information that LoRA alone does not, and vice versa.
- LoRA+EP improves over both pure EP and pure LoRA.

πŸ“Œ Example: a LoRA+EP configuration with 250K params reaches 72%, 4.3% above linear probing (67.7%), while using over 3Γ— fewer parameters.
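LoRA keeps the frozen weight and learns only a low-rank update, so in a LoRA+EP setup just the adapters and the probe are trained. A minimal NumPy sketch of the low-rank update (dimensions and names are illustrative, not the paper's configuration):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Frozen weight W plus a trainable low-rank update B @ A (rank r)."""
    return x @ W.T + alpha * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d, r = 768, 8                            # hidden dim, LoRA rank (illustrative)
W = rng.standard_normal((d, d))          # frozen pre-trained weight
A = rng.standard_normal((r, d)) * 0.02   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init
x = rng.standard_normal((4, d))
y = lora_forward(x, W, A, B)
# With B zero-initialized, the adapter starts as an exact no-op on W.
print(np.allclose(y, x @ W.T))  # True
```

With these sizes each adapted layer adds only 2*d*r = 12,288 trainable params, versus d^2 β‰ˆ 590K for the full weight, which is why LoRA+EP budgets stay tiny.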

20.02.2026 15:03 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

5/n Interpretability πŸ”

- EP queries specialize in distinct spatial regions.
- Attention maps are complementary.
- Semantic correspondences emerge (e.g. tails, feet).
- Verified quantitatively too.

20.02.2026 15:03 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

4/n Designed for local representations🧩

πŸ“Š Across ImageNet-1K:

- Consistent gains over k-NN and Linear Probing (LP).
- Particularly strong improvements for MIM, VL, and generative models.
- Minimal overhead.

20.02.2026 15:03 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

3/n Core observation βš™οΈ

Prior attentive probing uses redundant projections.

πŸ” Introducing Efficient Probing (EP):

πŸ“Œ Multi-query cross-attention.
πŸ”Œ Plug-and-play on top of frozen encoders.
πŸ’Έ Lightweight and parameter-efficient.
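The exact EP parameterization is in the paper and repo; as a toy NumPy sketch of the idea, a few learnable queries cross-attend over the frozen patch tokens (used directly as keys and values, no extra projections) and the pooled result feeds a linear classifier. Averaging the per-query summaries here is an illustrative assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def efficient_probe(patch_tokens, queries, w_out):
    """Multi-query cross-attention pooling over frozen patch tokens.

    patch_tokens: (n, d) frozen encoder outputs (keys/values, no projections)
    queries:      (k, d) learnable query vectors
    w_out:        (d, num_classes) linear classifier
    """
    d = patch_tokens.shape[1]
    # Each query attends over all patch tokens.
    attn = softmax(queries @ patch_tokens.T / np.sqrt(d), axis=-1)  # (k, n)
    pooled = attn @ patch_tokens                                    # (k, d)
    # Average the per-query summaries, then classify.
    return pooled.mean(axis=0) @ w_out                              # (num_classes,)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((196, 768))   # e.g. ViT-B patch tokens
q = rng.standard_normal((4, 768)) * 0.02   # 4 learnable queries
w = rng.standard_normal((768, 10)) * 0.02  # linear head, 10 classes
logits = efficient_probe(tokens, q, w)
print(logits.shape)  # (10,)
```

Only `q` and `w` are trained; the encoder stays frozen, which is what makes the probe plug-and-play.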

20.02.2026 15:03 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

2/n Why revisit probing? πŸ€”

- Linear probing underestimates encoders optimized for local representations.
- Full fine-tuning is costly at scale.
- Attentive probing helps, yet methods are over-parametrized and not well-studied.

πŸ‘‰ Can we get attention benefits without that much overhead?

20.02.2026 15:03 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

1/n Attention, Please! πŸš€

Our work β€œRevisiting Attentive Probing Through the Lens of Efficiency” has been accepted at #ICLR2026.

We introduce Efficient Probing (EP) β€” a lightweight, multi-query attentive probing method for frozen encoders.

Paper + code at the end πŸ‘‡

20.02.2026 15:03 β€” πŸ‘ 11    πŸ” 4    πŸ’¬ 1    πŸ“Œ 1

BBoxMaskPose v2: Expanding Mutual Conditioning to 3D

Miroslav Purkrabek, Constantin Kolomiiets, Jiri Matas

tl;dr: SOTA in human pose estimation, especially for the hard cases
arxiv.org/abs/2601.15200

03.02.2026 10:49 β€” πŸ‘ 12    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0

Would love to try

13.01.2026 18:33 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Best promo anyone could make for this position πŸ‘πŸΎπŸ° And, amazingly, everything said is true πŸŽ†

09.01.2026 05:36 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Postdoctoral research position in Instance-level visual generation Czech Technical University in Prague (CTU) offers a fellowship program, the CTU Global Postdoc Fellowship. This new and attractive two-year fellowship-program offers excellent researchers who have rec...

I have an opening for a two-year post-doc position on instance-level (personalized) visual generation. Eligibility: (i) <=7 years from Ph.D.; (ii) studies or 1 year outside of Czechia; (iii) >=3 journal papers with IF or CORE A*/A conference papers. Deadline: 15 Feb.
Details: www.euraxess.cz/jobs/399390

08.01.2026 11:11 β€” πŸ‘ 12    πŸ” 10    πŸ’¬ 2    πŸ“Œ 1

πŸš€New task: Instance-level Image+Textβ†’Image Retrieval

πŸ”ŽGiven a query image + an edit (β€œduring night”), retrieve the same specific instance after the change β€” not just any similar object.

πŸ›’New dataset on HF: i-CIR huggingface.co/datasets/bil...

πŸ”₯Download, run, and share results!
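The i-CIR evaluation protocol lives with the dataset; as a toy illustration only, composed retrieval is often scored by fusing the query-image and edit-text embeddings and ranking the gallery by cosine similarity. The additive fusion and the random embeddings below are assumptions for illustration, not the benchmark's method:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def composed_retrieval(img_emb, txt_emb, gallery):
    """Rank gallery items by cosine similarity to a fused image+text query."""
    query = normalize(normalize(img_emb) + normalize(txt_emb))  # additive fusion baseline
    scores = normalize(gallery) @ query
    return np.argsort(-scores)  # best match first

rng = np.random.default_rng(0)
gallery = rng.standard_normal((100, 512))
img = rng.standard_normal(512)  # query image embedding
txt = rng.standard_normal(512)  # edit text embedding ("during night")
# Plant a gallery item matching the composed query: same instance, after the edit.
gallery[7] = normalize(img) + normalize(txt) + 0.05 * rng.standard_normal(512)
ranking = composed_retrieval(img, txt, gallery)
print(ranking[0])  # 7 (the planted match)
```

The point of the task is exactly this: the top hit must be the *same instance* under the described change, not merely a visually similar object.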

06.01.2026 20:00 β€” πŸ‘ 12    πŸ” 5    πŸ’¬ 0    πŸ“Œ 0

12/12 Joint work with Giorgos Petsangourakis, Christos Sgouropoulos, Theodoros Giannakopoulos, Giorgos Sfikas, @ikakogeorgiou.bsky.social.

27.12.2025 10:32 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion Latent diffusion models (LDMs) achieve state-of-the-art image synthesis, yet their reconstruction-style denoising objective provides only indirect semantic supervision: high-level semantics emerge slo...

11/n Summary🏁

REGLUE shows that the way we leverage VFM semantics matters for diffusion. Combining compact local semantics with global context yields faster convergence and state-of-the-art image generation.

πŸ“„arXiv: arxiv.org/abs/2512.16636
πŸ’»Project: reglueyourlatents.github.io

27.12.2025 10:30 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

10/n Faster convergenceπŸ”₯

REGLUE (SiT-B/2) achieves 12.9 and 28.7 FID at 400K iterations in conditional and unconditional generation, respectively, outperforming REPA, ReDi, and REG. REGLUE (SiT-XL/2) matches 1M-step SOTA performance in just 700K iterations (~30% fewer steps).

27.12.2025 10:30 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

9/n Alignment effects βš“

External alignment complements joint modeling, but its benefits depend on the signal. Local alignment yields consistent gains, whereas global-only alignment can degrade performance. Spatial joint modeling remains the primary driver.

27.12.2025 10:29 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

8/n Local > Global Semantics🧩

Our analysis shows that jointly modeling with patch-level semantics drives most gains. The global [CLS] helps, but fine-grained spatial features deliver a substantially larger FID improvement, highlighting the importance of local structure for diffusion.

27.12.2025 10:29 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

7/n Semantic preservation under compressionπŸ“‰

Do compressed patch features retain VFM semantics?

Points show frozen compressed DINOv2 semantics (x: ImageNet top-1 / Cityscapes mIoU) vs SiT-B generation quality (y: ImageNet FID) when trained on VAE latents + compressed features.

27.12.2025 10:29 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

6/n Non-linear compression matters πŸ’Ž

Linear PCA can limit patch-level semantics (e.g., ReDi). We introduce a lightweight non-linear semantic compressor that aggregates multi-layer VFM features into a compact, semantics-preserving space, boosting quality (21.4 β†’ 13.3 FID).
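The actual compressor architecture is in the paper; a toy sketch of the idea (concatenate multi-layer VFM patch features, then a small non-linear MLP down to a compact dimension; all sizes are illustrative):

```python
import numpy as np

def compress(multi_layer_feats, W1, b1, W2, b2):
    """Non-linear compression of concatenated multi-layer patch features."""
    x = np.concatenate(multi_layer_feats, axis=-1)  # (n_patches, n_layers * d)
    h = np.maximum(x @ W1 + b1, 0.0)                # hidden non-linearity (ReLU)
    return h @ W2 + b2                              # (n_patches, d_compact)

rng = np.random.default_rng(0)
d, n_layers, d_compact = 768, 4, 32   # illustrative sizes
feats = [rng.standard_normal((196, d)) for _ in range(n_layers)]
W1 = rng.standard_normal((n_layers * d, 256)) * 0.02
b1 = np.zeros(256)
W2 = rng.standard_normal((256, d_compact)) * 0.02
b2 = np.zeros(d_compact)
z = compress(feats, W1, b1, W2, b2)
print(z.shape)  # (196, 32)
```

Unlike linear PCA, the ReLU hidden layer lets the compressor keep non-linearly entangled semantics while mixing information across VFM layers.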

27.12.2025 10:28 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

5/n Our method 🧠

REGLUE puts these into one unified model and jointly models:

1️⃣ VAE latents (pixels)
2️⃣ local semantics (compressed patch features)
3️⃣ global [CLS] (concept)
βž• alignment loss as a complementary auxiliary boost.
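How the three streams actually combine is specified in the paper; a schematic NumPy sketch, assuming simple channel-wise concatenation of the three targets plus a cosine alignment term (names and shapes are illustrative):

```python
import numpy as np

def cosine_align(pred, target):
    """Negative cosine similarity, averaged over patches (alignment loss)."""
    p = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    t = target / np.linalg.norm(target, axis=-1, keepdims=True)
    return -np.mean(np.sum(p * t, axis=-1))

rng = np.random.default_rng(0)
n = 196                                      # patches
vae_latents = rng.standard_normal((n, 16))   # 1) pixel stream
local_sem = rng.standard_normal((n, 32))     # 2) compressed patch semantics
global_cls = rng.standard_normal((1, 32))    # 3) global [CLS] concept
# Joint modeling: one diffusion target over all three entangled streams.
joint = np.concatenate([vae_latents, local_sem,
                        np.repeat(global_cls, n, axis=0)], axis=-1)
# Auxiliary boost: align model features to the frozen semantic stream.
model_feats = rng.standard_normal((n, 32))
loss_align = cosine_align(model_feats, local_sem)
print(joint.shape)  # (196, 80)
```

The joint tensor is what the denoiser models; the alignment term stays a complementary auxiliary signal rather than the main objective.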

27.12.2025 10:28 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

4/n Main insight πŸ’‘

Jointly modeling compressed patch-level semantics βž• VAE latents provides spatial guidance and yields larger gains than alignment-only (REPA) or global-only (REG).

Alignment loss and a global [CLS] token stay complementary, orthogonal signals.

27.12.2025 10:27 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

3/n Key design choice 🧩 Compact spatial semantics matter!

To leverage VFMs effectively, diffusion should jointly model VAE latents with multi-layer VFM spatial (patch-level) semantics, via a compact, non-linearly compressed representation.

27.12.2025 10:27 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

2/n More semantics are needed! βž•

Existing joint modeling and external alignment approaches (e.g., REPA, REG) inject only a β€œnarrow slice” of VFM features into diffusion. We argue richer semantics are needed to unlock their full potential.

27.12.2025 10:26 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

1/n REGLUE Your Latents! πŸš€

We introduce REGLUE: a unified framework that entangles VAE latents βž• Global βž• Local semantics for faster, higher-fidelity image generation.

Links (paper + code) at the endπŸ‘‡

27.12.2025 10:26 β€” πŸ‘ 14    πŸ” 4    πŸ’¬ 1    πŸ“Œ 0

✨This NASA/ESA/CSA #Webb Space Telescope image shows the Westerlund 1 cluster, a group of massive stars located in our galaxy which are nearly as bright as a million Suns each. πŸŒŒπŸŽ„πŸ§ͺπŸ”­

πŸ”— esa.int/ESA_Multimedia/Images/2024/10/The_exotic_stellar_population_of_Westerlund_1

24.12.2025 11:11 β€” πŸ‘ 416    πŸ” 113    πŸ’¬ 8    πŸ“Œ 17

Evaluate your policy on REALM - our new real-to-sim validated benchmark for generalization in robot manipulation!

REALM is a good real-world proxy, as evidenced by its high correlation with real-world evals.

ALSO: Follow the rising star @msedlacek.bsky.social and ask him any questions about REALM!

24.12.2025 11:37 β€” πŸ‘ 5    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0
billpsomas/icir Β· Datasets at Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

πŸ“£ i-CIR dataset (NeurIPS 25) is now on
@hf.co.

πŸš€Easier download + better discoverability + WebDataset shards for large-scale use (~750K images).

πŸ€— Grab it here: huggingface.co/datasets/bil...

#computervision #retrieval #datasets #huggingface #NeurIPS

20.12.2025 18:42 β€” πŸ‘ 5    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0

Yejin Choi on whether small companies or labs can compete with big ones on AI and LLMs. NeurIPS 2025 invited talk.

04.12.2025 16:38 β€” πŸ‘ 6    πŸ” 1    πŸ’¬ 2    πŸ“Œ 0
