8/ Huge thanks for the vital support: @bifold.berlin, Hector Fellow Academy, @tuberlin.bsky.social, @tuebingen-ai.bsky.social, @helmholtzmunich.bsky.social, @munichcenterml.bsky.social, and Aignostics. We couldn't have done it without this ecosystem!
19.01.2026 09:44
7/ Huge thanks to my co-first author, Marco Morik, and all our great co-authors: @lukasthede.bsky.social, @lucaeyring.bsky.social, Shinichi Nakajima, @zeynepakata.bsky.social, and @lukasmut.bsky.social!
19.01.2026 09:44
5/ Best part? It's parameter-efficient and keeps the backbone frozen!
19.01.2026 09:44
4/ Interpretation: Attention heatmaps reveal that specialized tasks (medical, satellite) rely heavily on intermediate layers, while natural images favor later layers.
19.01.2026 09:44
3/ The impact:
- Consistent gains across 20 datasets.
- +5.54 pp average improvement over standard linear probes.
- Works across model scales (Small to Large) and training objectives (CLIP, DINOv2, supervised).
19.01.2026 09:44
2/ How it works: A cross-attention mechanism dynamically weights and fuses [CLS] and Average-Pooled [AP] tokens from ALL layers. It automatically identifies the most relevant abstraction levels for your task.
19.01.2026 09:44
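To make the mechanism from post 2/ concrete, here is a minimal sketch in PyTorch: a learned query cross-attends over the per-layer [CLS] and average-pooled tokens of a frozen backbone, and only this small head is trained. All names, dimensions, and hyperparameters below are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class AttentiveLayerFusion(nn.Module):
    """Sketch of attentive multi-layer fusion over frozen ViT features.

    A single learned query attends over the [CLS] and average-pooled
    tokens collected from every backbone layer, so the head can weight
    the abstraction level that suits the task.
    """

    def __init__(self, dim: int, num_classes: int, num_heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learned task query
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, layer_tokens: torch.Tensor) -> torch.Tensor:
        # layer_tokens: (batch, 2 * num_layers, dim) — one [CLS] and one
        # [AP] token per layer, stacked along the sequence dimension.
        q = self.query.expand(layer_tokens.size(0), -1, -1)
        fused, weights = self.attn(q, layer_tokens, layer_tokens)
        # `weights` (batch, 1, 2 * num_layers) shows which layers the
        # task attends to, cf. the heatmaps described in post 4/.
        return self.head(fused.squeeze(1))

# Toy usage: 12 layers -> 24 tokens of width 384, 10 classes.
probe = AttentiveLayerFusion(dim=384, num_classes=10)
logits = probe(torch.randn(8, 24, 384))
print(tuple(logits.shape))  # (8, 10)
```

Because the backbone stays frozen, only the query, attention, and linear head carry trainable parameters, which matches the parameter-efficiency claim in post 5/.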
1/ Task-relevant info is distributed across the entire hierarchy, not just the final layer. We propose Attentive Multi-Layer Fusion to unlock this potential.
19.01.2026 09:44
Why you should probe more than just the final layer of your Vision Transformer to maximize performance. 🧵
19.01.2026 09:44
Presenting at #ICML2025 tomorrow!
Come and explore how representational similarities behave across datasets :)
Thu Jul 17, 11 AM-1:30 PM PDT
East Exhibition Hall A-B #E-2510
Huge thanks to @lorenzlinhardt.bsky.social, Marco Morik, Jonas Dippel, Simon Kornblith, and @lukasmut.bsky.social!
16.07.2025 21:07
Objective drives the consistency of representational similarity across datasets
The Platonic Representation Hypothesis claims that recent foundation models are converging to a shared representation space as a function of their downstream task performance, irrespective of the obje...
I am deeply grateful to @lorenzlinhardt.bsky.social, Marco Morik, Jonas Dippel, Simon Kornblith, and @lukasmut.bsky.social for their great work and support in this project! We also thank our collaborators, @bifold.berlin and HFA. 7/7
Paper: arxiv.org/abs/2411.05561
Code: github.com/lciernik/sim...
06.06.2025 14:14
2nd key insight: The link between model similarity and behavior varies by dataset. Single-domain sets show strong correlations, while some multi-domain sets contain high-performing yet dissimilar models. Thus, the Platonic Representation Hypothesis may depend on the dataset's nature. 🧵 6/7
06.06.2025 14:14
Key finding: The training objective is a crucial factor for similarity consistency! SSL models show remarkably consistent representations across stimulus sets, whereas image-text and supervised models vary widely in consistency depending on the dataset. 🧵 5/7
06.06.2025 14:14
We therefore propose a framework to systematically test whether relative representational similarities between models remain consistent: we measure similarities among sets of models with different traits and correlate them across dataset pairs to assess stability across stimuli. 🧵 4/7
06.06.2025 14:14
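The consistency measure described above can be sketched as follows: given one model-by-model similarity matrix per dataset, correlate the upper triangles across every dataset pair. A high correlation means the relative ordering of model similarities is stable across stimuli. This is a hypothetical helper under that reading, not the paper's exact metric.

```python
from itertools import combinations

import numpy as np

def similarity_consistency(sim_by_dataset: dict) -> dict:
    """Correlate model-similarity matrices across dataset pairs.

    sim_by_dataset maps a dataset name to an (m, m) model-similarity
    matrix (e.g. pairwise linear CKA between m models). Returns the
    Pearson correlation of the upper triangles for each dataset pair.
    """
    iu = None
    out = {}
    for (a, sa), (b, sb) in combinations(sim_by_dataset.items(), 2):
        if iu is None:
            iu = np.triu_indices_from(sa, k=1)  # off-diagonal model pairs
        out[(a, b)] = float(np.corrcoef(sa[iu], sb[iu])[0, 1])
    return out

# Toy usage: two datasets whose 5x5 model-similarity matrices nearly agree.
rng = np.random.default_rng(1)
base = rng.random((5, 5))
base = (base + base.T) / 2
sims = {"imagenet": base, "medical": base + 0.01 * rng.random((5, 5))}
result = similarity_consistency(sims)
print(result)
```

With near-identical input matrices the correlation is close to 1; finding 1/ in this thread corresponds to this correlation dropping for many real dataset pairs.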
Representational similarity using linear CKA. Left to right: natural multi- and single-domain, and specialized datasets, followed by mean and standard deviation across all datasets. Models (rows and columns) are ordered by a hierarchical clustering of the mean matrix. Yellow and white boxes highlight regions with more stable similarity patterns across datasets, corresponding to some image-text (yellow) and self-supervised model pairs (white), while cyan boxes show higher variability for mainly supervised model pairs.
First finding: Representational similarities do not transfer directly across datasets; they vary widely in range and pattern from one dataset to another. 🧵 3/7
06.06.2025 14:14
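The figure above measures similarity with linear CKA. For readers unfamiliar with it, here is the standard feature-space formulation as a short sketch; this is the textbook definition, not necessarily the repository's exact code.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between two representations.

    x, y: (n_samples, d1) and (n_samples, d2) activations for the same
    stimuli from two models. Returns a similarity in [0, 1]; 1 means
    the representations are identical up to rotation and scaling.
    """
    x = x - x.mean(axis=0)  # center features
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(x.T @ y, "fro") ** 2   # HSIC-style cross term
    self_x = np.linalg.norm(x.T @ x, "fro")
    self_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (self_x * self_y)

rng = np.random.default_rng(0)
a = rng.normal(size=(100, 32))
print(round(linear_cka(a, a), 6))  # identical representations -> 1.0
```

By Cauchy-Schwarz the value always lies in [0, 1], which makes the matrices from different datasets directly comparable in scale, even though (as the post notes) their patterns still differ.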
The Platonic Representation Hypothesis (@phillipisola.bsky.social et al.) suggests foundation models converge to a shared representation space. Yet most studies measure representational similarity on a single dataset. So we wondered: does this convergence hold more broadly? 🧵 2/7
06.06.2025 14:14
If two models are more similar to each other than a third on ImageNet, will this hold for medical/satellite images?
Our #icml2025 paper analyses how vision model similarities generalize across datasets, the factors that influence them, and their link to downstream task behavior. 🧵 1/7
06.06.2025 14:14
History repeats itself: We investigated how early modern communities embraced scholarly advancements, reshaping scientific views and exploring scientific roots amidst a changing world.
www.science.org/doi/10.1126/...
@mpiwg.bsky.social @tuberlin.bsky.social @bifold.berlin @science.org
27.12.2024 09:20
If you are interested in single-cell foundation models (scFMs), stop by our poster (West 109) at the AiDrugX Workshop at NeurIPS 2024. We will present CancerFoundation, an scFM tailored for studying cancer biology.
Preprint: biorxiv.org/content/10.1...
15.12.2024 19:38
New preprint from our lab, Ekaterina Krymova, and @fabiantheis.bsky.social: UniversalEPI, an attention-based method to predict enhancer-promoter interactions from DNA sequence and ATAC-seq. Read the full preprint: www.biorxiv.org/content/10.1... by @aayushgrover.bsky.social, L. Zhang & I.L. Ibarra
26.11.2024 13:40