💜 Thanks a lot to @lorenzlamm.bsky.social, @marionjasnin.bsky.social, @tingyingpeng.bsky.social, Franziska Eckardt and Benedikt Schworm for all the help in this work ❤️ and to you for reading 🙏!
I would love to get any feedback, so please feel free to reach out!
🧵9/9
🤖 Results
Thus, increasing a token's norm or keeping non-sparse representations becomes more costly for the ViT, promoting less repurposing, better representations and, as a result, better quantitative performance.
🧵8/9
Random embeddings also create angular anisotropy: the less sparse a token is, the more angular distortion it experiences from the random embedding.
🧵7/9
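If you want to poke at that angular effect yourself, here's a toy probe. Big caveat: I use a positive elementwise random scaling purely for illustration (not necessarily the paper's construction); under such a map a 1-sparse token keeps its direction exactly, while denser tokens of the same norm get rotated:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 384

def angle_deg(a, b):
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

for k in (1, 10, 384):                  # number of nonzero coordinates
    x = np.zeros(dim)
    x[:k] = rng.standard_normal(k)
    x /= np.linalg.norm(x)              # equal norm, different sparsity
    scale = np.exp(0.5 * rng.standard_normal(dim))  # positive elementwise jitter
    print(f"{k:3d} nonzeros -> angular change {angle_deg(x, scale * x):5.1f} deg")
```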
Random embeddings create a radial anisotropy: the larger a token's norm, the more it is distorted by the random embedding.
🧵6/9
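A toy probe of that radial effect (numpy; the iid Gaussian matrix with standard deviation `amplitude` is my stand-in for the random embedding): for such a W, E[||Wx||] grows linearly with ||x||, so larger-norm tokens get displaced further:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, amplitude = 384, 0.1

for norm in (1.0, 5.0, 25.0):
    x = rng.standard_normal(dim)
    x *= norm / np.linalg.norm(x)                    # token with a controlled norm
    W = amplitude * rng.standard_normal((dim, dim))  # random embedding (assumed iid)
    print(f"||x|| = {norm:5.1f} -> ||Wx|| = {np.linalg.norm(W @ x):6.2f}")
```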
⚙️ Topology preservation
By choosing an appropriate amplitude, we can preserve the topology of the latent space: the smaller the amplitude, the fewer changes we expect in the topology.
🧵5/9
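One way to eyeball that claim on toy data: compare k-nearest-neighbour sets before and after a random-embedding perturbation at several amplitudes (the iid Gaussian map and the residual-style update are assumptions on my part):

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, k = 500, 64, 10

def knn_ids(p, k):
    sq = (p ** 2).sum(axis=1)                        # squared pairwise distances
    d2 = sq[:, None] + sq[None, :] - 2.0 * p @ p.T
    return np.argsort(d2, axis=1)[:, 1:k + 1]        # skip self at position 0

x = rng.standard_normal((n, dim))                    # toy latent space
base = knn_ids(x, k)
for amplitude in (0.01, 0.1, 1.0):
    W = amplitude * rng.standard_normal((dim, dim))
    y = x + x @ W.T                                  # residual random perturbation
    shared = [len(set(a) & set(b)) for a, b in zip(base, knn_ids(y, k))]
    print(f"amplitude {amplitude:4.2f} -> mean k-NN overlap {np.mean(shared) / k:.2f}")
```

Higher overlap means the neighbourhoods, and so the local structure, survive the perturbation.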
🧑‍🍳 Contribution
We proposed replacing the learnable parameters of the MLPs with random variables, turning them into random embeddings and creating the ✨Randomized-MLP (RMLP)✨. This architecture has a single hyperparameter, the amplitude, which controls the standard deviation of the random variables.
🧵4/9
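A minimal sketch of what such a module could look like (PyTorch; the layer sizes, the GELU nonlinearity and sampling the weights once at init are assumptions on my part, the real implementation is at github.com/peng-lab/rmlp):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomizedMLP(nn.Module):
    """ViT-style MLP whose weight matrices are random variables instead of
    learnable parameters; `amplitude` sets their standard deviation."""

    def __init__(self, dim: int, hidden_dim: int, amplitude: float = 0.1):
        super().__init__()
        # Buffers: stored with the module, never updated by the optimiser.
        self.register_buffer("w1", amplitude * torch.randn(hidden_dim, dim))
        self.register_buffer("w2", amplitude * torch.randn(dim, hidden_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(F.gelu(F.linear(x, self.w1)), self.w2)

tokens = torch.randn(4, 197, 384)              # (batch, tokens, dim), ViT-S-like
print(RandomizedMLP(384, 1536)(tokens).shape)  # torch.Size([4, 197, 384])
```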
🎨 Baseline
Taking the DINO and iBOT losses, let's consider a teacher providing two stable classes. The student and its MLP then need to learn to match that classification, and part of that learning might go to the MLP.
🧵3/9
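For context, the DINO objective is a cross-entropy between a sharpened teacher distribution and the student's distribution over prototypes. A minimal sketch (temperatures are illustrative; teacher centering and the iBOT patch-level term are omitted):

```python
import torch
import torch.nn.functional as F

def dino_loss(student_logits, teacher_logits, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between teacher and student prototype distributions."""
    t = F.softmax(teacher_logits / tau_t, dim=-1).detach()  # sharp teacher targets
    log_s = F.log_softmax(student_logits / tau_s, dim=-1)
    return -(t * log_s).sum(dim=-1).mean()

# A teacher that is confident about two stable "classes" (prototypes 0 and 1):
teacher = torch.zeros(8, 4096)
teacher[:4, 0], teacher[4:, 1] = 5.0, 5.0
student = torch.randn(8, 4096, requires_grad=True)
print(dino_loss(student, teacher))  # what the student (and its MLP) must minimise
```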
✨ Motivation
Token norms have been used to spot ViTs repurposing patch tokens to encode general information in void regions of natural images, and regularisation techniques have been developed to avoid this. We saw this behaviour in regularised models when applied to medical images.
🧵2/9
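That diagnostic is easy to reproduce. A sketch using DINOv2 from torch hub (the `forward_features` output keys are my recollection of that API, worth double-checking against the DINOv2 repo):

```python
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

img = torch.randn(1, 3, 518, 518)            # stand-in for a preprocessed image
with torch.no_grad():
    feats = model.forward_features(img)
patch_tokens = feats["x_norm_patchtokens"]   # (1, 37 * 37, 384) for ViT-S/14

# High-norm outlier patches are candidates for repurposed tokens.
norms = patch_tokens.norm(dim=-1).reshape(37, 37)
print(norms.min().item(), norms.max().item())
```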
[Attached animation: attention maps and PCA visualisations comparing mode…]
In case you're missing my poster at #NeurIPS2025 about how I fine-tuned DINOv2 on ophthalmological images, here are some animations so you don't miss out!
📄 Preprint: doi.org/10.48550/arX...
🔗 Code: github.com/peng-lab/rmlp
🧵1/9
Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2
Vision Transformers (ViTs), such as DINOv2, achieve strong performance across domains but often repurpose low-informative patch tokens in ways that reduce the interpretability of attention and feature...
I wrote a paper with @lorenzlamm.bsky.social, @marionjasnin.bsky.social, @tingyingpeng.bsky.social, F. Eckardt and B. Schworm, and it got accepted at #NeurIPS2025. Fun fact: 90% of viewers are enby vegans! 🤓
So if u fit there, u might wanna check it out! Maybe also if you don't. We're allies here <3
Apparently it's "blue hair, pronouns and how many times u've read my paper" now
I hate gay Halloween, what do you mean you're a virtual homo?