Check out the paper & code for all the details!
Paper: arxiv.org/abs/2502.04959
Code: github.com/danielm1405/...
Huge thanks to my amazing collaborators:
Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
10.02.2025 14:47
In summary: by using a uniform singular value spectrum and task-specific subspaces, Iso-CTS achieves state-of-the-art performance across all settings!
10.02.2025 14:47
That's why we propose replacing the least important components with task-specific vectors that are orthogonal to the common subspace.
This further enhances alignment, and performance naturally improves!
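The step above can be sketched in a few lines. This is a minimal illustration (the function name and arguments are mine, not the paper's API), assuming the common subspace is given as a matrix with orthonormal columns: projecting the common subspace out of a task matrix leaves a residual whose singular directions are automatically orthogonal to the common ones.

```python
import numpy as np

def task_specific_directions(common_basis, task_matrix, n_new):
    """Illustrative sketch: extract task-specific directions orthogonal
    to a common subspace. `common_basis` must have orthonormal columns."""
    # Project out the common subspace; the residual is orthogonal to it
    residual = task_matrix - common_basis @ (common_basis.T @ task_matrix)
    # The residual's top left-singular vectors serve as task-specific components
    U, _, _ = np.linalg.svd(residual, full_matrices=False)
    return U[:, :n_new]
```

Because the residual lies entirely outside the common subspace, every returned direction is orthogonal to every column of `common_basis` by construction.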
10.02.2025 14:47
This simple modification boosts task arithmetic by 10-15% across all model merging scenarios, achieving state-of-the-art results in most cases!
However, we found that the bottom components contribute very little to the final performance...
10.02.2025 14:47
Based on this, we propose an isotropic merging framework that:
- Flattens the singular value spectrum of task matrices
- Enhances alignment between tasks
- Reduces the performance gap between tasks
Surprisingly, the best performance is achieved when the singular value spectrum is uniform!
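The flattening idea above can be sketched as follows. This is a hedged, minimal version (function name is mine): average the task matrices as in task arithmetic, take the SVD, and replace every singular value with the mean singular value so the spectrum is uniform.

```python
import numpy as np

def isotropic_merge(task_matrices):
    """Illustrative sketch of spectrum flattening: merge task matrices,
    then make the merged matrix's singular value spectrum uniform."""
    # Task-arithmetic baseline: average the task matrices
    M = np.mean(task_matrices, axis=0)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Flatten the spectrum: every direction gets the same (mean) singular value
    s_iso = np.full_like(s, s.mean())
    return U @ np.diag(s_iso) @ Vt
```

The singular directions are kept, but no single direction dominates the merged update anymore, which is the intuition behind "no task left behind".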
10.02.2025 14:47
We show that alignment between singular components of task-specific & merged matrices strongly correlates with performance gains over the pre-trained model!
Tasks that are well aligned get amplified, while less aligned ones become underrepresented and struggle.
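One simple way to quantify such alignment (a sketch of the idea, not necessarily the exact metric from the paper) is to compare the top-k left-singular subspaces of a task matrix and the merged matrix via the squared Frobenius norm of their cross-projection:

```python
import numpy as np

def subspace_alignment(task_matrix, merged_matrix, k=8):
    """Illustrative alignment score between the top-k left-singular
    subspaces of two matrices: 1.0 = identical subspaces, 0.0 = orthogonal."""
    Ut, _, _ = np.linalg.svd(task_matrix, full_matrices=False)
    Um, _, _ = np.linalg.svd(merged_matrix, full_matrices=False)
    # Average squared cosine between the two subspaces
    P = Um[:, :k].T @ Ut[:, :k]
    return np.linalg.norm(P, 'fro') ** 2 / k
```

A task whose score is near 1 has its dominant directions well represented in the merged model; a score near 0 means those directions were largely lost in merging.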
10.02.2025 14:47
What happens when you modify the spectrum of singular values of the merged task vector?
Apparently, you achieve state-of-the-art model merging results!
Introducing "No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces"
10.02.2025 14:47
Self-supervised learning with Masked Autoencoders (MAE) is known to produce worse image representations than joint-embedding approaches (e.g., DINO). In our new paper, we identify new reasons why this is the case and point toward solutions: arxiv.org/abs/2412.03215
05.12.2024 19:56