
Daniel Marczak

@dmarczak.bsky.social

mostly trying to merge models | phd student @ warsaw university of technology & ideas

54 Followers  |  86 Following  |  7 Posts  |  Joined: 18.11.2024

Latest posts by dmarczak.bsky.social on Bluesky

Check out the paper & code for all the details!
πŸ“ Paper: arxiv.org/abs/2502.04959
πŸ’» Code: github.com/danielm1405/...

Huge thanks to my amazing collaborators:
Simone Magistri, Sebastian Cygert, BartΕ‚omiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer

10.02.2025 14:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

In summary: By using a uniform singular value spectrum πŸ“Š and task-specific subspaces 🎯, Iso-CTS achieves state-of-the-art performance across all settings! πŸ”₯

10.02.2025 14:47 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸ” That’s why we propose replacing the least important components with task-specific vectors that are orthogonal to the common subspace.

This further enhances alignment 🎯, and the performance naturally improves! πŸ“ˆ
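The "orthogonal to the common subspace" step is a plain orthogonal projection. A minimal numpy sketch (the helper name and the toy setup are assumptions, not the paper's code):

```python
import numpy as np

def orthogonal_to_common(task_delta, U_common):
    # Remove the component of a task matrix that lies in the common
    # subspace spanned by the columns of U_common (assumed orthonormal).
    return task_delta - U_common @ (U_common.T @ task_delta)

rng = np.random.default_rng(0)
task_delta = rng.normal(size=(6, 4))
# common subspace: top-2 left singular vectors of some merged matrix
U_common = np.linalg.svd(rng.normal(size=(6, 4)), full_matrices=False)[0][:, :2]

resid = orthogonal_to_common(task_delta, U_common)
# the residual has no component left in the common subspace
print(np.allclose(U_common.T @ resid, 0))  # True
```

Directions built from these residuals can then fill the slots vacated by the least important common components.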

10.02.2025 14:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

This simple modification boosts task arithmetic by πŸ“ˆ 10-15% across all model merging scenarios, achieving state-of-the-art results in most cases! πŸ”₯

However, we found that the bottom components contribute very little to the final performance… πŸ“‰βš οΈ

10.02.2025 14:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Based on this, we propose an isotropic merging framework that:
πŸ“Š Flattens the singular value spectrum of task matrices
🎯 Enhances alignment between tasks
βš–οΈ Reduces the performance gap
Surprisingly, the best performance is achieved when the singular value spectrum is uniform! πŸš€
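Flattening the spectrum amounts to an SVD of the summed task vector with the singular values replaced by their mean. A minimal numpy sketch, assuming per-layer weight deltas as inputs (function name is ours; see the paper's repo for the real implementation):

```python
import numpy as np

def isotropic_merge(task_deltas):
    # Sum the per-task weight deltas (plain task arithmetic) ...
    merged = np.sum(task_deltas, axis=0)
    # ... then make the singular value spectrum uniform:
    # U diag(mean(S)) V^T = mean(S) * U V^T
    U, S, Vt = np.linalg.svd(merged, full_matrices=False)
    return S.mean() * (U @ Vt)

rng = np.random.default_rng(0)
deltas = [rng.normal(size=(4, 4)) for _ in range(3)]
iso = isotropic_merge(deltas)
s = np.linalg.svd(iso, compute_uv=False)
print(np.allclose(s, s[0]))  # uniform spectrum -> True
```

Because only the spectrum changes, the singular directions of the merged task vector are preserved while no direction dominates.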

10.02.2025 14:47 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

We show that alignment between singular components of task-specific & merged matrices strongly correlates with performance gains over the pre-trained model! πŸ“ˆ

πŸ” Tasks that are well-aligned get amplified πŸ”Š, while less aligned ones become underrepresented and struggle. πŸ˜¬πŸ“‰

10.02.2025 14:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸš€ What happens when you modify the spectrum of singular values of the merged task vector? πŸ€”

Apparently, you achieve 🚨state-of-the-art🚨 model merging results! πŸ”₯

✨ Introducing β€œNo Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces”

10.02.2025 14:47 β€” πŸ‘ 6    πŸ” 4    πŸ’¬ 1    πŸ“Œ 0

Self-supervised learning with Masked Autoencoders (MAE) is known to produce worse image representations than joint-embedding approaches (e.g. DINO). In our new paper, we identify new reasons why that is and point towards solutions: arxiv.org/abs/2412.03215 🧡

05.12.2024 19:56 β€” πŸ‘ 14    πŸ” 4    πŸ’¬ 1    πŸ“Œ 0
