Joan Serrà

@serrjoa.bsky.social

Does research on machine learning at Sony AI, Barcelona. Works on audio analysis, synthesis, and retrieval. Likes tennis, music, and wine. https://serrjoa.github.io/

307 Followers  |  151 Following  |  23 Posts  |  Joined: 15.11.2024

Latest posts by serrjoa.bsky.social on Bluesky

I don't know. I could just now...

16.02.2025 09:41 — 👍 0    🔁 0    💬 0    📌 0

I think I may switch back to Twitter/X. Somehow I feel this site didn't take off and I really don't want to be looking at two feeds all the time...

08.01.2025 18:33 — 👍 3    🔁 0    💬 3    📌 0
Preview
ChatGPT and Image Matching – Wide baseline stereo meets deep learning. Are we done yet?

Image matching and ChatGPT - new post in the wide baseline stereo blog.

tl;dr: it is good, even feels human, but it's not perfect.
ducha-aiki.github.io/wide-baselin...

02.01.2025 21:01 — 👍 34    🔁 8    💬 2    📌 1

Many of the greatest papers, now canonical works, have a story of resistance, tension, and, finally, a crucial advocate. It's shockingly common. Why is there a bias against excellence? And what happens to those papers, those people, when no one has the courage to advocate?

28.12.2024 23:42 — 👍 12    🔁 2    💬 1    📌 0
Preview
Intern - Machine Learning for Audio The Music Technology team at Sony AI Barcelona is looking for Research Interns who are passionate about machine learning for audio signal processing. Our mission is to research and develop technologie...

Apply here: sonyglobal.wd1.myworkdayjobs.com/Sony_Europe_...

23.12.2024 08:13 — 👍 0    🔁 0    💬 0    📌 0

Preferred qualifications:
- PhD candidate or Postdoc.
- Experience with representation/contrastive learning or generative music models.
- Strong programming skills.
- Strong mathematical background.
- Python, GitHub, PyTorch, ...
- EU residence permit.
👇

23.12.2024 08:13 — 👍 1    🔁 0    💬 1    📌 0

Topics: representation learning for music matching or generative models for music copyright.
Location: Barcelona, on-site (two days a week at least).
Duration: 4-6 months.
Start date: April-November 2025.
Dedication: full-time (part-time also an option).
👇

23.12.2024 08:13 — 👍 1    🔁 0    💬 1    📌 0
Views from the office window. Photo taken just now.

Do you want to work with me for some months? Two internship positions available at the Music Team of Sony AI in Barcelona!
👇

23.12.2024 08:13 — 👍 11    🔁 4    💬 1    📌 0

Haha, me maybe not, but someone should go...

21.12.2024 13:36 — 👍 0    🔁 0    💬 0    📌 0

Thanks.

21.12.2024 13:36 — 👍 0    🔁 0    💬 0    📌 0

Congrats to my colleagues, many of whom are not on this website!

21.12.2024 08:29 — 👍 1    🔁 0    💬 0    📌 0

I'm happy to have two papers accepted at #ICASSP2025!

1) Contrastive learning for audio-video sequences, exploiting the fact that they are *sequences*: arxiv.org/abs/2407.05782

2) Knowledge distillation at *pre-training* time to help generative speech enhancement: arxiv.org/abs/2409.09357

21.12.2024 08:29 — 👍 15    🔁 0    💬 2    📌 1

Flow matching mapping text to image directly (instead of noise to image): cross-flow.github.io

20.12.2024 18:29 — 👍 4    🔁 0    💬 0    📌 0
Post image

With some delay, JetFormer's *prequel* paper is finally out on arXiv: a radically simple ViT-based normalizing flow (NF) model that achieves SOTA results in its class.

Jet is one of the key components of JetFormer, deserving a standalone report. Let's unpack: 🧵⬇️

20.12.2024 14:39 — 👍 42    🔁 7    💬 2    📌 1
YouTube

Did you miss any of the talks of the Deep Learning Barcelona Symposium 2024? Play them now from the recorded stream:

www.youtube.com/live/yPc-Un3...

19.12.2024 23:32 — 👍 4    🔁 1    💬 0    📌 1
Post image

I'll get straight to the point.

We trained 2 new models. Like BERT, but modern. ModernBERT.

Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.

It's much faster, more accurate, longer context, and more useful. 🧵

19.12.2024 16:45 — 👍 621    🔁 148    💬 19    📌 34

On pre-activation norm, learnable residuals, etc.

19.12.2024 06:34 — 👍 1    🔁 0    💬 0    📌 0

Two great tokenizer blog posts that helped me over the years: sjmielke.com/papers/token...

sjmielke.com/comparing-pe...

People have mostly standardized on certain tokenizations right now, but there are huge performance gaps between locales with high agglomeration (e.g. common en-us) and ...

18.12.2024 13:39 — 👍 10    🔁 2    💬 1    📌 0

No.

16.12.2024 12:42 — 👍 4    🔁 0    💬 0    📌 0

Don't be like Reviewer 2.

15.12.2024 08:17 — 👍 3    🔁 0    💬 0    📌 0
Post image

Did Gauss invent the Gaussian?

- Laplace wrote down the integral first in 1783
- Gauss then described it in 1809 in the context of least squares for astronomical measurements
- Pearson & Fisher framed it as ‘normal’ density only in 1910

* Best part is: Gauss gave Laplace credit!

14.12.2024 06:22 — 👍 38    🔁 5    💬 0    📌 0

I already signed up (as a mentor) for this year!

13.12.2024 14:01 — 👍 1    🔁 0    💬 0    📌 0
NeurIPS Poster: Improving Deep Learning Optimization through Constrained Parameter Regularization (NeurIPS 2024)

Thrilled to present our work on Constrained Parameter Regularization (CPR) at #NeurIPS2024!
Our novel deep learning regularization outperforms weight decay across various tasks. neurips.cc/virtual/2024...
This is joint work with Michael Hefenbrock, Gregor Köhler, and Frank Hutter
🧵👇

09.12.2024 15:28 — 👍 2    🔁 1    💬 1    📌 0
Video thumbnail

Entropy is one of those formulas that many of us learn, swallow whole, and even use regularly without really understanding.

(E.g., where does that “log” come from? Are there other possible formulas?)

Yet there's an intuitive & almost inevitable way to arrive at this expression.
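(The post above alludes to an intuitive route to the entropy formula without spelling it out. A minimal sketch of one standard argument, not taken from the post itself: the "log" is forced by requiring that the information of independent events adds up, and entropy is then the expected value of that logarithmic surprisal.)

```python
import math

def entropy(p):
    """Shannon entropy in bits: the expected surprisal log2(1/p_i)
    of a discrete distribution p (zero-probability terms contribute 0)."""
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

# Why a log? It is the continuous choice that makes information additive
# over independent events: surprisal(p*q) == surprisal(p) + surprisal(q).
def surprisal(p):
    return math.log2(1 / p)

p, q = 0.5, 0.25
assert abs(surprisal(p * q) - (surprisal(p) + surprisal(q))) < 1e-12

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([1.0]))        # certain outcome: 0.0 bits
print(entropy([0.25] * 4))   # uniform over 4 outcomes: 2.0 bits
```

Changing the log base only rescales the unit (bits vs. nats), which is why any base works and base 2 is merely convention.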

09.12.2024 22:44 — 👍 546    🔁 129    💬 22    📌 12
Post image

Inventors of flow matching have released a comprehensive guide going over the math & code of flow matching!

Also covers variants like non-Euclidean & discrete flow matching.

A PyTorch library is also released with this guide!

This looks like a very good read! 🔥

arxiv: arxiv.org/abs/2412.06264

10.12.2024 08:35 — 👍 109    🔁 26    💬 1    📌 1
Preview
Normalizing Flows are Capable Generative Models Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relati...

Normalizing Flows are Capable Generative Models

Apple introduces TarFlow, a new Transformer-based variant of Masked Autoregressive Flows.

SOTA on likelihood estimation for images, quality and diversity comparable to diffusion models.

arxiv.org/abs/2412.06329

10.12.2024 08:06 — 👍 54    🔁 9    💬 1    📌 1
Post image

That was fast: #DLBCN 2024 was sold out in less than two hours!

New requests will be added to a waiting list. Read the instructions for same day event registration:

sites.google.com/view/dlbcn20...

09.12.2024 09:08 — 👍 3    🔁 1    💬 0    📌 0
Post image

Past work has characterized the functions learned by neural networks: arxiv.org/pdf/1910.01635, arxiv.org/abs/1902.05040, arxiv.org/abs/2109.12960, arxiv.org/abs/2105.03361. But it turns out multi-task training produces strikingly different solutions! Adding tasks produces “kernel-like” solutions.

07.12.2024 21:49 — 👍 75    🔁 11    💬 1    📌 0
NeurIPS Poster: Rule Extrapolation in Language Modeling: A Study of Compositional Generalization on OOD Prompts (NeurIPS 2024)

Can language models transcend the limitations of training data?

We train LMs on a formal grammar, then prompt them OUTSIDE of this grammar. We find that LMs often extrapolate logical rules and apply them OOD, too. Proof of a useful inductive bias.

Check it out at NeurIPS:

nips.cc/virtual/2024...

06.12.2024 13:31 — 👍 114    🔁 8    💬 7    📌 1

Of course, Jürgen invented it before.

06.12.2024 07:09 — 👍 0    🔁 0    💬 0    📌 0

@serrjoa is following 19 prominent accounts