Ben Hayes

@ben-hayes.bsky.social

Machine learning for audio synthesis @ Sony CSL Paris. PhD @ C4DM, QMUL. Former intern at Spotify, Sony CSL, ByteDance.

225 Followers  |  145 Following  |  25 Posts  |  Joined: 12.11.2024  |  2.0778

Latest posts by ben-hayes.bsky.social on Bluesky

🔊 Follow the links above for audio examples, full training code, and the arXiv pre-print.

10.06.2025 10:12 | 👍 0    🔁 0    💬 0    📌 0

πŸ† We then apply this method to a dataset of sounds sampled from Surge XT β€” a feature rich software synthesizer β€” and find that it dramatically outperforms state-of-the-art baselines on audio reconstruction.

10.06.2025 10:12 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

🤔 However, in the case of real synthesizers, we may not know the appropriate symmetries a priori. To allow them to be discovered adaptively, we introduce a technique called Param2Tok, which learns a mapping from synthesizer parameters to model tokens.

10.06.2025 10:12 | 👍 0    🔁 0    💬 1    📌 0

🗺️ We can further improve performance by designing a model with equivariance to the appropriate symmetry.

10.06.2025 10:12 | 👍 0    🔁 0    💬 1    📌 0

📈 We design a toy task that isolates this phenomenon and find that the presence of permutation symmetry degrades the performance of conventional methods. We then show that a generative approach, which can assign predictive weight to multiple possible solutions, performs considerably better.

10.06.2025 10:12 | 👍 0    🔁 0    💬 1    📌 0

‼️ In this work, we argue that the problem is ill-posed: there are multiple sets of parameters that produce any given sound. Further, we show that many of these equivalent solutions are due to intrinsic symmetries of the synthesizer!

10.06.2025 10:12 | 👍 0    🔁 0    💬 1    📌 0
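To make the permutation symmetry in these posts concrete, here is a minimal sketch. It assumes a hypothetical two-oscillator additive synth (`two_osc_synth`, invented for illustration; it is not the toy task or synthesizer from the paper). Swapping the two oscillators' parameters leaves the audio unchanged, and averaging the two equally valid parameter labelings, which is roughly what a regression model trained on both would be pulled towards, gives a different sound.

```python
import math


def two_osc_synth(params, n_samples=256, sample_rate=8000):
    """Hypothetical toy synth: a sum of sinusoids, one (frequency_hz, amplitude) pair per oscillator."""
    return [
        sum(amp * math.sin(2 * math.pi * freq * n / sample_rate) for freq, amp in params)
        for n in range(n_samples)
    ]


# Two parameter settings that differ only in the order of the oscillators.
params_a = [(220.0, 0.8), (330.0, 0.3)]
params_b = [(330.0, 0.3), (220.0, 0.8)]

audio_a = two_osc_synth(params_a)
audio_b = two_osc_synth(params_b)

# Largest sample-wise difference between the two renderings (the sounds match).
max_diff = max(abs(a - b) for a, b in zip(audio_a, audio_b))

# Element-wise mean of the two labelings: both oscillators collapse to (275.0, 0.55).
mean_params = [
    tuple((pa + pb) / 2 for pa, pb in zip(osc_a, osc_b))
    for osc_a, osc_b in zip(params_a, params_b)
]
audio_mean = two_osc_synth(mean_params)

# Largest sample-wise difference between the target sound and the "averaged" solution.
mean_error = max(abs(a - m) for a, m in zip(audio_a, audio_mean))

print(f"swap difference: {max_diff:.3g}, averaged-parameter error: {mean_error:.3g}")
```

The averaged parameters are not a valid solution even though both labelings are, which is one way to read the claim that the problem is ill-posed for point-estimate methods.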

🧑‍🔬 Previous approaches have struggled to scale to the full complexity of synthesizers used in modern audio production. Why?

10.06.2025 10:12 | 👍 0    🔁 0    💬 1    📌 0

🎛️ Programming synthesizers is a fiddly business, and so a line of work known as "sound matching" has, over the last few decades, sought to answer the question: given an audio signal and a synthesizer, which configuration of parameters best approximates the signal?

10.06.2025 10:12 | 👍 0    🔁 0    💬 1    📌 0

🎹 Audio synthesizers are diverse and complex beasts, combining a variety of techniques to produce sounds ranging from familiar to entirely alien.

10.06.2025 10:12 | 👍 0    🔁 0    💬 1    📌 0

TL;DR: Predicting synthesizer parameters from audio is hard because multiple parameter configurations can produce the same sound. We design a model that accounts for this and find that it dramatically outperforms previous approaches, and works on production grade, feature rich VST synthesizers.

10.06.2025 10:12 | 👍 0    🔁 0    💬 1    📌 0

Very excited to share that our latest work, "Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching", has been accepted to ISMIR 2025 in Daejeon, Korea!

Paper: arxiv.org/abs/2506.07199
Audio: benhayes.net/synth-perm/
Code: github.com/ben-hayes/sy...

🧵

10.06.2025 10:12 | 👍 12    🔁 0    💬 1    📌 0

going to Korea, baby! 🇰🇷 #ISMIR2025

07.06.2025 08:53 | 👍 4    🔁 0    💬 0    📌 1
DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions
Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji

DiffVox integrates differentiable vocal effects; analysis reveals parameter correlations and connections to McAdams' timbre dimensions; parameter distributions non-Gaussian; code and datasets available.

22.04.2025 08:48 | 👍 3    🔁 1    💬 0    📌 0
Generative modelling in latent space Latent representations for generative models.

wake up, babe. new @sedielem.bsky.social just dropped

sander.ai/2025/04/15/l...

15.04.2025 11:23 | 👍 3    🔁 0    💬 0    📌 0

amazing how the soothing beep of stolen Lime bikes has so naturally woven itself into the London soundscape

12.04.2025 18:14 | 👍 2    🔁 0    💬 0    📌 0
hard drive clear out 2016-2020, by Ben Hayes 21 track album

turned on an old computer and found some old unfinished music gathering dust. uploading it so it at least lives somewhere.

06.04.2025 16:11 | 👍 7    🔁 1    💬 0    📌 0

the best ones combine two or more

29.03.2025 00:23 | 👍 0    🔁 0    💬 0    📌 0

realised tonight there are only 3 red hot chili peppers songs:

1. california
2. zoop di blamp
3. heroin, but it's a woman

29.03.2025 00:23 | 👍 1    🔁 0    💬 1    📌 0
Designing Neural Synthesizers for Low Latency Interaction
Franco Caspe, Jordie Shier, Mark Sandler, Charalampos Saitis, Andrew McPherson

A low-latency neural audio synthesizer (BRAVE) was designed by analyzing latency sources in existing models (RAVE); BRAVE improved pitch and loudness replication while maintaining timbre modification capabilities, implemented in a specialized inference framework.

17.03.2025 11:08 | 👍 8    🔁 1    💬 0    📌 0

negative \vspace season approaches 😈

05.03.2025 15:45 | 👍 6    🔁 0    💬 0    📌 0
NablAFx: A Framework for Differentiable Black-box and Gray-box Modeling of Audio Effects
Marco Comunità, Christian J. Steinmetz, Joshua D. Reiss

NablAFx, an open-source PyTorch framework, supports differentiable black-box and gray-box modeling of audio effects; it includes model architectures, datasets, training features, and plotting functions.

18.02.2025 10:48 | 👍 1    🔁 1    💬 0    📌 0
Deep Learning 101 for Audio-based MIR

Two excellent recent resources:

1. (not strictly a paper) This tutorial from the last ISMIR, courtesy of: geoffroypeeters.github.io/deeplearning...
2. This overview of model-based deep learning for MIR: arxiv.org/abs/2406.11540

13.02.2025 10:15 | 👍 4    🔁 0    💬 0    📌 0
Equivariant flow matching Normalizing flows are a class of deep generative models that are especially interesting for modeling probability distributions in physics, where the exact likelihood of flows allows reweighting to kno...

I look at it as squeezing a *slightly* better coupling out of the batch.

they do something related here (arxiv.org/abs/2306.15030) with the Kabsch algorithm, but they transform the target samples as they're specifically trying to learn a rotation invariant distribution with an equivariant flow.

29.01.2025 11:02 | 👍 1    🔁 0    💬 1    📌 0

haven't crunched through it on paper but my hunch is this works because of the spherical symmetry of the Gaussian dist, so any orthogonal transformation of the batch is exactly as probable (should work for any O(d) invariant distribution if true)

29.01.2025 11:02 | 👍 1    🔁 0    💬 1    📌 0
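The density half of this hunch can be written out in one line. The sketch below assumes a standard normal source in d dimensions and a fixed orthogonal matrix Q (both assumptions for illustration). It shows only that a batch and its orthogonally transformed copy have the same density. It does not settle whether choosing Q from the data, as Procrustes does, preserves the properties flow matching needs, which is the part the post leaves open.

```latex
p(x) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\lVert x \rVert^{2}\right),
\qquad
\lVert Qx \rVert^{2} = x^{\top} Q^{\top} Q\, x = \lVert x \rVert^{2}
\quad \text{for } Q \in O(d)

\Longrightarrow\quad
\prod_{i=1}^{B} p(Q x_i) = \prod_{i=1}^{B} p(x_i)
```

The same argument goes through for any density that depends on x only through its norm, which matches the post's remark about O(d)-invariant distributions.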

very anecdotally, I've found that when using a normal source distribution, performing orthogonal Procrustes on the source samples (to match the target samples) after minibatch coupling by exact linear assignment (Hungarian algo), seems to speed up convergence by a noticeable amount.

29.01.2025 11:02 | 👍 1    🔁 0    💬 1    📌 0
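A minimal sketch of the two steps described in this post, assuming NumPy and SciPy. The function name `couple_and_align`, the batch shapes, and the shifted-Gaussian stand-in for the data batch are hypothetical, and this is not taken from the author's training code. It pairs source and target samples by exact linear assignment, then applies orthogonal Procrustes to the paired source samples.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from scipy.optimize import linear_sum_assignment


def couple_and_align(source, target):
    """Pair samples by exact linear assignment, then rotate the source batch onto the target batch."""
    # Squared Euclidean cost between every source sample and every target sample.
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
    # Exact linear assignment (the "Hungarian algo" step) gives the minibatch coupling.
    row, col = linear_sum_assignment(cost)
    source, target = source[row], target[col]
    # Orthogonal Procrustes: the orthogonal R minimising ||source @ R - target||_F.
    rotation, _ = orthogonal_procrustes(source, target)
    return source @ rotation, target


rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 8))        # batch from the normal source distribution
x1 = rng.standard_normal((64, 8)) + 2.0  # hypothetical stand-in for a data batch

x0_aligned, x1_paired = couple_and_align(x0, x1)

# Transport cost of the coupling with and without the Procrustes step.
cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)
row, col = linear_sum_assignment(cost)
cost_before = ((x0[row] - x1[col]) ** 2).sum()
cost_after = ((x0_aligned - x1_paired) ** 2).sum()
print(f"coupling cost before: {cost_before:.2f}, after Procrustes: {cost_after:.2f}")
```

Because the identity matrix is itself orthogonal, the Procrustes step can never increase the squared coupling cost, and it leaves each source sample's norm unchanged. Whether that translates into faster convergence is the anecdotal observation in the post, not something this sketch demonstrates.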

amazing, @drscotthawley.bsky.social ! I've been recommending this post to everyone recently.

24.01.2025 11:11 | 👍 2    🔁 0    💬 1    📌 0

🎶✨ New Paper Announcement! ✨🎶
We present "Improving Musical Accompaniment Co-creation via Diffusion Transformers" 🎹🎸, a study advancing our Diff-A-Riff stem generator through improved quality, efficiency, and control.

📜 Read the full paper here: arxiv.org/pdf/2410.23005 🧵👇

20.01.2025 13:42 | 👍 7    🔁 2    💬 3    📌 0

This seems to be where ML-facing config libraries (hydra, gin, jsonargparse, etc) converge, and is what I grudgingly end up doing. It makes me wince, though, because it seems to lead invariably to non-trivial and untested instantiation logic being encoded in the relationships between config files.

16.12.2024 13:03 | 👍 1    🔁 0    💬 1    📌 0

1. this is excellent work
2. your vocal imitations are everything ❤️

12.12.2024 18:20 | 👍 2    🔁 0    💬 1    📌 0

speaking at Akademie der Bildenden Künste in Munich on Dec 16th

"Phantasmagoria: Sound Synthesis after the Turing Test"

about the methodological, ethical, and environmental implications of Generative AI for audio

by invitation from Florian Hecker

hal.science/hal-04650754

06.12.2024 11:30 | 👍 13    🔁 3    💬 0    📌 0