Follow the links above for audio examples, full training code, and the arXiv pre-print.
10.06.2025 10:12
@ben-hayes.bsky.social
Machine learning for audio synthesis @ Sony CSL Paris. PhD @ C4DM, QMUL. Former intern at Spotify, Sony CSL, ByteDance.
We then apply this method to a dataset of sounds sampled from Surge XT, a feature-rich software synthesizer, and find that it dramatically outperforms state-of-the-art baselines on audio reconstruction.
10.06.2025 10:12
However, in the case of real synthesizers, we may not know the appropriate symmetries a priori. To allow them to be discovered adaptively, we introduce a technique called Param2Tok, which learns a mapping from synthesizer parameters to model tokens.
10.06.2025 10:12
We can further improve performance by designing a model with equivariance to the appropriate symmetry.
10.06.2025 10:12
We design a toy task that isolates this phenomenon and find that the presence of permutation symmetry degrades the performance of conventional methods. We then show that a generative approach, which can assign predictive weight to multiple possible solutions, performs considerably better.
10.06.2025 10:12
‼️ In this work, we argue that the problem is ill-posed: there are multiple sets of parameters that produce any given sound. Further, we show that many of these equivalent solutions are due to intrinsic symmetries of the synthesizer!
10.06.2025 10:12
🧑‍🔬 Previous approaches have struggled to scale to the full complexity of synthesizers used in modern audio production. Why?
10.06.2025 10:12
Programming synthesizers is a fiddly business, so a line of work known as "sound matching" has, over the last few decades, sought to answer the question: given an audio signal and a synthesizer, which configuration of parameters best approximates the signal?
10.06.2025 10:12
🎹 Audio synthesizers are diverse and complex beasts, combining a variety of techniques to produce sounds ranging from familiar to entirely alien.
10.06.2025 10:12
TL;DR: Predicting synthesizer parameters from audio is hard because multiple parameter configurations can produce the same sound. We design a model that accounts for this and find that it dramatically outperforms previous approaches and works on production-grade, feature-rich VST synthesizers.
10.06.2025 10:12
Very excited to share that our latest work, "Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching", has been accepted to ISMIR 2025 in Daejeon, Korea!
Paper: arxiv.org/abs/2506.07199
Audio: benhayes.net/synth-perm/
Code: github.com/ben-hayes/sy...
🧵
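A minimal sketch of the ill-posedness argument in the thread above, using a hypothetical two-oscillator toy synth (not the paper's toy task, data, or model). Because the synth simply sums its oscillators, swapping the two frequencies gives an identical signal. If a regressor trained with mean squared error sees both orderings equally often, its optimal point estimate is their average, which is itself a different sound. That averaging is one plausible mechanism for why permutation symmetry hurts conventional methods; the sample rate, frequencies, and names below are all assumptions.

```python
import numpy as np

SR = 16_000                  # assumed sample rate in Hz
T = np.arange(SR // 4) / SR  # 250 ms time axis


def toy_synth(freqs):
    """Hypothetical toy synth: a sum of equal-amplitude sinusoids.

    The output ignores the order of `freqs`, so every permutation of a
    parameter vector is an equally valid answer for the same sound.
    """
    return sum(np.sin(2 * np.pi * f * T) for f in freqs)


target = np.array([220.0, 660.0])
swapped = target[::-1]

# Two distinct parameter vectors, one sound.
same_sound = np.allclose(toy_synth(target), toy_synth(swapped))

# If both orderings appear equally often as training labels, the MSE-optimal
# point estimate is their mean, which lands between the valid solutions.
mse_estimate = (target + swapped) / 2  # [440., 440.]

# The averaged parameters produce a very different sound: err is about 3.0,
# three times the target's mean power of about 1.0.
err = np.mean((toy_synth(mse_estimate) - toy_synth(target)) ** 2)
```

Averaging 220 Hz and 660 Hz yields two oscillators at 440 Hz, a configuration neither label describes. A generative model that can put weight on both orderings can, in principle, avoid collapsing to that midpoint.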
going to Korea, baby! 🇰🇷 #ISMIR2025
07.06.2025 08:53
DiffVox integrates differentiable vocal effects; analysis reveals parameter correlations and connections to McAdams' timbre dimensions; parameter distributions are non-Gaussian; code and datasets are available.
22.04.2025 08:48
wake up, babe. new @sedielem.bsky.social just dropped
sander.ai/2025/04/15/l...
amazing how the soothing beep of stolen Lime bikes has so naturally woven itself into the London soundscape
12.04.2025 18:14
turned on an old computer and found some old unfinished music gathering dust. uploading it so it at least lives somewhere.
06.04.2025 16:11
the best ones combine two or more
29.03.2025 00:23
realised tonight there are only 3 red hot chili peppers songs:
1. california
2. zoop di blamp
3. heroin, but it's a woman
A low-latency neural audio synthesizer (BRAVE) was designed by analyzing latency sources in existing models (RAVE); BRAVE improved pitch and loudness replication while maintaining timbre modification capabilities and was implemented in a specialized inference framework.
17.03.2025 11:08
negative \vspace season approaches
05.03.2025 15:45
NablAFx, an open-source PyTorch framework, supports differentiable black-box and gray-box modeling of audio effects; it includes model architectures, datasets, training features, and plotting functions.
18.02.2025 10:48
Two excellent recent resources:
1. (not strictly a paper) This tutorial from the last ISMIR, courtesy of: geoffroypeeters.github.io/deeplearning...
2. This overview of model-based deep learning for MIR: arxiv.org/abs/2406.11540
I look at it as squeezing a *slightly* better coupling out of the batch.
they do something related here (arxiv.org/abs/2306.15030) with the Kabsch algorithm, but they transform the target samples as they're specifically trying to learn a rotation-invariant distribution with an equivariant flow.
haven't crunched through it on paper but my hunch is this works because of the spherical symmetry of the Gaussian dist, so any orthogonal transformation of the batch is exactly as probable (should work for any O(d)-invariant distribution if true)
29.01.2025 11:02
very anecdotally, I've found that when using a normal source distribution, performing orthogonal Procrustes on the source samples (to match the target samples) after minibatch coupling by exact linear assignment (Hungarian algo) seems to speed up convergence by a noticeable amount.
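A minimal sketch of the Hungarian-then-Procrustes recipe described above, assuming NumPy and SciPy (`scipy.optimize.linear_sum_assignment` for the exact assignment, `scipy.linalg.orthogonal_procrustes` for the fit), a squared-Euclidean cost, and a made-up target batch; the helper names are hypothetical and this is not the author's training code. The last lines check only the easy half of the symmetry hunch: an orthogonal map preserves every sample's norm, so the aligned batch has exactly the same density under N(0, I). That check alone does not show the rotated samples stay marginally Gaussian once the rotation depends on the targets.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from scipy.optimize import linear_sum_assignment


def hungarian_then_procrustes(x0, x1):
    """Pair source and target samples, then rotate/reflect the sources.

    x0: (n, d) source batch drawn from N(0, I).
    x1: (n, d) target batch.
    Returns (x0_matched, x0_aligned), both row-aligned with x1.
    """
    # Squared Euclidean cost between every source/target pair.
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)
    # Exact linear assignment: source rows[i] is paired with target cols[i].
    rows, cols = linear_sum_assignment(cost)
    x0_matched = np.empty_like(x0)
    x0_matched[cols] = x0[rows]  # row j of x0_matched now pairs with x1[j]
    # Orthogonal Procrustes: R in O(d) minimising ||x0_matched @ R - x1||_F.
    R, _ = orthogonal_procrustes(x0_matched, x1)
    return x0_matched, x0_matched @ R


def gaussian_log_density(x):
    """Log-density of a whole batch under an isotropic standard normal."""
    n, d = x.shape
    return -0.5 * np.sum(x**2) - 0.5 * n * d * np.log(2 * np.pi)


rng = np.random.default_rng(0)
n, d = 64, 8
x0 = rng.standard_normal((n, d))
x1 = rng.standard_normal((n, d)) * 0.5 + 2.0  # hypothetical stand-in targets

x0_matched, x0_aligned = hungarian_then_procrustes(x0, x1)

# Row norms are preserved, so the batch is exactly as probable under N(0, I).
same_density = np.isclose(gaussian_log_density(x0), gaussian_log_density(x0_aligned))

# Procrustes can only lower (or keep) the paired squared distance.
cost_matched = np.mean(np.sum((x0_matched - x1) ** 2, axis=1))
cost_aligned = np.mean(np.sum((x0_aligned - x1) ** 2, axis=1))
```

`orthogonal_procrustes` searches all of O(d), reflections included, which matches the O(d) framing in the posts; the Kabsch algorithm mentioned above adds a sign correction so that the result stays a proper rotation in SO(d).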
29.01.2025 11:02
amazing, @drscotthawley.bsky.social! I've been recommending this post to everyone recently.
24.01.2025 11:11
🎶✨ New Paper Announcement! ✨🎶
We present "Improving Musical Accompaniment Co-creation via Diffusion Transformers" 🎹🎸, a study advancing our Diff-A-Riff stem generator through improved quality, efficiency, and control.
Read the full paper here: arxiv.org/pdf/2410.23005 🧵
This seems to be where ML-facing config libraries (hydra, gin, jsonargparse, etc.) converge, and is what I grudgingly end up doing. It makes me wince, though, because it seems to lead invariably to non-trivial and untested instantiation logic being encoded in the relationships between config files.
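To make the complaint above concrete, here is a hypothetical mini version of the `_target_` key used by Hydra's `hydra.utils.instantiate`, written with only the standard library. It is an illustrative sketch, not how Hydra, gin, or jsonargparse actually implement instantiation; real libraries handle many more cases. The point it shows: which objects get built, and what they receive, is decided by how the config nests, not by code that can be unit-tested on its own.

```python
import importlib


def instantiate(cfg):
    """Recursively build objects from plain dicts and lists.

    A dict with a "_target_" key (a dotted "module.attr" path) is replaced by
    a call to that callable, with its remaining keys, themselves instantiated
    first, passed as keyword arguments. Dicts without "_target_" and lists are
    rebuilt element by element; other values pass through unchanged.
    """
    if isinstance(cfg, dict):
        kwargs = {k: instantiate(v) for k, v in cfg.items() if k != "_target_"}
        if "_target_" not in cfg:
            return kwargs
        module_name, _, attr_name = cfg["_target_"].rpartition(".")
        target = getattr(importlib.import_module(module_name), attr_name)
        return target(**kwargs)
    if isinstance(cfg, list):
        return [instantiate(item) for item in cfg]
    return cfg


# Hypothetical experiment config: the wiring lives entirely in the nesting.
config = {
    "_target_": "types.SimpleNamespace",
    "epochs": 10,
    "optimizer": {"_target_": "types.SimpleNamespace", "name": "adam", "lr": 3e-4},
    "eval_every": {"_target_": "datetime.timedelta", "minutes": 30},
}

experiment = instantiate(config)
```

Nothing in `instantiate` knows that `optimizer` belongs inside the experiment; that relationship exists only in the config, which is exactly the part that tends to go untested.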
16.12.2024 13:03
1. this is excellent work
2. your vocal imitations are everything ❤️
speaking at Akademie der Bildenden Künste in Munich on Dec 16th
"Phantasmagoria: Sound Synthesis after the Turing Test"
about the methodological, ethical, and environmental implications of Generative AI for audio
by invitation from Florian Hecker
hal.science/hal-04650754