11/11: Future work includes scaling to more variant concepts, and moving from obvious attributes like key and tempo to higher-level notions that could be interesting for retrieval. More to come!
30.12.2024 17:30
10/11: Not only that, LOEV++ lets users control which attribute matters most for retrieval! By searching in the time-variant space, we get better results for tempo-based retrieval. Users can search for similar songs by specifying what kind of similarity they want.
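A small sketch of what attribute-controlled retrieval could look like on top of such disentangled embeddings; the slice boundaries and the helper itself are hypothetical, not an API from the paper.

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb, catalog_embs, space: slice, k: int = 5):
    """Rank catalog items by cosine similarity inside one chosen sub-space."""
    q = F.normalize(query_emb[space], dim=0)          # query restricted to the sub-space
    c = F.normalize(catalog_embs[:, space], dim=1)    # catalog restricted likewise
    return torch.topk(c @ q, k).indices               # indices of the k closest songs

# e.g., with a 384-dim embedding split into three 128-dim parts (hypothetical layout):
# TIME_VARIANT = slice(256, 384)   # search here for tempo-aware similarity
# top_songs = retrieve(query, catalog, TIME_VARIANT)
```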
30.12.2024 17:29
9/11: To take it one step further, we propose LOEV++. Instead of splitting the network at the projection heads, we split it earlier, creating individual latent spaces that contain the augmentation information in a disentangled way. We show that this further improves performance.
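One way to picture the LOEV++ split, assuming the embedding is simply sliced into equal dedicated sub-spaces before the heads (the split strategy and sizes are assumptions for illustration, not the paper's configuration):

```python
import torch
import torch.nn as nn

class LOEVPlusPlus(nn.Module):
    """Encoder output split into general / pitch / time sub-spaces before projection."""

    def __init__(self, encoder: nn.Module, emb_dim: int = 384, proj_dim: int = 128):
        super().__init__()
        assert emb_dim % 3 == 0, "illustrative equal three-way split"
        self.encoder, self.part = encoder, emb_dim // 3
        self.heads = nn.ModuleList([nn.Linear(self.part, proj_dim) for _ in range(3)])

    def forward(self, x):
        h = self.encoder(x)                          # (batch, emb_dim)
        parts = torch.split(h, self.part, dim=1)     # invariant / pitch / time slices
        return [head(p) for head, p in zip(self.heads, parts)]
```

Because each attribute lives in its own slice of the embedding, it can be queried or ignored directly at retrieval time.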
30.12.2024 17:28
8/11: We do this by simply tracking which augmentations are applied and modifying the contrastive targets accordingly. We show through downstream probing that this forces the encoder *not* to discard key and tempo information, while keeping potent representations for general tasks.
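A hedged sketch of what this bookkeeping could look like for one variant head, assuming a pair simply stops being a positive whenever the tracked augmentation was applied (so the shifted view only shows up as a negative); the exact target scheme in the paper may differ.

```python
import torch
import torch.nn.functional as F

def variant_head_loss(z1, z2, augmented, temperature=0.1):
    """One variant head's loss with augmentation-aware targets.

    z1, z2    : (N, d) projections of the two views of each sample.
    augmented : (N,) bool, True if the tracked augmentation (e.g. pitch shift)
                was applied to this pair, so the views must not be positives
                in this head.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                        # (N, N) similarities
    targets = torch.arange(z1.shape[0], device=z1.device)   # default positive: own pair
    keep = ~augmented                                       # rows whose positive survives
    return F.cross_entropy(logits[keep], targets[keep])
```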
30.12.2024 17:28
7/11: Our approach is simple: we keep an all-invariant projection head, but add two more heads, a pitch-variant and a time-variant one. Each head has its own contrastive objective: in the pitch-variant head, views that have been augmented with pitch shifting are treated as *negatives*.
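A rough sketch of what such a three-head setup could look like; the MLP heads, sizes, and names below are placeholder assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class LOEVModel(nn.Module):
    """Shared encoder feeding one all-invariant head plus two variant heads."""

    def __init__(self, encoder: nn.Module, emb_dim: int = 512, proj_dim: int = 128):
        super().__init__()
        self.encoder = encoder

        def make_head():
            return nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, proj_dim))

        self.invariant_head = make_head()      # standard contrastive objective
        self.pitch_variant_head = make_head()  # pitch-shifted views become negatives
        self.time_variant_head = make_head()   # time-stretched views become negatives

    def forward(self, x):
        h = self.encoder(x)
        return (self.invariant_head(h),
                self.pitch_variant_head(h),
                self.time_variant_head(h))
```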
30.12.2024 17:27
6/11: So there is a tradeoff, coming from the applied augmentations, between general and task-specific performance. LOEV aims to fix this. We focus on two augmentations, Time Stretching (TS) and Pitch Shifting (PS), which are explicitly related to the musical notions of Tempo and Key.
30.12.2024 17:26
5/11: In music, this can be catastrophic. Take a song in the key of A major: apply a pitch shifting augmentation and you end up with two versions in two different keys! A contrastive model will still map them to the same spot in latent space, which can cause the key space to collapse completely.
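For concreteness, the key arithmetic behind this example (a toy helper, not code from the paper):

```python
# pitch classes in semitone order, starting from A
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def shifted_key(key: str, semitones: int) -> str:
    """Key a track lands in after pitch shifting by the given number of semitones."""
    return PITCH_CLASSES[(PITCH_CLASSES.index(key) + semitones) % 12]

print(shifted_key("A", 3))  # -> "C": A major shifted up 3 semitones becomes C major
```

An invariance-trained model would still embed the A-major and C-major versions at the same point, so a key probe on those embeddings has nothing left to work with.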
30.12.2024 17:25
4/11: It has been shown that stronger augmentations generally lead to better performance on downstream tasks. But what happens when a downstream task needs representations that are *variant* to a certain transformation?
30.12.2024 17:23
3/11: In doing so, contrastive models effectively learn invariances: by learning to map augmented data points to the same spot in the latent space, they learn to be *invariant* to the augmentations.
30.12.2024 17:23
2/11: Unimodal contrastive learning uses augmentations to produce different views of samples. The model then learns to pull views of the same sample together in the latent space and push views of different samples apart. This lets models internalize semantic information without supervision.
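For readers who want to see what such an objective looks like, here is a minimal sketch of a SimCLR-style NT-Xent loss; the function name, batch layout, and temperature are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def ntxent_loss(z1, z2, temperature=0.1):
    """SimCLR-style contrastive loss over two augmented views per sample.

    z1, z2: (N, dim) projections of the two views of each of N samples.
    The two views of a sample are pulled together; every other view in
    the batch acts as a negative and is pushed away.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)             # (2N, dim) all views
    sim = z @ z.T / temperature                # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))          # a view is not its own positive
    n = z1.shape[0]
    # the positive for view i is the other view of the same sample
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)
```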
30.12.2024 17:22
1/11: In this work, we propose a simple way to mitigate the loss of information caused by learned invariances in contrastive learning for music. This information loss can be catastrophic for downstream tasks, and LOEV is a very cheap fix for it!
30.12.2024 17:22
Happy to announce that our new work "Leave One EquiVariant: Alleviating invariance-related information loss in contrastive music representations" (arxiv.org/pdf/2412.18955) was accepted for #ICASSP2025! Very excited to go to India in April.
🧵 below:
30.12.2024 17:21