Stable-V2A: Synchronized Sound Effects Synthesis
Stable-V2A is a two-stage model for synthesizing synchronized sound effects with support for temporal and semantic controls.
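To make the two-stage idea concrete, below is a minimal toy sketch in Python (NumPy only). It is not the Stable-V2A implementation or API: it assumes the first stage derives a temporal activity envelope from the video and the second stage turns that envelope, together with a semantic prompt, into audio. The function names (`stage1_envelope`, `stage2_audio`) and the frame-difference and noise placeholders are hypothetical; see the paper and project page linked below for the actual architecture.

```python
# Toy illustration of a two-stage video-to-audio pipeline.
# NOT the Stable-V2A code: names, signatures, and the naive frame-difference /
# noise placeholders are hypothetical stand-ins for the real model stages.
import numpy as np

def stage1_envelope(frames: np.ndarray) -> np.ndarray:
    """Stage 1 (temporal control): estimate a per-frame activity envelope.

    frames: grayscale video of shape [T, H, W]. The mean absolute frame
    difference is used as a crude proxy for how much is happening on screen;
    a real model would predict an audio-informed envelope from the video.
    """
    diff = np.abs(np.diff(frames.astype(np.float32), axis=0))
    env = np.concatenate([[0.0], diff.mean(axis=(1, 2))])
    return env / (env.max() + 1e-8)

def stage2_audio(envelope: np.ndarray, fps: float, sr: int = 44100) -> np.ndarray:
    """Stage 2 (generation): synthesize audio that follows the envelope.

    A real model would condition a generative audio network on the envelope
    (temporal control) and on a text/audio prompt (semantic control); here we
    simply amplitude-modulate white noise so onsets track the video activity.
    """
    n_samples = int(len(envelope) / fps * sr)
    t = np.linspace(0.0, len(envelope) - 1, n_samples)
    env_audio = np.interp(t, np.arange(len(envelope)), envelope)
    return 0.1 * env_audio * np.random.randn(n_samples)

if __name__ == "__main__":
    video = np.random.rand(120, 64, 64)               # 4 s of fake 30 fps video
    waveform = stage2_audio(stage1_envelope(video), fps=30.0)
    print(waveform.shape)                             # ~4 s of audio at 44.1 kHz
```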
Excited to Share Our Latest Work!
Here we present Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls
arxiv: arxiv.org/abs/2412.15023
Video presentation and results: ispamm.github.io/Stable-V2A
20.12.2024 18:18
I initiated a starter pack for Audio ML. Let me know if you'd like to be added/removed.
go.bsky.app/LGmct4z
18.11.2024 04:46
Hi, I would like to be added! Thanks!
20.12.2024 11:38