
Gallil Maimon

@gallilmaimon.bsky.social

PhD student @CseHuji; Audio Processing, Speech Language Modelling

55 Followers  |  44 Following  |  13 Posts  |  Joined: 17.11.2024  |  1.7159

Latest posts by gallilmaimon.bsky.social on Bluesky

Slamming: Training a Speech Language Model on One GPU in a Day. Slam is a training recipe for training high-quality SLMs on 1 GPU in 24 hours.

Hey,

I added some longer generation examples by enforcing `min_new_tokens`. The model can definitely lose itself a bit more, but it's still pretty decent I think :)

Check it out:
pages.cs.huji.ac.il/adiyoss-lab/...

And feel free to generate anything with a single line of code:
github.com/slp-rl/slamkit

04.03.2025 15:46 | 👍 1 🔁 0 💬 1 📌 0
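For readers curious what `min_new_tokens` does: to our understanding, in Hugging Face `generate` it stops the end-of-sequence token from being sampled until the minimum number of new tokens has been produced, while `max_new_tokens` caps the length. The pure-Python sketch below illustrates that idea only. `sample_with_length_limits`, the toy `eager_to_stop` model and its 4-token vocabulary are hypothetical stand-ins, not slamkit code.

```python
import math
import random


def sample_with_length_limits(next_token_logits, eos_id, min_new_tokens,
                              max_new_tokens, seed=0):
    """Toy sampling loop with a minimum and maximum number of new tokens.

    next_token_logits(prefix) is a hypothetical stand-in for a speech LM: it
    takes the tokens generated so far and returns one logit per vocabulary entry.
    """
    rng = random.Random(seed)
    generated = []
    while len(generated) < max_new_tokens:
        logits = list(next_token_logits(generated))
        # Until min_new_tokens tokens exist, the "end" token cannot be sampled.
        if len(generated) < min_new_tokens:
            logits[eos_id] = -math.inf
        # Softmax weights (subtracting the max keeps exp() from overflowing).
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]
        token = rng.choices(range(len(logits)), weights=weights, k=1)[0]
        if token == eos_id:
            break  # the model chose to stop before max_new_tokens
        generated.append(token)
    return generated


def eager_to_stop(prefix):
    # Hypothetical toy model: token 0 is "end", tokens 1-3 are speech units,
    # and "end" is overwhelmingly the most likely choice at every step.
    return [100.0, 0.0, 0.0, 0.0]


print(len(sample_with_length_limits(eager_to_stop, eos_id=0,
                                    min_new_tokens=0, max_new_tokens=50)))  # 0
print(len(sample_with_length_limits(eager_to_stop, eos_id=0,
                                    min_new_tokens=5, max_new_tokens=50)))  # 5
```

This is one plausible reason forced continuations can wander: the model is made to keep going past the point where it would otherwise have ended.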
Post image

@gallilmaimon.bsky.social and his team trained a Speech Language Model on a single A5000 GPU in 24 hours

26.02.2025 01:13 | 👍 3 🔁 1 💬 1 📌 0
Slamming: Training a Speech Language Model on One GPU in a Day. We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, ...

I love papers that make ML training accessible on consumer GPUs. Great example: "Slamming: Training a Speech Language Model on One GPU in a Day", released 3 days ago. The full code and training data are available, and the results are reproducible on a 24GB RTX 3090.

- arxiv.org/abs/2502.15814

28.02.2025 16:19 | 👍 12 🔁 1 💬 2 📌 0

We generated samples with a maximum length, but the model can predict an "end" token before reaching it. One could play with the sampling parameters to make the model keep talking :)

I will try to find time to generate longer samples, but I also encourage everyone to play around with it themselves. We tried to make it relatively easy 🙏

28.02.2025 17:35 | 👍 2 🔁 0 💬 0 📌 0

And about this - yes!
We are accepting PRs to add more tokenisers, better optimisers, efficient attention implementations and anything that seems relevant :)

Feel free to reach out 💪

28.02.2025 17:29 | 👍 1 🔁 1 💬 0 📌 0

Hey!
Really pleased you liked our work :) I think with the help of the open-source community we can push the results even further.

About generation length: the model context is 1024 tokens (≈40 seconds of audio), but we used a TWIST-like setup for evaluation. Definitely worth testing longer generations!

28.02.2025 17:25 | 👍 2 🔁 0 💬 1 📌 0
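The estimate that a 1024-token context covers roughly 40 seconds of audio follows from the token rate. Assuming one token per frame from a 25 Hz tokeniser (such as the mhubert-25hz model mentioned elsewhere on this page), the arithmetic is:

```python
def context_seconds(context_tokens: int, tokens_per_second: float) -> float:
    """Seconds of audio covered by a full context window at a fixed token rate."""
    return context_tokens / tokens_per_second


# Assumption: 25 tokens per second, i.e. one token per 40 ms frame.
print(context_seconds(1024, 25.0))  # 40.96
```

If repeated consecutive units are merged before training (an assumption here, not checked against the Slam setup), each token spans at least one 40 ms frame, so a full window would cover at least this much audio.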

🔜🗣️ It has been shown to be really useful for training SpeechLMs. We are now working on some things to hopefully make it even easier. More to come soon! 💪

11.01.2025 20:00 | 👍 0 🔁 0 💬 0 📌 0
slprl/mhubert-base-25hz · Hugging Face. We're on a journey to advance and democratize artificial intelligence through open source and open science.

🚨Attention #speech @hf.co peopleπŸ€—πŸ’¬
We added official support for mhubert-25hz from TWIST in transformers. We also converted it from fairseq to HF!

Check it out ✌️
huggingface.co/slprl/mhuber...

11.01.2025 20:00 | 👍 0 🔁 0 💬 1 📌 0
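Speech LMs in the TWIST line of work model discrete units rather than raw audio. As we understand it, the units come from assigning each mhubert-25hz feature frame to its nearest k-means centroid, optionally merging repeated consecutive units. The pure-Python sketch below shows only that quantisation step. The 2-D frames and two centroids are made-up toy values; real features are high-dimensional hidden states, and the real codebook size and layer are not stated here. Loading the model itself is not shown.

```python
from itertools import groupby


def nearest_centroid(frame, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, centroids[k])),
    )


def features_to_units(frames, centroids, deduplicate=True):
    """Map each feature frame to a unit ID; optionally merge consecutive repeats."""
    units = [nearest_centroid(frame, centroids) for frame in frames]
    if deduplicate:
        # groupby yields one key per run of identical consecutive units.
        units = [unit for unit, _ in groupby(units)]
    return units


# Toy values (hypothetical): five 2-D "frames" and a two-entry codebook.
centroids = [[0.0, 0.0], [10.0, 10.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [10.0, 11.0], [0.0, 0.0]]
print(features_to_units(frames, centroids, deduplicate=False))  # [0, 0, 1, 1, 0]
print(features_to_units(frames, centroids))                     # [0, 1, 0]
```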

I am thrilled to share that SALMon🍣 got accepted to #ICASSP25

For code, data, preprint and live leaderboard, check out: pages.cs.huji.ac.il/adiyoss-lab/...

w/ Amit Roth and Yossi Adi

21.12.2024 06:11 | 👍 1 🔁 0 💬 0 📌 0
Post image

For instance, in this example I feel it is unlikely that people would use stress to convey these meanings. Happy to hear any and all suggestions and insights :)

16.12.2024 10:59 | 👍 0 🔁 0 💬 0 📌 0
Post image

#Speech people: I am looking for examples (or resources) where stress or emphasis on a phrase changes the meaning of a sentence. This is part of a study on intonation in SpeechLMs.

I've included one decent ChatGPT answer below, but many weren't great...

16.12.2024 10:59 | 👍 1 🔁 0 💬 1 📌 0
SALMon: Suite for Acoustic Language Model evaluation. SALMon is a suite of benchmarks for evaluating Speech Language Models' ability to model acoustics.

🥇Project page (+leaderboard) - pages.cs.huji.ac.il/adiyoss-lab/...
📜Paper - arxiv.org/abs/2409.07437
💻Code - github.com/slp-rl/salmon
🤗 Data - huggingface.co/datasets/slp...

28.11.2024 08:39 | 👍 1 🔁 0 💬 0 📌 0
Post image

🪙 I assume sentiment improved thanks to the style tokens (also reflected in the STSP metric from SpiritLM). I wonder what is limiting performance: data? modelling? tokens? We welcome suggestions and new SLMs!

28.11.2024 08:39 | 👍 0 🔁 0 💬 1 📌 0
Post image

We added SpiritLM to the SALMon🍣 leaderboard! Nice jump in emotion consistency, but still no improvement in jointly modelling text content and acoustics 🥲
Think your SLM can do better? 💪
links👇

28.11.2024 08:39 | 👍 2 🔁 0 💬 1 📌 1
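For context on what a leaderboard number means: as we understand SALMon, each test pair contains a consistent recording and an altered one (for instance, one where the speaker changes mid-utterance), and a model is credited when it assigns the consistent one a higher likelihood. A minimal sketch of that pairwise accuracy, using a hypothetical `pairwise_accuracy` helper and made-up log-likelihoods:

```python
def pairwise_accuracy(pairs):
    """Fraction of (consistent, altered) log-likelihood pairs where the model
    scores the consistent sample strictly higher. Ties count as errors here."""
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for consistent, altered in pairs if consistent > altered)
    return wins / len(pairs)


# Made-up log-likelihoods for four hypothetical test pairs.
scores = [(-120.5, -130.2), (-98.0, -97.1), (-150.3, -151.0), (-110.0, -140.0)]
print(pairwise_accuracy(scores))  # 0.75
```

Random guessing sits at 0.5 on such a binary comparison, which is the natural baseline when reading the scores.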

I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA

(Self-)nominations welcome!

19.11.2024 11:13 | 👍 82 🔁 34 💬 44 📌 3

Great list! I'd be happy to join as well :)

27.11.2024 10:03 | 👍 1 🔁 0 💬 0 📌 0
