Eugene Kharitonov

@n0mad-0.bsky.social

Technical Staff at @kyutai.org. Previously Google DeepMind, Meta AI Research. CS PhD.

10 Followers  |  29 Following  |  1 Posts  |  Joined: 14.03.2025

Latest posts by n0mad-0.bsky.social on Bluesky

Our latest open-source speech-to-text model just claimed 1st place among streaming models and 5th place overall on the OpenASR leaderboard 🥇🎙️
While all other models need the whole audio, ours delivers top-tier accuracy on streaming content.
Open, fast, and ready for production!

27.06.2025 10:31  |  👍 4    🔁 3    💬 1    📌 0

Have you enjoyed talking to 🟢Moshi and dreamt of making your own speech-to-speech chat experience 🧑‍🔬🤖? It's now possible with the moshi-finetune codebase! Plug in your own dataset and change the voice/tone/personality of Moshi 💚🔌💿. Here is an example after finetuning with only 20 hours of the DailyTalk dataset. 🧵

01.04.2025 15:47  |  👍 6    🔁 1    💬 1    📌 2

Just back from holidays, so a bit late to announce MoshiVis, which extends Moshi's multimodal capabilities to take in images 📷.
Only 200M weights were added to plug in a ViT through cross-attention with gating 🖼️🔀🎤
Training relies on a mix of text-only and text+audio synthetic data (~20k hours) 💽

31.03.2025 10:06  |  👍 3    🔁 2    💬 0    📌 0

Hello 🌎!

15.03.2025 07:13  |  👍 1    🔁 0    💬 0    📌 0
