Can we make Speech LLMs actually think as they listen?
This fascinating work applies CoT inspired by human "thinking while listening", training models to find the inflection point at which reasoning should start.
arxiv.org/abs/2510.07497
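A minimal sketch of the general idea, assuming a setup where a reasoning span is spliced into the audio-token stream at a chosen inflection point. The marker tokens and the function below are hypothetical illustrations, not the paper's actual recipe:

```python
# Hedged sketch (not the paper's method): build a "think while listening"
# training target by inserting a reasoning span into the audio-token stream
# at a chosen inflection point, so the model learns where reasoning begins.

from typing import List

THINK_START = "<think>"   # hypothetical marker tokens
THINK_END = "</think>"

def build_target(audio_tokens: List[str],
                 reasoning_tokens: List[str],
                 answer_tokens: List[str],
                 inflection_idx: int) -> List[str]:
    """Insert a reasoning span after `inflection_idx` audio tokens."""
    assert 0 <= inflection_idx <= len(audio_tokens)
    return (
        audio_tokens[:inflection_idx]                      # audio heard so far
        + [THINK_START] + reasoning_tokens + [THINK_END]   # reasoning starts mid-stream
        + audio_tokens[inflection_idx:]                    # keep listening
        + answer_tokens                                    # final answer
    )

# Tiny usage example with placeholder tokens.
print(build_target(["a1", "a2", "a3", "a4"], ["r1", "r2"], ["ans"], inflection_idx=2))
```

The interesting supervision question is then where to place inflection_idx, which is exactly the "inflection point" the post refers to.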
@beomseok-lee.bsky.social
PhD student @uniTrento. Affiliated with @naverlabseurope and @fbk_mt. Former research engineer @samsungresearch
Ever wondered how discrete tokens vs. continuous features behave in SpeechLLMs?
This new work dives into 6 SLU tasks and reveals some interesting takeaways!
arxiv.org/abs/2508.17863
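For context, a rough sketch of the two input styles being compared, with invented dimensions and a k-means-style quantizer standing in for whatever tokenizer the paper actually uses:

```python
# Hedged sketch, not the paper's setup. "Continuous" feeds encoder frames
# through a projection into the LLM embedding space; "discrete" quantizes
# frames to cluster ids and looks them up in an embedding table.

import torch
import torch.nn as nn

d_speech, d_llm, n_clusters = 256, 1024, 500
frames = torch.randn(1, 120, d_speech)           # (batch, time, feat) encoder output

# Continuous features: linear projection into the LLM's embedding space.
proj = nn.Linear(d_speech, d_llm)
continuous_inputs = proj(frames)                 # (1, 120, d_llm)

# Discrete tokens: assign each frame to its nearest centroid (k-means style),
# then embed the resulting ids like ordinary text tokens.
centroids = torch.randn(n_clusters, d_speech)    # pretend these were learned
ids = torch.cdist(frames, centroids.unsqueeze(0)).argmin(dim=-1)   # (1, 120)
tok_emb = nn.Embedding(n_clusters, d_llm)
discrete_inputs = tok_emb(ids)                   # (1, 120, d_llm)

print(continuous_inputs.shape, discrete_inputs.shape)
```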
Speech-language models show promise in multimodal tasks, but how well are speech & text actually aligned?
This paper arxiv.org/abs/2505.19937 proposes a new metric to measure layer-wise correlation between the two, with a focus on SLU tasks.
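The paper defines its own metric; as a rough illustration only, here is a generic layer-wise probe that mean-pools paired speech and text hidden states and averages their cosine similarity per layer (all names and shapes are assumptions):

```python
# Hedged sketch of a layer-wise alignment probe, not the paper's metric.
# Assumes you already have paired speech and text hidden states per layer
# for the same batch of utterance/transcript pairs.

import torch
import torch.nn.functional as F

def layerwise_alignment(speech_states, text_states):
    """Both args: lists (one per layer) of (batch, seq, dim) tensors."""
    scores = []
    for s, t in zip(speech_states, text_states):
        s_vec = s.mean(dim=1)        # pool over speech frames
        t_vec = t.mean(dim=1)        # pool over text tokens
        scores.append(F.cosine_similarity(s_vec, t_vec, dim=-1).mean().item())
    return scores

# Toy example with random states for a 4-layer model.
L, B, dim = 4, 8, 512
speech = [torch.randn(B, 100, dim) for _ in range(L)]
text = [torch.randn(B, 20, dim) for _ in range(L)]
print(layerwise_alignment(speech, text))   # one score per layer
```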
In a speech-language model, should the speech come before the instruction text, or the instruction text first?
Find out the best positioning for speech and text, and the novel adapter that aligns the two modalities!
arxiv.org/abs/2412.01145
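To make the design question concrete, a toy sketch of the two orderings with a placeholder two-layer MLP adapter (the paper's adapter is its own contribution; this is not it):

```python
# Hedged sketch of the ordering question, not the paper's architecture.
# A small adapter maps speech features into the LLM embedding space, then
# the speech embeddings are placed either before or after the instruction.

import torch
import torch.nn as nn

d_speech, d_llm = 256, 1024

adapter = nn.Sequential(              # placeholder speech-to-LLM-space adapter
    nn.Linear(d_speech, d_llm),
    nn.GELU(),
    nn.Linear(d_llm, d_llm),
)

speech_feats = torch.randn(1, 120, d_speech)   # encoder output
instr_embeds = torch.randn(1, 15, d_llm)       # embedded instruction text

speech_embeds = adapter(speech_feats)

speech_first = torch.cat([speech_embeds, instr_embeds], dim=1)   # speech, then instruction
text_first = torch.cat([instr_embeds, speech_embeds], dim=1)     # instruction, then speech

print(speech_first.shape, text_first.shape)    # both (1, 135, d_llm)
```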