
Beomseok Lee

@beomseok-lee.bsky.social

PhD student @uniTrento. Affiliated with @naverlabseurope and @fbk_mt. Ex-research engineer @samsungresearch

31 Followers  |  10 Following  |  4 Posts  |  Joined: 10.12.2024

Latest posts by beomseok-lee.bsky.social on Bluesky

Can Speech LLMs Think while Listening? Recent advances in speech large language models (speech LLMs) have enabled seamless spoken interactions, but these systems still struggle with complex reasoning tasks. Previously, chain-of-thought (Co...

Can we make Speech LLMs actually think as they listen? 👂💭
This fascinating work applies chain-of-thought reasoning inspired by human "thinking while listening", training models to find the inflection point at which reasoning should start.
📄 arxiv.org/abs/2510.07497

29.10.2025 12:48 — 👍 0    🔁 0    💬 0    📌 1
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs With the rise of Speech Large Language Models (SpeechLLMs), two dominant approaches have emerged for speech processing: discrete tokens and continuous features. Each approach has demonstrated strong c...

🤔 Ever wondered how discrete tokens vs. continuous features behave in SpeechLLMs?
This new work dives into 6 SLU tasks and reveals some interesting takeaways!
arxiv.org/abs/2508.17863

28.08.2025 09:02 — 👍 1    🔁 0    💬 0    📌 1
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs Large Language Models (LLMs) are widely used in Spoken Language Understanding (SLU). Recent SLU models process audio directly by adapting speech input into LLMs for better multimodal learning. A key c...

Speech-language models show promise in multimodal tasks, but how well are speech & text actually aligned? 🤔

This paper arxiv.org/abs/2505.19937 proposes a new metric to measure layer-wise correlation between the two, with a focus on SLU tasks. 🔍🗣️📄

11.06.2025 12:53 — 👍 0    🔁 0    💬 0    📌 1
AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM Integrating speech into LLMs (speech-LLM) has gained increasing attention recently. The mainstream solution is to connect a well-trained speech encoder and LLM with a neural adapter. However, the lengt...

Should speech come before the instruction text, or the instruction text first, in a speech-language model?
Find out the best positioning for speech and text, and the novel adapter that aligns the two modalities!
arxiv.org/abs/2412.01145

03.04.2025 10:42 — 👍 2    🔁 0    💬 0    📌 1
