
Beomseok Lee

@beomseok-lee.bsky.social

PhD student @uniTrento. Affiliated with @naverlabseurope and @fbk_mt. Ex-research engineer @samsungresearch

31 Followers  |  10 Following  |  4 Posts  |  Joined: 10.12.2024

Latest posts by beomseok-lee.bsky.social on Bluesky

Can Speech LLMs Think while Listening? Recent advances in speech large language models (speech LLMs) have enabled seamless spoken interactions, but these systems still struggle with complex reasoning tasks. Previously, chain-of-thought (Co...

Can we make Speech LLMs actually think as they listen? 👂💭
This fascinating work applies chain-of-thought reasoning inspired by human "thinking while listening", training models to find the inflection point at which reasoning should start.
📄 arxiv.org/abs/2510.07497

29.10.2025 12:48 — 👍 0    🔁 0    💬 0    📌 1
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs With the rise of Speech Large Language Models (SpeechLLMs), two dominant approaches have emerged for speech processing: discrete tokens and continuous features. Each approach has demonstrated strong c...

🤔 Ever wondered how discrete tokens vs. continuous features behave in SpeechLLMs?
This new work dives into 6 SLU tasks and reveals some interesting takeaways!
arxiv.org/abs/2508.17863

28.08.2025 09:02 — 👍 1    🔁 0    💬 0    📌 1
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs Large Language Models (LLMs) are widely used in Spoken Language Understanding (SLU). Recent SLU models process audio directly by adapting speech input into LLMs for better multimodal learning. A key c...

Speech-language models show promise in multimodal tasks, but how well are speech & text actually aligned? 🤔

This paper arxiv.org/abs/2505.19937 proposes a new metric to measure layer-wise correlation between the two, with a focus on SLU tasks. 🔍🗣️📄

11.06.2025 12:53 — 👍 0    🔁 0    💬 0    📌 1
AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM Integrating speech into LLMs (speech-LLM) has gained increasing attention recently. The mainstream solution is to connect a well-trained speech encoder and LLM with a neural adapter. However, the lengt...

Should speech come before the instruction text, or the instruction text first, in a speech-language model?
Find out the best positioning for speech and text, and the novel adapter that aligns the two modalities!
arxiv.org/abs/2412.01145

03.04.2025 10:42 — 👍 2    🔁 0    💬 0    📌 1
