@mariateleki.bsky.social
Howdy | PhD in CS @ Texas A&M | #speech #AI #NLP #recsys | Apollo's human | Rowing to 1M meters | https://mariateleki.github.io/
buff.ly/S0DSZzt
Xiangjue Dong (1st author), Cong Wang, Millenium Bismay, and James Caverlee
#NLP #NLPResearch #LLMs #GenAI #AI
To me, this work is super exciting because we take a totally different perspective: we show that ⬆️ diverse perspectives, ⬆️ system performance, so ⬆️ $$$ for a company! With this work, we argue that <<< diverse perspectives are absolutely necessary >>> from an economic standpoint.
You always hear about the "bias-accuracy tradeoff," meaning that ⬇️ model bias, ⬇️ system performance, so ⬇️ $$$ for a company. So much of the conversation around bias and diversity has focused on how to incentivize companies to debias their models (e.g., through new legislation).
In our new work, CHOIR: Collaborative Harmonization fOr Inference Robustness, we show that different LLM personas often get different benchmark questions right! CHOIR leverages this diversity to boost performance across benchmarks.
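To make the intuition concrete, here is a minimal sketch of persona-ensemble aggregation by majority vote. This is only an illustration, not the paper's actual CHOIR algorithm; the persona names and the voting rule are placeholders.

```python
# Hypothetical persona-ensemble aggregation (illustration only, not CHOIR itself):
# each persona-conditioned LLM run answers independently, and we harmonize the
# answers by majority vote, so a question that only some personas get right can
# still be answered correctly by the ensemble.
from collections import Counter

def harmonize(answers_by_persona: dict) -> str:
    """Return the most common answer across persona-conditioned runs."""
    votes = Counter(answers_by_persona.values())
    answer, _count = votes.most_common(1)[0]
    return answer

# Toy example: two of three personas agree on "B".
answers = {
    "careful logician": "B",
    "domain expert": "B",
    "skeptical reviewer": "C",
}
print(harmonize(answers))  # -> B
```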
LLMs can write, but can they tell stories?
Our survey shows they struggle with:
⚠️ Long-term coherence
⚠️ Controllability
Paper: mariateleki.github.io/pdf/A_Survey...
#StoryGeneration #GenerativeAI #NLP
Choosing an ASR system isn't one-size-fits-all: it depends on the disfluencies in your domain.
www.isca-archive.org/interspeech_...
Disfluencies aren't just noise; they're part of how we speak.
In our #INTERSPEECH2024 paper, we looked at how Google ASR vs WhisperX handle messy, real-world podcasts (82k+ episodes!):
WhisperX → better with "uh/um"
Google ASR → better with self-corrections
Highlight of my PhD: mentoring students. It's literally just the most fun to brainstorm with them each week and watch them learn and grow.
#AcademicMentoring #PhDLife
We can't fix what we don't measure.
That's why I build evaluation frameworks for speech & conversational AI, so we can stress-test systems against real-world variability.
#AIResearch #Evaluation #SpeechProcessing
I'm working on methods & evaluation frameworks for conversational AI that are:
✅ Robust to disfluencies
✅ Reliable in noisy, real-world conditions
✅ Generalizable across contexts
If conversational AI is going to truly work for everyone, it must be built for human speech as it is.
Why "speech-first" AI?
Because speech ≠ text.
People pause, restart, self-correct
Background noise & accents vary
Context shifts across domains
What happens when you say:
"I want a horror -- comedy -- movie"?
That slip-of-the-tongue can confuse recommender systems.
Our INTERSPEECH 2025 paper shows some LLMs handle it better than others.
mariateleki.github.io/pdf/HorrorCo...
#INTERSPEECH2025 #ConversationalAI #RecSys
These insights still apply to anyone working on conversational AI, spoken summarization, or voice-driven interfaces today.
Read more: www.isca-archive.org/interspeech_...
#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage
#INTERSPEECH
We compared two systems on 82,000+ podcast episodes. We found:
WhisperX better captures interjections like "uh" and "um"
Google ASR better captures edited nodes (e.g., "let's go to Target--Walmart")
The type of disfluency matters when choosing an ASR system
Last year at INTERSPEECH 2024, we explored a question that remains relevant: how do ASR systems handle disfluencies in real-world speech?
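If you want to apply this to your own data, a tiny profiling script along these lines can show which disfluency type dominates your domain before you commit to an ASR system. The filler list and the "--" correction marker below are my own rough heuristics, not part of the paper.

```python
# Rough disfluency profiler (illustrative heuristics, not from the paper):
# count filler interjections vs. explicit self-corrections in sample transcripts.
import re

FILLERS = {"uh", "um", "uhh", "umm", "er", "hmm"}

def profile_disfluencies(transcripts):
    interjections, edited_nodes = 0, 0
    for text in transcripts:
        tokens = re.findall(r"[a-z']+", text.lower())
        interjections += sum(tok in FILLERS for tok in tokens)
        # Very rough proxy for edited nodes: an explicit "--" correction marker.
        edited_nodes += len(re.findall(r"\w+\s*--\s*\w+", text))
    return {"interjections": interjections, "edited_nodes": edited_nodes}

sample = [
    "um let's go to Target -- Walmart",
    "I uh I want a horror -- comedy movie",
]
print(profile_disfluencies(sample))  # {'interjections': 2, 'edited_nodes': 2}
```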
Stories shape how we think and connect.
But can AI tell a good story?
Our Survey on LLMs for Story Generation (EMNLP Findings 2025) explores:
Coherence
Controllability
Creativity
Authenticity
mariateleki.github.io/pdf/A_Survey...
#StoryGeneration #GenerativeAI
Speech isn't perfect.
We restart, repeat, and slip.
For AI, those little disfluencies can cause big problems.
Thatβs why my research builds methods to make spoken language systems more robust.
#SpeechProcessing #ConversationalAI #NLP #AI
Paper: mariateleki.github.io/pdf/A_Survey... | GitHub: github.com/mariateleki/...
#StoryGeneration #GenerativeAI #NLProc
Back in August, we shared our Survey on LLMs for Story Generation (EMNLP Findings 2025).
Covers: controllability, coherence, and creativity
Discusses: evaluation challenges
Highlights: hybrid symbolic-neural approaches
Includes: an open resource list (PRs welcome!)
#arxiv #speechAI #LLM
9 actionable recommendations for deployment -- some are surprising!
Paper: arxiv.org/pdf/2509.20321
Code: github.com/mariateleki/...
DRES provides the first controlled benchmark for evaluating LLMs on disfluency removal.
Controlled evaluation on gold transcripts (no ASR noise) sets an upper bound
Systematic comparison across open & proprietary LLMs
First taxonomy of LLM error modes
New on arXiv: We introduce DRES, the disfluency removal evaluation suite!
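For readers who want a feel for what "controlled evaluation on gold transcripts" means in practice, here is a tiny sketch: score a model's cleaned-up output against a fluent reference with bag-of-tokens precision/recall/F1. This is my own simplified illustration, not the DRES scoring code, and the example strings are made up.

```python
# Simplified scoring sketch (not the DRES implementation): compare a model's
# disfluency-removed output against a fluent reference transcript.
from collections import Counter

def token_prf(predicted: str, reference: str) -> dict:
    pred = Counter(predicted.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return {"precision": precision, "recall": recall, "f1": f1}

disfluent = "i want uh i want to go to the the store"   # gold transcript, no ASR noise
fluent_ref = "i want to go to the store"                # target after disfluency removal
model_output = "i want to go to the the store"          # imagine an LLM's cleanup
print(token_prf(model_output, fluent_ref))
```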
Z-Scores reveal model weaknesses by disfluency type (EDITED, INTJ, and PRN), providing diagnostic insights that guide targeted improvements.
Paper: arxiv.org/abs/2509.20319
Code: github.com/mariateleki/...
New on arXiv! We introduce Z-Scores: A Metric for Linguistically Assessing Disfluency Removal.
Traditional F1 scores hide why disfluency removal models succeed or fail.
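To illustrate the kind of category-wise diagnostic this enables (and why a single F1 hides it), here is a toy example: tabulate each model's removal rate per disfluency type and standardize across models so weak spots stand out. The numbers and the exact standardization below are made up for illustration; they are not the paper's Z-Score formula.

```python
# Toy per-category diagnostic (illustration only, not the paper's metric):
# EDITED = self-corrected spans, INTJ = interjections, PRN = parentheticals.
from statistics import mean, stdev

removal_rates = {  # hypothetical fraction of annotated tokens correctly removed
    "model_a": {"EDITED": 0.62, "INTJ": 0.91, "PRN": 0.40},
    "model_b": {"EDITED": 0.75, "INTJ": 0.88, "PRN": 0.55},
    "model_c": {"EDITED": 0.58, "INTJ": 0.95, "PRN": 0.35},
}

def per_category_z(rates):
    categories = sorted({c for model in rates.values() for c in model})
    z = {m: {} for m in rates}
    for cat in categories:
        vals = [rates[m][cat] for m in rates]
        mu, sigma = mean(vals), stdev(vals)
        for m in rates:
            z[m][cat] = (rates[m][cat] - mu) / sigma if sigma else 0.0
    return z

print(per_category_z(removal_rates))  # negative values flag a weak spot for that type
```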
The takeaway: model selection is critical for real-world conversational AI.
Full paper & code: mariateleki.github.io/pdf/HorrorCo...
#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage
#INTERSPEECH #ICASSP
We tested multiple LLMs as CRS backbones and found:
Llama and Mixtral became more resilient with these errors
Gemini, GPT-4o, and GPT-4o-mini dropped in performance
Disfluencies aren't just noise; they can even help certain models by introducing genre diversity.
Speech is messy, and so are recommender systems when they face speech errors.
In our INTERSPEECH 2025 paper, we introduced Syn-WSSE, a psycholinguistically grounded framework for simulating whole-word substitution errors in conversational recommenders (e.g., "I want a horror--comedy movie").
Paper/code: www.isca-archive.org/interspeech_...
#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage
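As a toy illustration of the kind of error Syn-WSSE simulates, the snippet below injects a whole-word genre substitution (followed by the intended word, like a real slip-and-repair) into a recommendation query. The genre list and swap rule are assumptions for illustration, not the paper's psycholinguistic model.

```python
# Toy whole-word substitution error injector (illustration only, not Syn-WSSE):
# swap in a wrong genre word, then keep the intended word as the "repair".
import random

GENRES = ["horror", "comedy", "drama", "thriller", "romance"]

def inject_substitution(query, seed=None):
    rng = random.Random(seed)
    tokens = query.split()
    genre_positions = [i for i, tok in enumerate(tokens) if tok.lower() in GENRES]
    if not genre_positions:
        return query
    i = rng.choice(genre_positions)
    intended = tokens[i].lower()
    slip = rng.choice([g for g in GENRES if g != intended])
    tokens[i] = f"{slip} -- {intended}"   # e.g., "horror -- comedy"
    return " ".join(tokens)

print(inject_substitution("I want a comedy movie", seed=0))
# e.g., "I want a horror -- comedy movie"
```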
Messy, real-world speech ≠ clean transcripts.
In our #INTERSPEECH2024 paper, we compared Google ASR vs. WhisperX on 82k+ podcasts.
WhisperX → better at accurately transcribing "uh/um"
Google ASR → better at accurately transcribing edited nodes
Which to use? Depends on your data.