
Maria Teleki

@mariateleki.bsky.social

Howdy 🤠 | PhD in CS @ Texas A&M 🎙️ #speech #AI #NLP #recsys 🐶 Apollo's human | 🛶 Rowing to 1M meters 🌐 https://mariateleki.github.io/

58 Followers  |  319 Following  |  38 Posts  |  Joined: 22.06.2025

Latest posts by mariateleki.bsky.social on Bluesky

πŸ“„ buff.ly/S0DSZzt
⚽️ Xiangjue Dong (1st author), Cong Wang, Millenium Bismay, and James Caverlee
#NLP #NLPResearch #LLMs #GenAI #AI

11.11.2025 16:38 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

🌟 To me, this work is super exciting because we take a totally different perspective: we show that ⬆️ diverse perspectives, ⬆️ system performance, so ⬆️ $$$ for a company! With this work, we argue that <<< 🚨 diverse perspectives are absolutely necessary >>> from an economic standpoint.

11.11.2025 16:38 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

You always hear about the "bias-accuracy tradeoff," meaning that ⬇️ model bias, ⬇️ system performance, so ⬇️ $$$ for a company. So much of the conversation around bias and diversity has focused on how to incentivize companies to debias their models (e.g., through new legislation).

11.11.2025 16:38 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

In our new work, 🎢 CHOIR: Collaborative Harmonization fOr Inference Robustness, we show that different LLM personas often get different benchmark questions right! CHOIR leverages this diversity to boost performance across benchmarks. πŸ“Š

11.11.2025 16:38 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
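
To make the persona-diversity idea concrete, here is a minimal sketch (an assumption-based illustration, not the CHOIR implementation): prompt the same model under several personas and aggregate answers by majority vote. The `ask_llm` helper and the persona list are hypothetical placeholders.

```python
from collections import Counter

# Illustrative personas; the actual persona set used by CHOIR may differ.
PERSONAS = ["a linguist", "a software engineer", "a high-school teacher"]

def ask_llm(persona: str, question: str) -> str:
    """Hypothetical helper: prompt an LLM to answer `question` while adopting `persona`."""
    raise NotImplementedError("plug in your LLM client here")

def persona_ensemble_answer(question: str) -> str:
    # Different personas often get different questions right; a simple majority
    # vote over their answers can recover cases any single persona misses.
    votes = Counter(ask_llm(p, question) for p in PERSONAS)
    return votes.most_common(1)[0][0]
```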

LLMs can write β€” but can they tell stories?

Our survey shows they struggle with:
⚠️ Long-term coherence
⚠️ Controllability

πŸ“š Paper: mariateleki.github.io/pdf/A_Survey...

#StoryGeneration #GenerativeAI #NLP

08.11.2025 19:15 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Choosing an ASR system isn’t one-size-fits-all β€” it depends on the disfluencies in your domain.

πŸ“„ www.isca-archive.org/interspeech_...

03.11.2025 18:02 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Disfluencies aren’t just noise β€” they’re part of how we speak.

In our #INTERSPEECH2024 paper, we looked at how Google ASR vs WhisperX handle messy, real-world podcasts (82k+ episodes!):

πŸŽ™οΈ WhisperX β†’ better with β€œuh/um”
πŸ“ Google ASR β†’ better with self-corrections

03.11.2025 18:02 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Highlight of my PhD β†’ mentoring students. It's literally just the most fun to brainstorm with them each week and watch them learn and grow 🌱

#AcademicMentoring #PhDLife

01.11.2025 18:15 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

We can’t fix what we don’t measure.

That’s why I build evaluation frameworks for speech & conversational AI β€” so we can stress-test systems against real-world variability.

#AIResearch #Evaluation #SpeechProcessing

25.10.2025 18:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

I’m working on methods & evaluation frameworks for conversational AI that are:
βœ… Robust to disfluencies
βœ… Reliable in noisy, real-world conditions
βœ… Generalizable across contexts

If conversational AI is going to truly work for everyone, it must be built for human speech as it is.

20.10.2025 18:29 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Why β€œspeech-first” AI?
Because speech β‰  text.

πŸŽ™οΈ People pause, restart, self-correct
🌎 Background noise & accents vary
πŸ’¬ Context shifts across domains

20.10.2025 18:29 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

What happens when you say:
β€œI want a horror -- comedy -- movie”? πŸŽ₯

That slip-of-the-tongue can confuse recommender systems.
Our INTERSPEECH 2025 paper shows some LLMs handle it better than others.

πŸ“„ mariateleki.github.io/pdf/HorrorCo...

#INTERSPEECH2025 #ConversationalAI #RecSys

18.10.2025 18:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

These insights still apply to anyone working on conversational AI, spoken summarization, or voice-driven interfaces today.

πŸ“„ Read more: www.isca-archive.org/interspeech_...

#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage
#INTERSPEECH

15.10.2025 17:05 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

We compared two systems on 82,000+ podcast episodes. We found:
πŸ‘‰ WhisperX better captures interjections like β€œuh” and β€œum”
πŸ‘‰ Google ASR better captures edited nodes (e.g., β€œlet’s go to Target--Walmart”)

🌟 The type of disfluency matters when choosing an ASR system

15.10.2025 17:05 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Last year at INTERSPEECH 2024, we explored a question that remains relevant: how do ASR systems handle disfluencies in real-world speech?

15.10.2025 17:05 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Stories shape how we think and connect. πŸ“–
But can AI tell a good story?

Our Survey on LLMs for Story Generation (EMNLP Findings 2025) explores:
✨ Coherence
πŸŽ›οΈ Controllability
🎨 Creativity
βš–οΈ Authenticity

πŸ“„ mariateleki.github.io/pdf/A_Survey...

#StoryGeneration #GenerativeAI

11.10.2025 18:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Speech isn’t perfect.
We restart, repeat, and slip.

For AI, those little disfluencies can cause big problems.
That’s why my research builds methods to make spoken language systems more robust.

#SpeechProcessing #ConversationalAI #NLP #AI

04.10.2025 18:15 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0


πŸ“„ Paper: mariateleki.github.io/pdf/A_Survey... | πŸ’» GitHub: github.com/mariateleki/...

#StoryGeneration #GenerativeAI #NLProc

27.09.2025 18:15 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Back in August, we shared our Survey on LLMs for Story Generation (EMNLP Findings 2025).

πŸ“š Covers: controllability, coherence, and creativity
🧩 Discusses: evaluation challenges
🌍 Highlights: hybrid symbolic–neural approaches
πŸ’» Includes: an open resource list (PRs welcome!)

27.09.2025 18:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

#arxiv #speechAI #LLM

25.09.2025 18:25 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

🧭 9 actionable recommendations for deployment -- some are surprising! 😱
πŸ“„ Paper: arxiv.org/pdf/2509.20321
πŸ’» Code: github.com/mariateleki/...

25.09.2025 17:40 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸ”¬ DRES provides the first controlled benchmark for evaluating LLMs on disfluency removal.
βœ… Controlled evaluation on gold transcripts (no ASR noise) sets an upper bound
πŸ“Š Systematic comparison across open & proprietary LLMs
πŸ§ͺ First taxonomy of LLM error modes

25.09.2025 17:40 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
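
To illustrate the kind of controlled, gold-transcript scoring described above, here is a small sketch (assumptions only, not the DRES code): each gold token carries a fluent/disfluent label, and the model under test returns the set of token indices it kept.

```python
def removal_prf(gold_is_disfluent: list[bool], kept_indices: set[int]) -> tuple[float, float, float]:
    """Token-level precision/recall/F1 of disfluency removal against gold labels.
    A token counts as removed if its index is not in `kept_indices`."""
    removed = {i for i in range(len(gold_is_disfluent)) if i not in kept_indices}
    true_disfluent = {i for i, d in enumerate(gold_is_disfluent) if d}
    tp = len(removed & true_disfluent)
    p = tp / len(removed) if removed else 0.0
    r = tp / len(true_disfluent) if true_disfluent else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# "I uh want a movie": token 1 ("uh") is disfluent; the model keeps tokens 0, 2, 3, 4.
print(removal_prf([False, True, False, False, False], {0, 2, 3, 4}))  # (1.0, 1.0, 1.0)
```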

πŸš€ New on arXiv: We introduce DRES, the disfluency removal evaluation suite!

25.09.2025 17:40 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview: "Z-Scores: A Metric for Linguistically Assessing Disfluency Removal" -- Evaluating disfluency removal in speech requires more than aggregate token-level scores. Traditional word-based metrics such as precision, recall, and F1 (E-Scores) capture overall performance but…

πŸ”Ž Z-Scores reveal model weaknesses by disfluency type β€” EDITED, INTJ, and PRN β€” providing diagnostic insights that guide targeted improvements.

πŸ“„ Paper: arxiv.org/abs/2509.20319
πŸ’» Code: github.com/mariateleki/...

25.09.2025 17:22 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
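
As a hedged sketch of the per-category idea (not the paper's exact Z-Score formula), one can break removal performance down by disfluency type instead of reporting a single aggregate number; the EDITED/INTJ/PRN labels follow the categories named above.

```python
from collections import defaultdict

def per_type_removal_recall(gold_types: dict[int, str], removed_indices: set[int]) -> dict[str, float]:
    """gold_types maps token index -> disfluency type ('EDITED', 'INTJ', 'PRN');
    removed_indices is the set of token indices the model deleted."""
    total, hit = defaultdict(int), defaultdict(int)
    for idx, dtype in gold_types.items():
        total[dtype] += 1
        if idx in removed_indices:
            hit[dtype] += 1
    return {t: hit[t] / total[t] for t in total}

# A profile like {'INTJ': 0.95, 'EDITED': 0.60, 'PRN': 0.72} would flag edited
# (self-corrected) material as the weak spot, which an aggregate F1 score hides.
```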

🌟 New on arXiv β€” we introduce Z-Scores: A Metric for Linguistically Assessing Disfluency Removal πŸ“ŠπŸ§ 

πŸ€” Traditional F1 scores hide why disfluency removal models succeed or fail.

25.09.2025 17:22 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

🌱 The takeaway: model selection is critical for real-world conversational AI.

πŸ“„ Full paper & code: mariateleki.github.io/pdf/HorrorCo...

#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage
#INTERSPEECH #ICASSP

24.09.2025 17:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

We tested multiple LLMs as CRS backbones and found:

🌟 Llama and Mixtral became more resilient when these errors were present
🌟 Gemini, GPT-4o, and GPT-4o-mini saw their performance drop

Disfluencies aren’t just noise β€” they can even help certain models by introducing genre diversity.

24.09.2025 17:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Speech is messy β€” and so are recommender systems when they face speech errors.

In our INTERSPEECH 2025 paper, we introduced Syn-WSSE, a psycholinguistically grounded framework for simulating whole-word substitution errors in conversational recommenders (e.g., β€œI want a horrorβ€”comedy movie”).

24.09.2025 17:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
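
As a toy illustration of injecting a whole-word substitution error into a recommendation request (a sketch in the spirit of Syn-WSSE, not the actual framework; the genre vocabulary and repair format are assumptions):

```python
import random

GENRES = ["horror", "comedy", "drama", "thriller"]  # illustrative vocabulary

def inject_substitution(utterance: str, rng: random.Random = random.Random(0)) -> str:
    """Follow one genre word with a competing genre, mimicking a whole-word
    substitution such as "I want a horror -- comedy movie"."""
    tokens = utterance.split()
    slots = [i for i, tok in enumerate(tokens) if tok in GENRES]
    if not slots:
        return utterance  # nothing to perturb
    i = rng.choice(slots)
    intended = tokens[i]
    slip = rng.choice([g for g in GENRES if g != intended])
    tokens[i] = f"{intended} -- {slip}"
    return " ".join(tokens)

print(inject_substitution("I want a horror movie"))  # e.g. "I want a horror -- comedy movie"
```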

πŸ“„ Paper/code: www.isca-archive.org/interspeech_...

#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage

21.09.2025 19:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Messy, real-world speech β‰  clean transcripts.

In our #INTERSPEECH2024 paper, we compared Google ASR vs WhisperX on 82k+ podcasts 🎙️

🌱 WhisperX → better at accurately transcribing “uh/um”
🌱 Google ASR → better at accurately transcribing edited nodes
🌱 Which to use? Depends on your data.

21.09.2025 19:00 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
