@mariateleki.bsky.social
Howdy | PhD in CS @ Texas A&M | #speech #AI #NLP #recsys | Apollo's human | Rowing to 1M meters | https://mariateleki.github.io/
buff.ly/S0DSZzt
Xiangjue Dong (1st author), Cong Wang, Millenium Bismay, and James Caverlee
#NLP #NLPResearch #LLMs #GenAI #AI
To me, this work is super exciting because we take a totally different perspective: we show that ⬆️ diverse perspectives, ⬆️ system performance, so ⬆️ $$$ for a company! With this work, we argue that <<< diverse perspectives are absolutely necessary >>> from an economic standpoint.
You always hear about the "bias-accuracy tradeoff," meaning that ⬇️ model bias, ⬇️ system performance, so ⬇️ $$$ for a company. So much of the conversation around bias and diversity has focused on how to incentivize companies to debias their models (e.g., through new legislation).
In our new work, CHOIR: Collaborative Harmonization fOr Inference Robustness, we show that different LLM personas often get different benchmark questions right! CHOIR leverages this diversity to boost performance across benchmarks.
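To make the intuition concrete, here is a minimal sketch of persona-ensemble aggregation by majority vote. This is only an illustration, not the paper's actual CHOIR algorithm; the persona names and the voting rule are placeholders.

```python
# Hypothetical persona-ensemble aggregation (illustration only, not CHOIR itself):
# each persona-conditioned LLM run answers independently, and we harmonize the
# answers by majority vote, so a question that only some personas get right can
# still be answered correctly by the ensemble.
from collections import Counter

def harmonize(answers_by_persona: dict) -> str:
    """Return the most common answer across persona-conditioned runs."""
    votes = Counter(answers_by_persona.values())
    answer, _count = votes.most_common(1)[0]
    return answer

# Toy example: two of three personas agree on "B".
answers = {
    "careful logician": "B",
    "domain expert": "B",
    "skeptical reviewer": "C",
}
print(harmonize(answers))  # -> B
```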
LLMs can write, but can they tell stories?
Our survey shows they struggle with:
⚠️ Long-term coherence
⚠️ Controllability
Paper: mariateleki.github.io/pdf/A_Survey...
#StoryGeneration #GenerativeAI #NLP
Choosing an ASR system isn't one-size-fits-all: it depends on the disfluencies in your domain.
www.isca-archive.org/interspeech_...
Disfluencies aren't just noise; they're part of how we speak.
In our #INTERSPEECH2024 paper, we looked at how Google ASR vs WhisperX handle messy, real-world podcasts (82k+ episodes!):
WhisperX → better with "uh/um"
Google ASR → better with self-corrections
Highlight of my PhD: mentoring students. It's literally just the most fun to brainstorm with them each week and watch them learn and grow.
#AcademicMentoring #PhDLife
We can't fix what we don't measure.
That's why I build evaluation frameworks for speech & conversational AI, so we can stress-test systems against real-world variability.
#AIResearch #Evaluation #SpeechProcessing
I'm working on methods & evaluation frameworks for conversational AI that are:
✅ Robust to disfluencies
✅ Reliable in noisy, real-world conditions
✅ Generalizable across contexts
If conversational AI is going to truly work for everyone, it must be built for human speech as it is.
Why "speech-first" AI?
Because speech ≠ text.
People pause, restart, self-correct
Background noise & accents vary
Context shifts across domains
What happens when you say:
"I want a horror -- comedy -- movie"?
That slip-of-the-tongue can confuse recommender systems.
Our INTERSPEECH 2025 paper shows some LLMs handle it better than others.
mariateleki.github.io/pdf/HorrorCo...
#INTERSPEECH2025 #ConversationalAI #RecSys
These insights still apply to anyone working on conversational AI, spoken summarization, or voice-driven interfaces today.
Read more: www.isca-archive.org/interspeech_...
#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage
#INTERSPEECH
We compared two systems on 82,000+ podcast episodes. We found:
WhisperX better captures interjections like "uh" and "um"
Google ASR better captures edited nodes (e.g., "let's go to Target--Walmart")
The type of disfluency matters when choosing an ASR system
Last year at INTERSPEECH 2024, we explored a question that remains relevant: how do ASR systems handle disfluencies in real-world speech?
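If you want to apply this to your own data, a tiny profiling script along these lines can show which disfluency type dominates your domain before you commit to an ASR system. The filler list and the "--" correction marker below are my own rough heuristics, not part of the paper.

```python
# Rough disfluency profiler (illustrative heuristics, not from the paper):
# count filler interjections vs. explicit self-corrections in sample transcripts.
import re

FILLERS = {"uh", "um", "uhh", "umm", "er", "hmm"}

def profile_disfluencies(transcripts):
    interjections, edited_nodes = 0, 0
    for text in transcripts:
        tokens = re.findall(r"[a-z']+", text.lower())
        interjections += sum(tok in FILLERS for tok in tokens)
        # Very rough proxy for edited nodes: an explicit "--" correction marker.
        edited_nodes += len(re.findall(r"\w+\s*--\s*\w+", text))
    return {"interjections": interjections, "edited_nodes": edited_nodes}

sample = [
    "um let's go to Target -- Walmart",
    "I uh I want a horror -- comedy movie",
]
print(profile_disfluencies(sample))  # {'interjections': 2, 'edited_nodes': 2}
```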
Stories shape how we think and connect.
But can AI tell a good story?
Our Survey on LLMs for Story Generation (EMNLP Findings 2025) explores:
Coherence
Controllability
Creativity
Authenticity
mariateleki.github.io/pdf/A_Survey...
#StoryGeneration #GenerativeAI
Speech isn't perfect.
We restart, repeat, and slip.
For AI, those little disfluencies can cause big problems.
Thatβs why my research builds methods to make spoken language systems more robust.
#SpeechProcessing #ConversationalAI #NLP #AI
Paper: mariateleki.github.io/pdf/A_Survey... | GitHub: github.com/mariateleki/...
#StoryGeneration #GenerativeAI #NLProc
Back in August, we shared our Survey on LLMs for Story Generation (EMNLP Findings 2025).
Covers: controllability, coherence, and creativity
Discusses: evaluation challenges
Highlights: hybrid symbolic-neural approaches
Includes: an open resource list (PRs welcome!)
#arxiv #speechAI #LLM
9 actionable recommendations for deployment -- some are surprising!
Paper: arxiv.org/pdf/2509.20321
Code: github.com/mariateleki/...
DRES provides the first controlled benchmark for evaluating LLMs on disfluency removal.
Controlled evaluation on gold transcripts (no ASR noise) sets an upper bound
Systematic comparison across open & proprietary LLMs
First taxonomy of LLM error modes
New on arXiv: We introduce DRES, the disfluency removal evaluation suite!
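For readers who want a feel for what "controlled evaluation on gold transcripts" means in practice, here is a tiny sketch: score a model's cleaned-up output against a fluent reference with bag-of-tokens precision/recall/F1. This is my own simplified illustration, not the DRES scoring code, and the example strings are made up.

```python
# Simplified scoring sketch (not the DRES implementation): compare a model's
# disfluency-removed output against a fluent reference transcript.
from collections import Counter

def token_prf(predicted: str, reference: str) -> dict:
    pred = Counter(predicted.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return {"precision": precision, "recall": recall, "f1": f1}

disfluent = "i want uh i want to go to the the store"   # gold transcript, no ASR noise
fluent_ref = "i want to go to the store"                # target after disfluency removal
model_output = "i want to go to the the store"          # imagine an LLM's cleanup
print(token_prf(model_output, fluent_ref))
```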
Z-Scores reveal model weaknesses by disfluency type (EDITED, INTJ, and PRN), providing diagnostic insights that guide targeted improvements.
Paper: arxiv.org/abs/2509.20319
Code: github.com/mariateleki/...
New on arXiv! We introduce Z-Scores: A Metric for Linguistically Assessing Disfluency Removal.
Traditional F1 scores hide why disfluency removal models succeed or fail.
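To illustrate the kind of category-wise diagnostic this enables (and why a single F1 hides it), here is a toy example: tabulate each model's removal rate per disfluency type and standardize across models so weak spots stand out. The numbers and the exact standardization below are made up for illustration; they are not the paper's Z-Score formula.

```python
# Toy per-category diagnostic (illustration only, not the paper's metric):
# EDITED = self-corrected spans, INTJ = interjections, PRN = parentheticals.
from statistics import mean, stdev

removal_rates = {  # hypothetical fraction of annotated tokens correctly removed
    "model_a": {"EDITED": 0.62, "INTJ": 0.91, "PRN": 0.40},
    "model_b": {"EDITED": 0.75, "INTJ": 0.88, "PRN": 0.55},
    "model_c": {"EDITED": 0.58, "INTJ": 0.95, "PRN": 0.35},
}

def per_category_z(rates):
    categories = sorted({c for model in rates.values() for c in model})
    z = {m: {} for m in rates}
    for cat in categories:
        vals = [rates[m][cat] for m in rates]
        mu, sigma = mean(vals), stdev(vals)
        for m in rates:
            z[m][cat] = (rates[m][cat] - mu) / sigma if sigma else 0.0
    return z

print(per_category_z(removal_rates))  # negative values flag a weak spot for that type
```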
The takeaway: model selection is critical for real-world conversational AI.
Full paper & code: mariateleki.github.io/pdf/HorrorCo...
#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage
#INTERSPEECH #ICASSP
We tested multiple LLMs as CRS backbones and found:
Llama and Mixtral became more resilient with these errors
Gemini, GPT-4o, and GPT-4o-mini dropped in performance
Disfluencies aren't just noise; they can even help certain models by introducing genre diversity.
Speech is messy, and so are recommender systems when they face speech errors.
In our INTERSPEECH 2025 paper, we introduced Syn-WSSE, a psycholinguistically grounded framework for simulating whole-word substitution errors in conversational recommenders (e.g., "I want a horror--comedy movie").
Paper/code: www.isca-archive.org/interspeech_...
#SpeechProcessing #ConversationalAI #VoiceAI #Disfluency #SpokenLanguage
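As a toy illustration of the kind of error Syn-WSSE simulates, the snippet below injects a whole-word genre substitution (followed by the intended word, like a real slip-and-repair) into a recommendation query. The genre list and swap rule are assumptions for illustration, not the paper's psycholinguistic model.

```python
# Toy whole-word substitution error injector (illustration only, not Syn-WSSE):
# swap in a wrong genre word, then keep the intended word as the "repair".
import random

GENRES = ["horror", "comedy", "drama", "thriller", "romance"]

def inject_substitution(query, seed=None):
    rng = random.Random(seed)
    tokens = query.split()
    genre_positions = [i for i, tok in enumerate(tokens) if tok.lower() in GENRES]
    if not genre_positions:
        return query
    i = rng.choice(genre_positions)
    intended = tokens[i].lower()
    slip = rng.choice([g for g in GENRES if g != intended])
    tokens[i] = f"{slip} -- {intended}"   # e.g., "horror -- comedy"
    return " ".join(tokens)

print(inject_substitution("I want a comedy movie", seed=0))
# e.g., "I want a horror -- comedy movie"
```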
Messy, real-world speech ≠ clean transcripts.
In our #INTERSPEECH2024 paper, we compared Google ASR vs. WhisperX on 82k+ podcasts.
WhisperX → better at accurately transcribing "uh/um"
Google ASR → better at accurately transcribing edited nodes
Which to use? Depends on your data.