Had great fun doing this during my summer internship with folks from Apple (Yuan Zhang, Joel Ruben Antony Moniz, Xiou Ge, Bo-Hsiang Tseng, Dhivya Piraviperumal, Hong Yu) and USC (@swabhs.bsky.social).
Looking forward to the feedback!
#LLMs #NLProc
(7/n)
30.04.2025 18:54
Bottom line: There's no single metric that captures hallucinations reliably across the board.
Our work highlights the need for robust, context-aware, and generalizable hallucination detection tools as a prerequisite to meaningful mitigation.
(6/n)
30.04.2025 18:54
What works better?
Unsurprisingly, GPT-4-based evaluators align most reliably with human judgments across settings
Using ensembles of multiple metrics is a promising avenue
Instruction tuning and mode-seeking decoding help reduce hallucinations
(5/n)
30.04.2025 18:54
Our findings highlight:
Many existing metrics show poor alignment with human judgments
The inter-metric correlation is also weak
They show limited generalization across datasets, tasks, and models
They do not show consistent improvement with larger models
(4/n)
30.04.2025 18:54
Focusing on faithfulness and factuality errors in QA and dialogue tasks, we study diverse metrics spanning:
1. Syntactic and semantic similarity
2. Natural language inference
3. Multi-step question answering pipelines
4. Custom-trained models
5. SOTA LLMs as judges
(3/n)
30.04.2025 18:54
Despite a surge in research on hallucination mitigation, few ask the critical questions:
1. Are the metrics capturing the hallucinations effectively?
2. Do they align with each other and the human notion of hallucination?
3. Do they generalize across different settings?
(2/n)
30.04.2025 18:54
Hallucinations in LLMs are real, and so are the problems with how we measure them
Our latest work questions the generalizability of hallucination detection metrics across tasks, datasets, model sizes, training methods, and decoding strategies
arxiv.org/abs/2504.18114
(1/n)
30.04.2025 18:54
Reasoning about the "why" behind user behavior can improve LLM personas!
Excited to share our new work: Improving LLM Personas via Rationalization with Psychological Scaffolds
arxiv.org/abs/2504.17993
(1/n)
29.04.2025 01:05
NLP grad students
There are too many starter packs.
Here's a list, mostly for NLP, ML, and related areas.
01.12.2024 03:05
#socalnlp is the biggest it's ever been in 2024! We have 3 poster sessions, up from 2! How many years until it's a two-day event??
22.11.2024 21:50
Started a SoCal AI/ML/NLP researchers starter pack! It's a bit sparse right now, and perhaps more NLP heavy, but hey, nominate yourself and others! go.bsky.app/6QckPj9
19.11.2024 15:28
Hey John, thanks for starting this pack! Could you please add me as well?
18.11.2024 18:09
Can you please add me to the pack? Looking forward to interacting with everyone!
15.11.2024 06:59
Great initiative!! Can you please add me? Looking forward to interacting with everyone!!
15.11.2024 06:56
PhD-ing @ LTI, CMU; Intern @ NVIDIA. Doing Reasoning with Gen AI!
PhD Student at Carnegie Mellon University. Interested in the energy implications and impact of machine learning systems.
Prev: Northwestern University, Google, Meta.
PhD student @ LTI, CMU / working on text summarization / prev. at Bloomberg, Amazon, Microsoft Research
apratapa.xyz
MS in NLP (MIIS) @ LTI, CMU
https://dhruv0811.github.io/
PhD student @CMU LTI
NLP | IR | Evaluation | RAG
https://kimdanny.github.io
Knowledge Engineer @ Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
I make colorless green GPUs sleep brrriously. Computational phonology, morphology, language change models, speech/language technologies (especially for people with disabilities).
PhD student at CMU. I do research on applied NLP ("alignment", "synthetic data"). he/him
PhD student @ UWCSE; MLT @ CMU-LTI; Responsible AI
https://kshitishghate.github.io/
PhD student at CMU LTI; Interested in pragmatics and cross-cultural understanding;
Intern @ Allen Institute for AI | Prev: Senior Research Engineer @ Samsung Research America | Masters @ Stanford
https://akhila-yerukola.github.io/
cs.cmu.edu/~zsheikh
Senior Research Programmer at Carnegie Mellon University
Associate professor at CMU, studying natural language processing and machine learning. Co-founder All Hands AI
she/her | Master's @ltiatcmu.bsky.social | ML @ Microsoft | ML @ Apple | Bachelor's @ PES University | https://vibhamasti.github.io
PhD student pondering human-AI collaboration @CMU LTI (she/her)
Master's student @ltiatcmu.bsky.social, working on speech AI at @shinjiw.bsky.social