Early struggles and rejection are normal in academia. Your value as a scientist is measured not by how quickly things happen, but by your persistence and passion. Keep going!
Stay tuned: www.youtube.com/@WomeninAIRe...
#Science #Research #AcademicLife #WiAIRpodcast
11.08.2025 15:02
YouTube video by Women in AI Research WiAIR
LLM Hallucinations and Machine Unlearning, with Dr. Abhilasha Ravichander
Don't miss our conversation with Dr. Ravichander on the WiAIR Podcast, where we explore the paper's findings and their implications for trustworthy AI.
YouTube: www.youtube.com/watch?v=QPp0...
Spotify: open.spotify.com/episode/7lGC...
Apple: podcasts.apple.com/ca/podcast/l...
08.08.2025 16:50
Model performance: even the best models, such as GPT-4, still hallucinate, with error rates of up to 86% in certain tasks.
08.08.2025 16:50
Real-world impact: hallucinations can cause serious problems in applications such as content generation, scientific discovery, and decision-making, where accuracy is crucial.
08.08.2025 16:50
Key takeaways from HALoGEN:
HALoGEN benchmark: a comprehensive framework for evaluating hallucinations across 9 diverse domains, from scientific citations to programming.
Types of hallucinations: Type A (correct training knowledge recalled incorrectly), Type B (incorrect knowledge present in the training data itself), and Type C (outright fabrication). A toy scoring sketch follows below.
08.08.2025 16:50
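For the curious, here is a minimal sketch of how a HALoGEN-style hallucination score can be computed. This is our illustration, not the authors' code: `decompose` and `verify_against_source` are hypothetical stand-ins for the paper's domain-specific decomposers and high-quality knowledge sources.

```python
# Toy HALoGEN-style scoring (our sketch, not the authors' code).

def decompose(generation: str) -> list[str]:
    """Split a model generation into atomic facts (toy: one per sentence)."""
    return [s.strip() for s in generation.split(".") if s.strip()]

def verify_against_source(fact: str, knowledge: set[str]) -> bool:
    """Check one atomic fact against a trusted knowledge source (toy lookup)."""
    return fact in knowledge

def hallucination_score(generation: str, knowledge: set[str]) -> float:
    """Fraction of atomic facts NOT supported by the knowledge source."""
    facts = decompose(generation)
    if not facts:
        return 0.0
    unsupported = sum(not verify_against_source(f, knowledge) for f in facts)
    return unsupported / len(facts)

knowledge = {"Water boils at 100 C at sea level"}
text = "Water boils at 100 C at sea level. The moon is made of cheese"
print(hallucination_score(text, knowledge))  # -> 0.5 (one of two facts unsupported)
```

On a metric like this, an 86% rate in a domain means 86% of the atomic facts generated there failed verification.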
What are hallucinations in LLMs? They occur when a model generates statements that conflict with established world knowledge or with the given context, producing false or fabricated information.
08.08.2025 16:49
This paper tackles hallucinations in large language models (LLMs): cases where models produce misleading or inaccurate information. It's a critical problem in AI research.
08.08.2025 16:49
We're thrilled to congratulate Dr. Abhilasha Ravichander (@lasha.bsky.social) and her team on receiving the Outstanding Paper Award at #acl2025 for their work "HALoGEN: Fantastic LLM Hallucinations and Where to Find Them"!
#ACL #LLMs #Hallucination #WiAIR #WomenInAI
08.08.2025 16:49
New Women in AI Research episode out now!
We speak with Dr. Abhilasha Ravichander about:
- LLM hallucination types
- The WildHallucinations and HALoGEN benchmarks
- Machine unlearning and memorization probes
- Responsible AI and transparent systems
Listen here: youtu.be/QPp0cJNBbL8
#LLMHallucinations
06.08.2025 15:30
Can LLMs know when to be factual vs. creative?
In this short clip, Abhilasha Ravichander explores one of the hardest challenges in LLM behavior: adaptability.
The full #WiAIRpodcast episode drops in 2 days!
04.08.2025 15:30
Women in AI Research WiAIR
Women in AI Research (WiAIR) is a podcast dedicated to celebrating the remarkable contributions of female AI researchers from around the globe. Our mission is to challenge the prevailing perception th...
Episode out Aug 6.
Tune in to #WiAIRpodcast for critical conversations shaping the future of AI research:
YouTube: youtube.com/@WomeninAIRe...
Spotify: open.spotify.com/show/51RJNlZ...
Apple: podcasts.apple.com/ca/podcast/w...
4/
01.08.2025 14:35
We also explore:
- Tackling model memorization
- Pushing for data transparency
- Building tools for machine unlearning
- Advice on navigating academic transitions
3/
01.08.2025 14:35
HALoGEN, a benchmark for detecting hallucinations in LLMs, just won the Outstanding Paper Award @aclmeeting.bsky.social in Vienna.
We unpack the challenges of evaluating hallucinations and how factuality benchmarks can guide better LLM assessment.
2/
#ACL2025NLP #ACL2025
01.08.2025 14:35
Speaker announcement: the new episode of the Women in AI Research (WiAIR) podcast is out on August 6th. Our guest is Dr. Abhilasha Ravichander, a postdoc at the University of Washington and Assistant Professor at the Max Planck Institute for Software Systems.
New Women in AI Research #WiAIR episode coming Aug 6!
We talk to @lasha.bsky.social about LLM hallucination, her award-winning HALoGEN benchmark, and how we can better evaluate hallucinations in language models.
What's inside:
1/
01.08.2025 14:35
Our guest Dieuwke Hupkes questions the lack of accountability in academic peer review.
Why do bad reviewers get off the hook? Could consequences, such as limiting their ability to submit, actually improve the system?
What's your take on this?
30.07.2025 15:02
YouTube video by Women in AI Research WiAIR
Generalization in AI, with Dr. Dieuwke Hupkes
Hear Dieuwke Hupkes on why scaling laws differ for knowledge and reasoning in LLMs.
YouTube: www.youtube.com/watch?v=CuTW...
Apple: podcasts.apple.com/ca/podcast/g...
Spotify: open.spotify.com/show/51RJNlZ...
Paper: arxiv.org/abs/2503.10061
#WiAIR #AIResearch #Reasoning #WomenInAI
28.07.2025 16:10
Why it matters:
Compute-optimal training depends on the skills you care about.
Careful data-mix calibration and validation design are essential to train LLMs that perform well across both knowledge and reasoning. (7/8)
28.07.2025 16:08
3. Validation Sensitivity
The validation set you choose matters.
At small compute scales, the optimal parameter count can shift by ~50% depending on validation design. Even at large scales, >10% variation remains. (6/8)
28.07.2025 16:08
2. Beyond the Data Mix
Even after balancing data proportions, knowledge and code diverge.
Knowledge keeps demanding more parameters, while code scales better with more data. (5/8)
28.07.2025 16:08
1. Skill-Dependent Optima
- Knowledge QA is capacity-hungry (needs more parameters)
- Code is data-hungry (benefits more from tokens)
No single curve captures scaling behavior across skills; a toy illustration follows below. (4/8)
28.07.2025 16:07
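To make "skill-dependent optima" concrete, here is a toy illustration (ours, not the paper's setup): assume a standard two-term scaling law L(N, D) = A/N^alpha + B/D^beta with compute C ≈ 6ND, and give each skill its own hypothetical exponents. The optimal model size then differs sharply by skill.

```python
# Toy illustration of skill-dependent compute-optimal model size.
# Loss form and all constants are assumptions, not values from the paper:
# L(N, D) = A / N**alpha + B / D**beta, with compute budget C ~ 6 * N * D.
import numpy as np

def optimal_model_size(C, A, B, alpha, beta):
    """Grid-search the parameter count N minimizing loss at fixed compute C."""
    N = np.logspace(7, 12, 4000)   # candidate model sizes (parameters)
    D = C / (6.0 * N)              # training tokens implied by the budget
    loss = A / N**alpha + B / D**beta
    return N[np.argmin(loss)]

C = 1e21  # fixed compute budget (FLOPs)
# Hypothetical exponents chosen so the knowledge-like skill comes out
# capacity-hungry and the code-like skill data-hungry, as in the thread.
n_knowledge = optimal_model_size(C, A=400, B=400, alpha=0.28, beta=0.37)
n_code      = optimal_model_size(C, A=400, B=400, alpha=0.37, beta=0.28)
print(f"knowledge-optimal N ~ {n_knowledge:.1e} params")  # ~2e11
print(f"code-optimal      N ~ {n_code:.1e} params")       # ~8e8
```

Same budget, very different optimal allocations: that is the sense in which a single aggregate scaling curve can mislead.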
The study:
Experiments across 9 compute levels and 19 datasets, comparing two skill categories:
- Knowledge-based QA
- Code generation (as a proxy for reasoning)
The findings reveal fundamental differences in scaling behavior. (3/8)
28.07.2025 16:07
The problem:
Scaling laws guide LLM training by trading off model size and data under a fixed compute budget. But compute-optimal scaling is usually measured via aggregate validation loss.
What happens when we zoom in on specific skills? (The standard formulation is sketched below.) (2/8)
28.07.2025 16:06
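For reference, this is the Chinchilla-style formulation behind "compute-optimal", a common assumption in scaling-law work rather than something taken from this paper:

```latex
% Chinchilla-style scaling law (standard assumption, not from the paper):
% loss as a function of parameters N and training tokens D under budget C.
\[
  L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
  \qquad C \approx 6\,N D .
\]
% Minimizing L subject to the budget gives the compute-optimal allocation
\[
  N^{*}(C) \propto C^{\beta/(\alpha+\beta)},
  \qquad
  D^{*}(C) \propto C^{\alpha/(\alpha+\beta)} .
\]
% Fitting (alpha, beta) on aggregate validation loss yields one optimum;
% per-skill validation sets can yield different exponents, hence different optima.
```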
Trust in AI starts with generalization.
Dieuwke Hupkes (#MetaAI) explains why it's critical to know when you can count on your model, and when to be cautious.
Full episode:
YouTube: youtu.be/CuTWIW1JcsA?...
Spotify: open.spotify.com/episode/0KSR...
#LLMs #TrustworthyAI
25.07.2025 16:03
From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
Abstract. The staggering pace with which the capabilities of large language models (LLMs) are increasing, as measured by a range of commonly used natural language understanding (NLU) benchmarks, raise...
In our discussion, Dieuwke Hupkes reflects on model behavior, philosophical roots, and the importance of cross-form consistency in language evaluation.
YouTube: www.youtube.com/watch?v=CuTW...
Spotify: open.spotify.com/show/51RJNlZ...
Apple Podcasts: podcasts.apple.com/ca/podcast/w...
(8/8)
23.07.2025 16:05
Multisense Consistency does not focus on correctness alone. It probes semantic stability: whether a model preserves meaning across variation.
This reframes evaluation toward robustness, especially in multilingual and generalization settings; a toy metric is sketched below. (7/8)
23.07.2025 16:04
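A toy version of the idea (our sketch, not the paper's code; `ask_model` is a hypothetical stand-in for any LLM call): score agreement between answers to semantically equivalent forms of one question, without looking at correctness.

```python
# Toy multisense-consistency check: ask the same question in several "forms"
# (paraphrases / translations) and measure whether the answers agree.
from itertools import combinations

def consistency(answers: list[str]) -> float:
    """Fraction of answer pairs that agree across semantically equivalent forms."""
    norm = [a.strip().lower() for a in answers]
    pairs = list(combinations(norm, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

def multisense_score(ask_model, forms: list[str]) -> float:
    """Query the model once per form and score pairwise answer agreement."""
    return consistency([ask_model(f) for f in forms])

forms = [
    "What is the capital of Switzerland?",
    "Which city serves as Switzerland's capital?",
    "Quelle est la capitale de la Suisse ?",  # same question, different form
]
print(multisense_score(lambda q: "Bern", forms))  # -> 1.0 (perfectly consistent)
```

Note that a model that is consistently wrong still scores 1.0 here, which is exactly the thread's point: consistency and correctness are measured separately.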
They identify two sources of inconsistency:
- Task misinterpretation due to form change
- Failure to apply consistent logic across inputs
These issues occur in both simple and complex tasks, despite semantic equivalence. (6/8)
23.07.2025 16:03
Findings include:
- Frequent inconsistencies across reworded or translated inputs
- Lower consistency in non-English (especially low-resource) settings
- Incorrect answers were sometimes more stable than correct ones
These patterns challenge benchmark-based assumptions. (5/8)
23.07.2025 16:03