@jonc101.bsky.social
Physician Data Scientist - Stanford Center for Biomedical Informatics Research + Division of Hospital Medicine + Clinical Excellence Research Center + Biomedical Data Science
@stanforddeptmed.bsky.social Biomedical Informatics Research Colloquia
"MedVAL: Toward Expert-Level Medical Text Validation with Language Models"
Asad Aali, MS.
Thursday, October 30th, 2025
12:00 to 1:00 pm PT
stanford.zoom.us/j/9788759601...
Webinar ID: 978 8759 6012
Webinar Passcode: 420642
Bio: Asad is a research staff member at Stanford, advised by Akshay Chaudhari. His research broadly focuses on developing machine learning methods for healthcare applications. More concretely, he is interested in building scalable, self-supervised methods to help bridge the gap between AI and expert clinician-level performance. His recent projects focus on 1) improving LLMs as expert-level evaluators of AI-generated medical text, 2) improving the robustness of language model benchmarks across diverse medical tasks using prompt optimization, and 3) detecting underdiagnosed medical conditions using opportunistic imaging. Before joining Stanford, he completed a Master's in Electrical and Computer Engineering at UT Austin, where he worked on improving medical image reconstruction by learning priors from corrupted data, advised by Jon Tamir and Alex Dimakis.
Abstract: "With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the 'LM-as-judge' paradigm (an LM evaluating another LM) offers scalable evaluation, even frontier LMs can miss subtle but clinically significant errors. To address these challenges, we propose MedVAL, a novel, self-supervised, data-efficient distillation method that leverages synthetic data to train evaluator LMs to assess whether LM-generated medical outputs are factually consistent with inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVAL-Bench, a dataset of 840 physician-annotated outputs across 6 diverse medical tasks capturing real-world challenges. Across 10 state-of-the-art LMs spanning open-source and proprietary models, MedVAL distillation significantly improves (p < 0.001) alignment with physicians across seen and unseen tasks, increasing average F1 scores from 66% to 83%. Despite strong baseline performance, MedVAL improves the best-performing proprietary LM (GPT-4o) by 8% without training on physician-labeled data, demonstrating performance statistically non-inferior to a single human expert (p < 0.001). Our benchmark provides evidence of LMs approaching expert-level ability in validating AI-generated medical text."
28.10.2025 00:05
Bio: Software developer and lead of the Comet team at Epic Systems.
Abstract: "Generative models have the potential to transform how health systems learn from data. Comet, Epic's large-scale generative medical model, is designed to represent patient histories as sequences of clinical events, enabling reasoning about disease trajectories and care outcomes. This talk will describe how Comet is trained across diverse health systems, what scaling reveals about generalization and medical reasoning, and how these capabilities can be applied to improve prediction, discovery, and patient outcomes in real-world settings."
21.10.2025 02:24
@stanforddeptmed.bsky.social Biomedical Informatics Research Colloquia
"Precision Education in the AI Era"
Jesse Burk-Rafel, MD, MRes.
Thursday, October 16th, 2025
12:00 to 1:00 pm PT
Live Stream
stanford.zoom.us/j/9788759601...
Webinar ID: 978 8759 6012
Webinar Passcode: 420642
Bio: Jesse Burk-Rafel, an assistant professor of medicine at NYU Grossman School of Medicine, directs research at the NYU Institute for Innovations in Medical Education. He is also a hospitalist and the inaugural Research Coach in the Division of Hospital Medicine. With a background in bioengineering and translational research, Jesse leads grant-funded studies exploring the intersection of medical education, informatics, and AI. His work aims to optimize trainee clinical performance and develop personalized educational interventions. Jesse lives with his wife and two children on the Lower East Side of New York City.
Abstract: "The talk outlines how integrating rich clinical data with AI, especially large language models, can power 'precision education' that delivers individualized, outcome-driven learning and assessment across medical training and practice."
11.10.2025 16:21
@stanforddeptmed.bsky.social Biomedical Informatics Research Colloquia
"Applied Intelligence: Integrating AI Technologies Into Medical Education"
Laurah Turner, PhD.
Thursday, October 9th, 2025
12:00 to 1:00 pm PT
stanford.zoom.us/j/9788759601...
Webinar ID: 978 8759 6012
Passcode: 420642
Bio: Laurah Turner, PhD, is the Associate Dean for Artificial Intelligence and Educational Informatics and Associate Professor of Biostatistics, Health Informatics and Data Sciences, and of Medical Education, at the University of Cincinnati College of Medicine. She is also co-founder of 2-Sigma, an educational technology company developing AI-powered platforms for personalized learning in healthcare. An interdisciplinary scholar with expertise in artificial intelligence, natural language processing, fuzzy logic, and educational informatics, Dr. Turner leads national and international initiatives to integrate AI responsibly into medical education. Her work focuses on leveraging multi-agent architectures, learning analytics, and adaptive assessment systems to advance precision medical education and reduce disparities in training. She has secured multiple competitive grants, holds several pending patents on AI-driven educational platforms, and has been invited to contribute to advisory committees for the AAMC, AMA, ABMS, and the International Advisory Committee on AI in Health Professions Education. Widely recognized as a thought leader, Dr. Turner has delivered invited keynotes, grand rounds, and workshops at leading institutions around the world. Her vision is to democratize access to individualized, mastery-based medical training by harnessing AI to scale feedback, foster equity, and capture the richness of clinical reasoning.
Abstract: "Artificial intelligence promises to revolutionize medical education, yet most institutions struggle to move beyond pilot projects to meaningful implementation. This talk bridges the gap between AI's potential and current reality by showcasing real-world applications from across the medical education landscape. Participants will learn to navigate the 'Alignment Paradox' (ensuring AI tools serve educational goals rather than undermine them) through an evidence-based decision framework. This practical approach, grounded in principles of AI performance patterns, helps educators determine when to trust AI autonomously, when human oversight is essential, and when to avoid AI entirely."
07.10.2025 00:02
I've been struggling over the past year to articulate what a human clinician/professional is good for anymore, because many things that were true a few years ago are actively being challenged. A dynamic space, but having a combination of ALL of these attributes in a single person (or entity) remains essential and potent.
Years ago, I led a National Academy of Medicine report chapter, where I called for emphasizing what Computers and Humans are each especially good at. But..., what belongs in each column may need some rethinking. nam.edu/wp-content/u...
02.10.2025 23:29