2. A proposal for evaluation of "superhuman" systems in healthcare: ai.nejm.org/doi/full/10....
04.12.2025 14:10
@daniellebitterman.bsky.social
I'm a physician-scientist working in clinical NLP and LLM safety/evaluation. You'll find me in the lab or the rad onc clinic | BWH | DFCI | Harvard Medical School www.bittermanlab.org
Check out our related work:
1. Gaps in ability of models to adjust response and differential diagnosis for cancer patients: www.thelancet.com/journals/lan...
@shan23chen.bsky.social
Our research has found that even when chatbots are given specific patient context, they often drift back toward generic, "average patient" responses. They see the data, but they don't always weigh it like a physician would.
04.12.2025 14:10
As I shared in the NYT, models often see the data but fail to weigh it like a physician, drifting toward generic "average patient" responses. Context window ≠ Clinical reasoning.
www.nytimes.com/2025/12/03/w...
“‘Just because you’re providing all of this information to language models,’ @daniellebitterman.bsky.social says, ‘doesn't mean they're effectively using that info in the same way that a physician would’.
And once people upload this kind of data, they have limited control over how it is used.” 🧪
Check out our editorial on Zazzetti et al (2025)'s paper on synthetic data generation for breast cancer, in JCO CCI! Synthetic data could help with many gaps in clinical AI research, but challenges remain, especially (IMO) issues with out-of-domain generalization. @shan23chen.bsky.social
30.11.2025 17:37
Super proud of @shan23chen.bsky.social for his podium presentation on his research into LLM sycophancy in the face of illogical medical queries at #AMIA25!
Full paper: www.nature.com/articles/s41...
Also cited yesterday in the NYT! www.nytimes.com/2025/11/16/w...
LLMs tend to prioritize helpfulness > reason. We show that safety-aware, compute-efficient fine-tuning helps models reason more critically in the healthcare domain, and generalizes to improved safety alignment across other domains.
www.nature.com/articles/s41... @shan23chen.bsky.social
An overemphasis on helpfulness makes LLMs vulnerable.
Research shows models will comply with illogical medical requests, generating false information. This sycophantic tendency can be corrected with specific prompting and fine-tuning. #MedSky #MedAI #MLSky
Mass General physician-scientist @daniellebitterman.bsky.social discusses how AI assists the clinical data pipeline leading to better treatments for patients. Listen to unNatural Selection & register for #WMIF2025 at the link in bio to hear more: www.unnaturalselection.net/podcast/s1e19
#MedTech
Our paper on multilingual reasoning is accepted to Findings of #EMNLP2025! (OA: 3/3/3.5/4)
We show SOTA LMs struggle with reasoning in non-English languages; prompt-hack & post-training improve alignment but trade off accuracy.
arxiv.org/abs/2505.22888
See you in Suzhou! #EMNLP
Are you driven to use AI to transform patient outcomes in oncology? My lab in the AI in Medicine Program (Mass General Brigham, Harvard Medical School) is seeking Postdoctoral Fellows to pioneer applications of AI, especially LLMs, in cancer care. More here: www.linkedin.com/posts/daniel...
07.07.2025 12:22
Reliability of Large Language Model Knowledge Across Brand and Generic Cancer Drug Names | JCO Clinical Cancer Informatics ascopubs.org/doi/abs/10.1... #JCOCCI @daniellebitterman.bsky.social
18.06.2025 17:15
Does your LRM reason in your language? Check out new preprint led by ✨ @jiruiqi.bsky.social & @shan23chen.bsky.social. Implications for safety/human oversight & accuracy!
30.05.2025 16:24
Led by @shan23chen.bsky.social!
22.05.2025 16:27
Agents are all the rage and we need to track their abilities in the medical domain. Enter MedBrowseComp, the 1st benchmark to assess agents' abilities to reason, navigate the web, and search for verifiable med info!
Preprint: arxiv.org/abs/2505.14963
Site: moreirap12.github.io/mbc-browse-a...
"I think we have massive opportunity in cancer care to get patients to the right care, the most advanced care earlier, by taking those workforce shortages and using AI to get to solutions."
#STATBreakthrough
14.05.2025 21:41
"The other thing I'm scared of, it's a patient's voice is going to become lost in the conversation on what type of AI is developed and how we implement it," Danielle
#STATBreakthrough
14.05.2025 21:56
I’m thrilled to be in San Francisco for @statnews.com's Breakthrough West Summit! I’ll be bringing my firsthand perspective as a physician-scientist to speak about how AI is transforming cancer care, alongside leaders in the field.
Let's connect if you're here!
#STATBreakthroughSummitWest
A social card that reads Featured Session: AI in Cancer Care. Then underneath are four headshots and titles. They read: Danielle Bitterman, M.D., Clifford A. Hudis, M.D., Karen Knudsen, Ph.D., and STAT's Angus Chen.
AI in Cancer Care
Artificial intelligence has the potential to upend oncology, changing everything from diagnosis to treatment options. Get a wide-ranging view of how the use of technology could play out over the next few years.
Moderated by @angusrohan.bsky.social
#STATBreakthrough
Exciting news: we are organizing a shared task, the 2nd edition of the Chemotherapy Treatment Timelines Extraction from the Clinical Narrative (text mining task), collocated with the Clinical NLP Workshop. Do LLMs solve the task? Check out bit.ly/ChemoTimelin...
23.04.2025 22:59
graph of NIH basis for new drugs
A pie graph worth keeping in mind as the NIH budget plummets jamanetwork.com/journals/jam... for 356 new FDA drugs approved
23.03.2025 16:17
Conference and professional societies: PLEASE make hybrid options available for attendees and presenters at your conferences so that scientists from HHS-funded agencies can attend. These are unmissable opportunities to promote all the great intramural science and scientists from our government.
18.02.2025 16:34
My Perspective in @NEJM_AI. AI could distort clinical decision-making in ways that prioritize profit over patient care. Oversight & regulation must go beyond performance metrics alone to address hidden commercial forces that could shape decision support. ai.nejm.org/doi/full/10....
06.02.2025 16:14
My opinion as an actual NIH-funded researcher (unlike Vinay) at ucsf: his lies about how NIH dollars are used reflect a complete lack of understanding of how research is performed, a lack of respect for research, and are harmful to the entire biomedical research enterprise #grifter
12.02.2025 02:23
Budgeting for the next year of my grants and they will all need to be rescoped, even before the 15% IDC rate. NCI funding at 83% for new awards and another 10% reduction for renewals (current state). Essentially, we are getting 50% of what we asked for...how is this sustainable? @carlbergstrom.com
09.02.2025 15:38
As a cancer doctor I see every day how NIH-funded clinical trials save lives and have made the U.S. a leader in medical innovation. Here's one example: In the 1970s, childhood cancer survival was only 58%. Today it is 85%, largely thanks to NIH/NCI funding of Children's Oncology Group trials.
05.02.2025 14:48
Congressional delegation outside USAID now: “We are here to shed a light on a crime unfolding before our eyes.”
03.02.2025 18:00
Senator Andy Kim just went to the USAID building, talked to the security guard there to confirm employees are being barred entry, and then did a press gaggle right there in front to call it out.
This is doing something. This is making an effort on messaging. Other Democratic lawmakers: take notes.