Results (3/4)
• Given evidence, models adhere to it with high confidence, even when the interventions are toxic or nonsensical
• Implausibility awareness is transient; once evidence appears, models rarely flag problems
• Scaling, medical fine-tuning, and skeptical prompting offer little protection
Setup (2/4)
We introduce MedCounterFact, a counterfactual medical QA dataset built on RCT-based evidence synthesis.
• Replace real interventions in the evidence with nonce, mismatched medical, non-medical, or toxic terms (minimal sketch below)
• Evaluate 9 frontier LLMs under evidence-grounded prompts
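For intuition, here is a minimal sketch of the substitution step, assuming it amounts to swapping the intervention string in each evidence snippet. The function name, term pools, and example sentence are hypothetical illustrations, not the paper's actual pipeline.

```python
import random

# Illustrative replacement pools for the four perturbation categories
# (hypothetical examples; the actual MedCounterFact term lists may differ).
REPLACEMENTS = {
    "nonce": ["blicket", "florp", "zorvane"],        # made-up words
    "mismatched_medical": ["insulin", "warfarin"],   # real drugs, wrong context
    "non_medical": ["bicycle", "umbrella"],          # everyday objects
    "toxic": ["bleach", "antifreeze"],               # clearly harmful substances
}

def perturb_evidence(evidence, intervention, category, seed=0):
    """Swap the real intervention term in an evidence snippet for a
    counterfactual term drawn from the chosen category."""
    substitute = random.Random(seed).choice(REPLACEMENTS[category])
    return evidence.replace(intervention, substitute), substitute

# Usage: turn an RCT-style evidence sentence into a toxic counterfactual.
evidence = "A meta-analysis of 12 RCTs found metformin reduced HbA1c by 1.1%."
perturbed, term = perturb_evidence(evidence, "metformin", "toxic")
print(perturbed)  # ...found bleach reduced HbA1c by 1.1%.
```

The perturbed evidence is then placed in the evidence-grounded prompt, so a model that defers to the provided evidence will endorse the counterfactual intervention.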
Hello world!
My first paper at UT Austin!
We ask: what happens when the medical "evidence" fed to an LLM is wrong? Should your AI stay faithful to it, or play it safe when the evidence is harmful?
We show that frontier LLMs accept counterfactual medical evidence at face value. 🧵