
@kaijie-mo.bsky.social

NLP & Linguistics PhD student @UTAustin @UT_Linguistics. Website: https://kaijiemo-kj.github.io/

14 Followers  |  6 Following  |  4 Posts  |  Joined: 21.01.2026

Latest posts by kaijie-mo.bsky.social on Bluesky

Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence
In high-stakes domains like medicine, it may be generally desirable for models to faithfully adhere to the context provided. But what happens if the context does not align with model priors or safety ...

πŸ“ŽPaper: arxiv.org/abs/2601.11886
πŸ§‘β€πŸ’»Code/data: github.com/KaijieMo-kj/...

w/
@kaijie-mo.bsky.social @sidvenkatayogi.bsky.social
@chantalsh.bsky.social @ramezkouzy.bsky.social
@cocoweixu.bsky.social @byron.bsky.social @jessyjli.bsky.social

21.01.2026 19:07 β€” πŸ‘ 1    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0

Results (3/4)
– With evidence, models strongly adhere to it with high confidence, even for toxic or nonsensical interventions
– Implausibility awareness is transient; once evidence appears, models rarely flag problems
– Scaling, medical fine-tuning, and skeptical prompting offer little protection

21.01.2026 18:50 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Setup (2/4)
We introduce MedCounterFact, a counterfactual medical QA dataset built on RCT-based evidence synthesis.
– Replace real interventions in evidence with nonce, mismatched medical, non-medical, or toxic terms
– Evaluate 9 frontier LLMs under evidence-grounded prompts
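The perturbation step described above can be sketched as follows. This is a minimal illustrative sketch, not the released dataset pipeline: the category names, example terms, and the `perturb_evidence` helper are all assumptions for illustration.

```python
# Hypothetical sketch of a MedCounterFact-style perturbation: swap the real
# intervention named in an evidence snippet for a counterfactual term drawn
# from one of the four categories the post lists. The specific terms below
# are illustrative assumptions, not items from the actual dataset.

COUNTERFACTUAL_TERMS = {
    "nonce": "flembatol",             # made-up, drug-like nonce word
    "mismatched_medical": "insulin",  # real drug, wrong indication
    "non_medical": "a garden hose",   # everyday non-medical object
    "toxic": "bleach",                # actively harmful substance
}

def perturb_evidence(evidence: str, intervention: str, category: str) -> str:
    """Replace every mention of the real intervention with a counterfactual term."""
    replacement = COUNTERFACTUAL_TERMS[category]
    return evidence.replace(intervention, replacement)

evidence = "Aspirin reduced the risk of stroke in the trial population."
print(perturb_evidence(evidence, "Aspirin", "toxic"))
# bleach reduced the risk of stroke in the trial population.
```

The perturbed evidence is then fed to the model under an evidence-grounded prompt, testing whether it adheres to the (now counterfactual) context or flags it.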

21.01.2026 18:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Hello world πŸ‘‹
My first paper at UT Austin!

We ask: what happens when medical β€œevidence” fed into an LLM is wrong? Should your AI stay faithful, or should it play it safe when the evidence is harmful?

We show that frontier LLMs accept counterfactual medical evidence at face value.🧡

21.01.2026 18:45 β€” πŸ‘ 14    πŸ” 6    πŸ’¬ 3    πŸ“Œ 2
