How likely is ‘almost certainly’? The scourge of weasel words
Phrases to describe probability are getting lost in translation — and have helped to cause at least one military catastrophe
I’m in The Times today talking about how we judge probability-based language and what happens when words mean different things to different people.
This follows an online quiz I’ve been running at probability.kucharski.io over the past few weeks, with 5000+ participants and counting.
24.02.2026 08:27 —
👍 33
🔁 10
💬 2
📌 0
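As a hedged illustration of the kind of summary such a quiz enables: the responses below are invented for illustration (the actual data lives behind probability.kucharski.io), but a few lines of Python show how widely numeric interpretations of a single phrase can spread.

```python
# Hypothetical sketch: summarising how people translate a probability
# phrase into a number. Response values are invented, not quiz data.
import statistics

# Invented responses (percent) for the phrase "almost certainly".
responses = [80, 85, 90, 90, 92, 95, 95, 97, 98, 99]

median = statistics.median(responses)
q1, _, q3 = statistics.quantiles(responses, n=4)  # quartile cut points

print(f"median interpretation: {median}%")
print(f"middle 50% of answers: {q1}%-{q3}%")
```

Even for a strong phrase like "almost certainly", the middle half of these invented answers spans nearly ten percentage points, which is the ambiguity the article is about.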
Every American needs to watch this:
05.02.2026 20:41 —
👍 20527
🔁 10285
💬 580
📌 982
Adrián Detavernier, Jasper De Bock: Robustness quantification and how it allows for reliable classification, even in the presence of distribution shift and for small training sets https://arxiv.org/abs/2503.22418 https://arxiv.org/pdf/2503.22418 https://arxiv.org/html/2503.22418
31.03.2025 06:05 —
👍 0
🔁 2
💬 1
📌 1
Hack the planet!
16.09.2025 11:06 —
👍 2
🔁 3
💬 0
📌 0
No, you did not give those of us who happened to look like the people who bombed Pearl Harbor any due process. And that was profoundly wrong. It destroyed our lives.
08.08.2025 18:46 —
👍 13968
🔁 3955
💬 604
📌 211
Screenshot of the first page of a paper pre-print titled "Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor" by Olteanu et al. Paper abstract: "In AI research and practice, rigor remains largely understood in terms of methodological rigor -- such as whether mathematical, statistical, or computational methods are correctly applied. We argue that this narrow conception of rigor has contributed to the concerns raised by the responsible AI community, including overblown claims about AI capabilities. Our position is that a broader conception of what rigorous AI research and practice should entail is needed. We believe such a conception -- in addition to a more expansive understanding of (1) methodological rigor -- should include aspects related to (2) what background knowledge informs what to work on (epistemic rigor); (3) how disciplinary, community, or personal norms, standards, or beliefs influence the work (normative rigor); (4) how clearly articulated the theoretical constructs under use are (conceptual rigor); (5) what is reported and how (reporting rigor); and (6) how well-supported the inferences from existing evidence are (interpretative rigor). In doing so, we also aim to provide useful language and a framework for much-needed dialogue about the AI community's work by researchers, policymakers, journalists, and other stakeholders."
We have to talk about rigor in AI work and what it should entail. The reality is that impoverished notions of rigor do not only lead to some one-off undesirable outcomes but can have a deeply formative impact on the scientific integrity and quality of both AI research and practice 1/
18.06.2025 11:48 —
👍 63
🔁 18
💬 2
📌 3
We despise immigrants for not putting down roots, even as we make sure that it is impossible for them to do so. We do this because we have no idea what we want.
open.substack.com/pub/iandunt/...
16.05.2025 09:50 —
👍 2151
🔁 553
💬 103
📌 39
Why Tot Celebrity Ms. Rachel Waded Into the Gaza Debate
I'm embarrassed for the New York Times that they published this piece on Ms. Rachel, in which they cite a ridiculous anonymous right-wing website, Stopantisemitism, while indulging the mad, mad claim she may be funded by Hamas (!).
This isn't journalism:
15.05.2025 16:07 —
👍 3193
🔁 503
💬 132
📌 88
Just out! Our peer-reviewed critique of the Cass Review has been published by BMC Medical Research Methodology. Please read and share. We show that the Cass Review is fatally flawed and should not be the basis for policy or practice in transgender healthcare.
link.springer.com/article/10.1...
10.05.2025 12:31 —
👍 5645
🔁 2869
💬 129
📌 225
Aleatoric and epistemic uncertainty are clear-cut concepts, right? ... right? 😵💫 In our new ICLR blogpost we let different schools of thought speak and contradict each other, and revisit chatbots where “the character of aleatory ‘transforms’ into epistemic” iclr-blogposts.github.io/2025/blog/re...
08.05.2025 08:18 —
👍 31
🔁 9
💬 1
📌 0
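For readers wanting something concrete: one textbook decomposition (which the blogpost itself treats as contestable) splits an ensemble's total predictive entropy into an aleatoric part (the average entropy of individual members) and an epistemic part (the disagreement between members). A minimal sketch with invented ensemble probabilities:

```python
# Sketch of the entropy decomposition: total = aleatoric + epistemic.
# The ensemble's class probabilities below are invented for illustration.
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Invented two-class probabilities from three ensemble members.
members = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]

# Average the members' predictions to get the ensemble prediction.
mean_pred = [sum(m[c] for m in members) / len(members) for c in range(2)]

total = entropy(mean_pred)                                   # total uncertainty
aleatoric = sum(entropy(m) for m in members) / len(members)  # expected data noise
epistemic = total - aleatoric                                # member disagreement

print(f"total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```

The blogpost's point is precisely that this tidy split is less clear-cut than it looks; the sketch only shows the mechanical version that the different schools of thought argue over.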
@bagleycartoons.bsky.social
06.05.2025 22:38 —
👍 141
🔁 59
💬 1
📌 2
Even accepting the premise that AI produces useful writing (which no one should), using AI in education is like using a forklift at the gym. The weights do not actually need to be moved from place to place. That is not the work. The work is what happens within you.
15.04.2025 02:56 —
👍 10497
🔁 3371
💬 104
📌 270
A tweet by Sarah Longwell (@SarahLongwell25) reads: "He’s threatening media companies who are critical of him. He’s talking about sending Americans to foreign prisons. He’s signing executive orders to investigate former staff members who spoke out against him. Don’t you see what’s happening here?"
I see it. I have lived it. 83 years ago, the U.S. government turned upon a group of its own citizens and residents and sent them to internment camps without due process. I was there among them. American fascism is back. It is here. It is now.
15.04.2025 20:30 —
👍 45320
🔁 14447
💬 957
📌 481
Community for Rigor
Reliable research can be complicated to create. So we made a network of essential resources to help you better understand the principles and practices of scientific rigor. Why trust us? Because we’re a...
So I am leading this group building great teaching materials for scientific rigor (c4r.io). Their first unit is really coming together and I will teach it (Monday, April 21, 2025, 12:00-1:00 pm EST) to see how well it works. Join us: forms.monday.com/forms/7d978e...
08.04.2025 23:04 —
👍 20
🔁 9
💬 0
📌 0
I've really enjoyed reading this "workography" by Kees van Deemter, whom I've never met but who has had a long career in NLP. Lots of storytelling and reflections on research, moving between institutions and countries, finding mentors, choosing between academia and industry, and more.
09.04.2025 09:34 —
👍 19
🔁 3
💬 0
📌 0
Calibrating Expressions of Certainty
This study introduces a method for calibrating certainty expressions, transforming phrases like "Maybe" into probability distributions. This enhances decision-making for radiologists and fine-tunes AI models, improving uncertainty communication. https://arxiv.org/abs/2410.04315
03.04.2025 20:20 —
👍 3
🔁 1
💬 0
📌 1
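A hedged reading of the abstract: the core move is to treat a phrase like "Maybe" as a distribution over probabilities rather than a single number. The Beta parameters below are invented placeholders for illustration, not the paper's fitted values.

```python
# Sketch: represent certainty phrases as Beta distributions over
# probabilities. Parameters are invented, not taken from the paper.

PHRASE_TO_BETA = {
    "maybe": (2.0, 2.0),            # broad, centred near 0.5
    "likely": (6.0, 2.0),           # most mass above 0.5
    "almost certain": (18.0, 2.0),  # concentrated near 1.0
}

def beta_mean(a: float, b: float) -> float:
    return a / (a + b)

def beta_var(a: float, b: float) -> float:
    return a * b / ((a + b) ** 2 * (a + b + 1))

for phrase, (a, b) in PHRASE_TO_BETA.items():
    print(f"{phrase!r}: mean={beta_mean(a, b):.2f}, sd={beta_var(a, b) ** 0.5:.2f}")
```

The payoff of a distributional representation is that "maybe" carries not just a central value but an explicit width, so a downstream reader (or model) can see how vague the phrase is.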
How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online C...
Giuseppe Serra, Ben Werner, Florian Buettner
Action editor: Emmanuel Bengio
https://openreview.net/forum?id=dczXe0S1oL
#forgetting #memory #forget
02.04.2025 00:07 —
👍 3
🔁 1
💬 0
📌 0
March 31st is Trans Day of Visibility.
31.03.2025 21:07 —
👍 714
🔁 221
💬 7
📌 7
Enjoying this game very much!
26.03.2025 23:38 —
👍 1
🔁 0
💬 0
📌 0
While reading Ben Recht's article, I found Foster & Hart (2021) (arxiv.org/abs/2210.07169) quite interesting. The contribution is a proposal of always-calibrated forecaster based on a continuously-relaxed calibration measure. But I actually love their §1.1 motivating calibration.
22.03.2025 00:03 —
👍 5
🔁 1
💬 0
📌 0
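For context on what the continuous relaxation departs from, a minimal sketch of the classical binned notion of calibration: among the occasions a forecaster says "p", the event should occur about a fraction p of the time. Forecasts and outcomes below are invented for illustration.

```python
# Sketch of classical (binned) calibration error: group forecasts by
# value and compare each group's empirical frequency to the forecast.
# All forecasts and outcomes here are invented toy data.

def binned_calibration_error(forecasts, outcomes):
    """Average |empirical frequency - forecast| over distinct forecast values."""
    by_value = {}
    for p, y in zip(forecasts, outcomes):
        by_value.setdefault(p, []).append(y)
    errors = []
    for p, ys in by_value.items():
        freq = sum(ys) / len(ys)
        errors.append(abs(freq - p))
    return sum(errors) / len(errors)

# Calibrated toy forecaster: the event occurs on half of the "0.5" days.
perfect = binned_calibration_error([0.5, 0.5, 1.0], [1, 0, 1])
# Miscalibrated one: it says 0.9 but the event never happens.
bad = binned_calibration_error([0.9, 0.9], [0, 0])
print(perfect, bad)
```

The trouble motivating relaxations like Foster & Hart's is that this binned measure is discontinuous in the forecasts: nudging a forecast into a different bin can jump the error, which is what a continuously-relaxed measure smooths out.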
📣 New paper! The field of AI research is increasingly realising that benchmarks are very limited in what they can tell us about AI system performance and safety. We argue and lay out a roadmap toward a *science of AI evaluation*: arxiv.org/abs/2503.05336 🧵
20.03.2025 13:28 —
👍 38
🔁 12
💬 1
📌 1
Germany Tried to Silence Me, a UN Official, for Talking About Israel’s Genocidal War in Gaza
Francesca Albanese on her five-day trip that exposed Germany's harsh deviation from democratic values and shrinking landscape for freedom of expression.
"Germany Tried to Silence Me, a UN Official, for Talking About Israel’s Genocidal War in Gaza"
In an exclusive piece for Zeteo, UN Special Rapporteur Francesca Albanese writes about her 5-day trip that exposed Germany's harsh deviation from democratic values:
19.03.2025 17:56 —
👍 1448
🔁 420
💬 36
📌 15
Docs is an open source collaborative text editor created by a joint effort from the French 🇫🇷 and German 🇩🇪 governments.
github.com/suitenumeriq...
17.03.2025 08:03 —
👍 26
🔁 1
💬 0
📌 0