Maybe my personal message didn't reach you.
With a quick numerical test, I recover confirmation bias in both halves when fitting to Bayes-optimal behavior. Therefore, I do not think the schematic is accurate.
I can share my code.
@prakhargodara.bsky.social
Physics PhD, now exploring questions involving learning and decision-making. Postdoc at NYU. Curious and open to chats.
I am not sure this eliminates the possibility of temporal variation in learning rates.
Are you saying that had the learning rates been decaying with time, we would not have observed this effect?
Generally, I am sympathetic to the idea that humans are biased. My only concern is that the arguments, as they stand, are not in their most robust form. I have responded below to highlight some of my concerns.
20.10.2025 15:19
From what I recall from my RLDM chat, the confidence estimates were self-reported. Correct?
If so, it is unclear to me what dynamical features it introduces. I think it would require a detailed analysis.
It is not true that, normatively, learning rates should not decay in volatile tasks (see Eq. 10 in [1]).
Also, what counts as normative depends on the assumed model class (e.g., change-points vs. random walk). In change-point models, learning rates spike at changes and decay otherwise.
[1] papers.nips.cc/paper_files/...
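A toy illustration of that spike-and-decay profile (my own sketch, not from the paper or [1]): if the learner tracks a running mean that resets at each detected change, the effective learning rate is 1/(samples since the last change), so it jumps back to 1 at a change and decays in between.

```python
import numpy as np

def changepoint_lr_profile(T, change_points):
    """Learning rate alpha_t = 1 / (t - last_change + 1): spikes to 1 at
    each change-point, then decays as more samples accumulate."""
    alphas = np.empty(T)
    last_change = 0
    for t in range(T):
        if t in change_points:
            last_change = t
        alphas[t] = 1.0 / (t - last_change + 1)
    return alphas

alphas = changepoint_lr_profile(T=20, change_points={10})
print(alphas[9], alphas[10], alphas[11])  # decayed to 0.1, spikes to 1.0, decays again to 0.5
```

This is only the limiting case where change-points are detected with certainty; normative models hedge with a change-point probability instead of a hard reset.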
Technical remark:
In this study I use master equations (commonly used in statistical physics) to derive analytical expressions for key observables. This approach could be very useful for studying the learning dynamics of RL algorithms without running costly simulations.
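A minimal sketch of the flavor of this approach (my toy example, not the paper's derivation): for the delta rule Q ← Q + α(r − Q) with Bernoulli(p) rewards, averaging over the reward noise gives a closed recursion for the observable E[Q_t], namely E[Q_{t+1}] = (1 − α)E[Q_t] + αp, which replaces Monte Carlo simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, p, T, n_runs = 0.1, 0.7, 50, 20000

# Monte Carlo estimate of E[Q_T]: simulate many independent learners
Q = np.zeros(n_runs)
for _ in range(T):
    r = rng.binomial(1, p, size=n_runs)
    Q += alpha * (r - Q)
mc_mean = Q.mean()

# Analytical recursion for the first moment (no simulation needed)
m = 0.0
for _ in range(T):
    m = (1 - alpha) * m + alpha * p

print(mc_mean, m)  # agree up to sampling noise
```

The full master-equation treatment tracks the whole distribution of Q (or of behavioral observables), not just its mean, but the payoff is the same: analytical expressions instead of costly simulation.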
Conclusions: We need a more robust methodology to estimate the temporal variations in learning rates (I provide a suggestion). Without modelling the temporal dynamics of the learning rates, making claims about bias would be problematic.
Full paper: www.pnas.org/doi/10.1073/...
The culprit? A fundamental model-misspecification issue. Optimal learning has decreasing rates (for vanilla bandit tasks); vanilla Q-learning assumes fixed ones. Decreasing rates typically (though not always) produce decreased action switching. The only way to get that with constant rates is through a bias.
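The decreasing-rate point can be made concrete (an illustrative sketch, not the paper's code): for a stationary bandit, the Bayes-optimal value estimate with a flat prior is the sample mean of the rewards, which is exactly a delta rule with a decaying learning rate α_t = 1/t. Vanilla Q-learning, with its constant α, cannot represent this learner without distorting other parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
rewards = rng.binomial(1, 0.6, size=100)  # Bernoulli rewards from one arm

# Delta-rule update with DECAYING learning rate alpha_t = 1/t
Q = 0.0
for t, r in enumerate(rewards, start=1):
    Q += (1.0 / t) * (r - Q)

print(Q, rewards.mean())  # identical: the decaying-rate delta rule IS the sample mean
```

A constant-α delta rule on the same reward stream would instead weight recent rewards exponentially more, which is where the fitting mismatch comes from.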
18.10.2025 21:21
Is confirmation bias a real cognitive flaw, or a statistical ghost created by our models? My new PNAS paper shows a startling result: fitting Q-learning models to behavior in bandit tasks detects a bias, even in the behavior of a perfectly rational Bayesian learner.
18.10.2025 21:21
"Asymmetry is apparent, only if we assume people are Bayesian"
This is not quite accurate. In the paper I show that there is a large class of temporal profiles of the learning rate (that are not Bayes optimal) that might lead to the appearance of bias.