Sriram Padmanabhan @sriramp05

On Language Models' Sensitivity to Suspicious Coincidences Humans are sensitive to suspicious coincidences when generalizing inductively over data, as they make assumptions as to how the data was sampled. This results in smaller, more specific hypotheses bein...

For more results (e.g., experiments on models’ parametric knowledge about various hypotheses as well as results on the city game) check out the paper here: arxiv.org/abs/2504.09387

21.04.2025 20:51 — 👍 0 🔁 0 💬 0 📌 0

LMs’ zero-shot behavior shows little to no sensitivity to suspicious coincidences. But the results change when knowledge of the hypothesis space is activated, either implicitly (Chain-of-Thought) or explicitly (Knowledge), and the models' behavior is sometimes even qualitatively consistent with that of humans.

21.04.2025 20:51 — 👍 0 🔁 0 💬 1 📌 0

We test sensitivity in three environments: zero-shot, Chain-of-Thought, and a “Knowledge” prompt that provides the model with explicit access to the possible hypotheses the input and target could be sampled from.

21.04.2025 20:51 — 👍 0 🔁 0 💬 1 📌 0

We focus on two domains: the number game from Tenenbaum (1999), with human judgments collected by Eric Bigelow and @spiantado.bsky.social, and a world-cities domain (for which there are no human judgments).

21.04.2025 20:51 — 👍 0 🔁 0 💬 1 📌 0

Given the LM’s yes/no responses, we calculate the F1 scores for members of each hypothesis that fits both the input and target and determine whether the smallest such hypothesis is favored by the model.

21.04.2025 20:51 — 👍 0 🔁 0 💬 1 📌 0
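
The scoring step in the post above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the three candidate hypotheses, the 1 to 100 range, and the `model_yes` set of "yes" answers are all made-up assumptions, and the paper's exact F1 computation may differ.

```python
# Hypothetical setup: integers 1..100 and three hypotheses that all fit
# the input "16, 8, 2, 64" and the target "32". Not taken from the paper.
UNIVERSE = range(1, 101)
candidates = {
    "powers of two": {n for n in UNIVERSE if (n & (n - 1)) == 0},
    "even numbers": {n for n in UNIVERSE if n % 2 == 0},
    "all numbers": set(UNIVERSE),
}

def f1_score(yes_set, hypothesis):
    """F1 of the model's 'yes' answers, treating hypothesis members as positives."""
    true_pos = len(yes_set & hypothesis)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(yes_set)
    recall = true_pos / len(hypothesis)
    return 2 * precision * recall / (precision + recall)

# Made-up model behavior: targets for which the model answered "yes".
model_yes = {2, 4, 8, 16, 32, 64}

scores = {name: f1_score(model_yes, members) for name, members in candidates.items()}
favored = max(scores, key=scores.get)                           # best-matching hypothesis
smallest = min(candidates, key=lambda k: len(candidates[k]))    # most specific hypothesis

print(scores)
print("smallest hypothesis favored:", favored == smallest)
```

Under these toy responses the most specific hypothesis gets the highest F1, which is the pattern a model sensitive to suspicious coincidences would be expected to show.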

To test model sensitivity to suspicious coincidences, we provide the model with an input that could be sampled from multiple hypotheses (e.g., “16, 8, 2, 64”) and ask it whether a given target value (e.g., “32”) is compatible with the input.

21.04.2025 20:51 — 👍 0 🔁 0 💬 1 📌 0
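
To make "an input that could be sampled from multiple hypotheses" concrete, here is a small sketch that lists which hypotheses contain both the input and the target, smallest first. The hypothesis space and the 1 to 100 range are assumptions chosen for illustration; they are not the set used in the paper.

```python
# Hypothetical hypothesis space over the integers 1..100 (illustration only).
UNIVERSE = range(1, 101)

HYPOTHESES = {
    "powers of two": {n for n in UNIVERSE if (n & (n - 1)) == 0},
    "even numbers": {n for n in UNIVERSE if n % 2 == 0},
    "odd numbers": {n for n in UNIVERSE if n % 2 == 1},
    "numbers ending in 3": {n for n in UNIVERSE if n % 10 == 3},
    "all numbers": set(UNIVERSE),
}

def consistent_hypotheses(inputs, target):
    """Names of hypotheses containing every input and the target, smallest first."""
    items = set(inputs) | {target}
    fits = [name for name, members in HYPOTHESES.items() if items <= members]
    return sorted(fits, key=lambda name: len(HYPOTHESES[name]))

ranked = consistent_hypotheses([16, 8, 2, 64], 32)
for name in ranked:
    print(name, len(HYPOTHESES[name]))
```

With this toy space, "powers of two" is the smallest hypothesis that fits, followed by "even numbers" and "all numbers".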

This is known as the suspicious coincidence effect: if you meant to convey “odd numbers”, it would be highly suspicious that you happened to choose those particular numbers. Humans show this sensitivity across a wide range of contexts, favoring smaller hypotheses over more general ones.

21.04.2025 20:51 — 👍 0 🔁 0 💬 1 📌 0

Humans readily show sensitivity to the way data is generated when reasoning inductively. E.g., if some program generated “93, 43, 83, 53”, it’s likely producing numbers ending in 3, even though that is not the only applicable hypothesis (e.g., they’re all odd numbers).

21.04.2025 20:51 — 👍 0 🔁 0 💬 1 📌 0
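
One common way to formalize why "ending in 3" wins over "odd" is the size principle from Bayesian concept learning: if examples are drawn uniformly from the true hypothesis, each example has probability 1/|h|, so smaller hypotheses assign higher likelihood to the same data. The sketch below assumes numbers range over 1 to 100 and uses only these two hypotheses; it is an illustration of the idea, not a computation from the paper.

```python
from fractions import Fraction

# Assumed number range for illustration: integers 1..100.
UNIVERSE = range(1, 101)
ends_in_3 = {n for n in UNIVERSE if n % 10 == 3}   # 10 members
odd = {n for n in UNIVERSE if n % 2 == 1}          # 50 members

def strong_sampling_likelihood(data, hypothesis):
    """P(data | hypothesis) when each example is drawn uniformly from the hypothesis."""
    if not set(data) <= hypothesis:
        return Fraction(0)  # the hypothesis cannot generate this data
    return Fraction(1, len(hypothesis)) ** len(data)

data = [93, 43, 83, 53]
ratio = (strong_sampling_likelihood(data, ends_in_3)
         / strong_sampling_likelihood(data, odd))
print("likelihood ratio (ends in 3 vs. odd):", ratio)
```

With four examples the ratio is (50/10)^4 = 625 in favor of the smaller hypothesis, and it grows with every additional consistent example.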

Are LMs sensitive to suspicious coincidences? Our paper finds that, when given access to knowledge of the hypothesis space, LMs can show sensitivity to such coincidences, displaying parallels with human inductive reasoning. w/ @kanishka.bsky.social, @kmahowald.bsky.social, @eunsol.bsky.social

21.04.2025 20:51 — 👍 5 🔁 0 💬 1 📌 3
