For more results (e.g., experiments on models' parametric knowledge about various hypotheses, as well as results on the city game), check out the paper here: arxiv.org/abs/2504.09387
21.04.2025 20:51
@sriramp05.bsky.social
Undergraduate at UT Austin majoring in CS & Math
LMs' zero-shot behavior shows little to no sensitivity to suspicious coincidences. But the results change when knowledge of the hypothesis space is activated, either implicitly (Chain-of-Thought) or explicitly (Knowledge), and are sometimes even qualitatively consistent with human judgments.
We test sensitivity in three settings: zero-shot, Chain-of-Thought, and a "Knowledge" prompt that gives the model explicit access to the possible hypotheses the input and target could be sampled from.
We focus on two domains: the number game from Tenenbaum (1999), with human judgments collected by Eric Bigelow and @spiantado.bsky.social, and a world-cities domain (for which there are no human judgments).
Given the LM's yes/no responses, we calculate F1 scores for the members of each hypothesis that fits both the input and the target, and determine whether the model favors the smallest such hypothesis.
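A minimal sketch of this scoring step, under stated assumptions: the hypothesis list, the 1..100 number range, and the `model_yes` set are hypothetical illustrations, and the exact F1 setup used in the paper may differ.

```python
def f1(predicted: set[int], members: set[int]) -> float:
    """F1 of the model's "yes" set against one hypothesis's member set."""
    true_pos = len(predicted & members)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(members)
    return 2 * precision * recall / (precision + recall)

# Illustrative hypothesis space over 1..100 (an assumption, not the paper's list).
hypotheses = {
    "powers of 2": {2 ** k for k in range(7)},   # 1, 2, 4, ..., 64
    "even numbers": set(range(2, 101, 2)),
    "odd numbers": set(range(1, 101, 2)),
}

inputs, target = {16, 8, 2, 64}, 32

# Keep only the hypotheses that fit both the input and the target.
fitting = {
    name: members
    for name, members in hypotheses.items()
    if (inputs | {target}) <= members
}

# Hypothetical set of numbers the LM answered "yes" to.
model_yes = {2, 4, 8, 16, 32, 64}

scores = {name: f1(model_yes, members) for name, members in fitting.items()}
smallest = min(fitting, key=lambda name: len(fitting[name]))
favored = max(scores, key=scores.get)
print(scores, "| smallest:", smallest, "| favored:", favored)
```

With these made-up responses, the highest-scoring hypothesis is also the smallest one that fits, which is the pattern a model sensitive to suspicious coincidences would show.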
To test model sensitivity to suspicious coincidences, we provide the model with an input that could be sampled from multiple hypotheses (e.g., "16, 8, 2, 64") and ask it whether a given target value (e.g., "32") is compatible with the input.
This is known as the suspicious coincidence effect: if you meant to convey "odd" numbers, it would be highly suspicious that you chose those particular numbers. Humans show this sensitivity across a wide range of contexts; here, smaller hypotheses are favored over more general ones.
Humans readily show sensitivity to the way data is generated when reasoning inductively. E.g., if some program generated "93, 43, 83, 53", it's likely producing numbers ending in 3, even though that's not the only applicable hypothesis (e.g., they're all odd numbers).
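One standard way to make this intuition concrete is the Bayesian "size principle": if examples are drawn uniformly from the true hypothesis, each consistent example has probability 1/|h|, so smaller hypotheses gain likelihood exponentially with the number of examples. The sketch below assumes a 1..100 number range and equal priors over two hypotheses; both are illustrative choices, not details taken from the paper.

```python
from fractions import Fraction

def likelihood(data: list[int], members: set[int]) -> Fraction:
    """P(data | hypothesis), assuming uniform sampling from the hypothesis."""
    if not set(data) <= members:
        return Fraction(0)          # hypothesis cannot produce the data
    return Fraction(1, len(members)) ** len(data)

data = [93, 43, 83, 53]

# Two hypotheses over 1..100 (the range is an assumption for illustration).
ends_in_3 = {n for n in range(1, 101) if n % 10 == 3}   # 10 members
odd = set(range(1, 101, 2))                             # 50 members

ratio = likelihood(data, ends_in_3) / likelihood(data, odd)
posterior_ends_in_3 = ratio / (ratio + 1)               # equal priors assumed

print("likelihood ratio:", ratio)
print("posterior for 'ends in 3':", float(posterior_ends_in_3))
```

Under these assumptions, the four examples make "ends in 3" (50/10)^4 = 625 times more likely than "odd", even though both hypotheses fit the data.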
Are LMs sensitive to suspicious coincidences? Our paper finds that, when given access to knowledge of the hypothesis space, LMs can show sensitivity to such coincidences, displaying parallels with human inductive reasoning. w/ @kanishka.bsky.social, @kmahowald.bsky.social, @eunsol.bsky.social