Thinking of all the colleagues and friends I have who are connected to Brown University. What a devastating day for Brown, and for all of us.
14.12.2025 03:19 — ❤ 119 🔁 13 💬 1 📌 0
Some tasks admit different algorithms that behave identically on the training data, so a model's learned mechanism can look arbitrary unless we know what the task requires (the goals, constraints, and invariances that define a correct solution).
— Other cases like this, or other limits of mech interp?
🧵 (2/2)
What are the limits of interpretability in ML?
Mech interp often stays at Marr's algorithmic level, but without the computational level (what the task is, what counts as a right solution), the mechanisms we find can look arbitrary. Why does a model learn one algorithm rather than another?
🧵 (1/2)
Introspection targets our ongoing or recently past mental states. What could it mean for a system that lacks any obvious analogue of a continuous stream of experience to have current or recently past "internal states" to introspect on?
Robert Long makes a similar point in his Substack.
Anthropic has a great new piece on "Signs of introspection in large language models": www.anthropic.com/research/int...
Neat evidence that LLMs can report on manipulated activations, with big caveats!
🧠 But it leaves open: what are the "internal states" an LLM can introspect in the first place?
This is a beautiful paper! The first third helpfully labels a stream of recent work in philosophy of AI as "propositional interpretability": the idea is to use propositional attitudes like belief, desire, and intention to help explain AI in a way that we can understand. 1/n
29.01.2025 13:24 — ❤ 49 🔁 11 💬 2 📌 0
"The AI risk repository, which includes over 700 AI risks grouped by causal factors (e.g. intentionality), and domains (e.g. discrimination), was born out of a desire to understand the overlaps and disconnects in AI safety research"
#AIEthics
techcrunch.com/2024/08/14/m...
Would love to be included!
23.11.2024 20:21 — ❤ 8 🔁 0 💬 0 📌 0