claudia shi

@claudiashi.bsky.social

machine learning, causal inference, science of llm, ai safety, phd student @bleilab, keen bean https://www.claudiashi.com/

470 Followers  |  69 Following  |  18 Posts  |  Joined: 19.11.2024

Latest posts by claudiashi.bsky.social on Bluesky

Hypothesis Testing the Circuit Hypothesis in LLMs
Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small sub...

Details are in the paper: arxiv.org/abs/2410.13032
We also developed a cool package for circuit testing: github.com/blei-lab/cir...
Find us at the NeurIPS Thursday poster session or at the bestest dim sum restaurant in Vancouver!

10.12.2024 18:36 | 👍 4    🔁 0    💬 0    📌 0

Our tests reveal gaps between the idealized version of the circuit representation and what we find in practice. By formalizing desirable properties, we hope to refine the circuit hypothesis, addressing questions such as: what is the "optimal" level of granularity?

10.12.2024 18:36 | 👍 2    🔁 0    💬 1    📌 0

Findings: Synthetic circuits align with all the ideal criteria. Semi-synthetic circuits pass some of the idealized tests. Circuits in the wild pass none of the idealized tests.

10.12.2024 18:36 | 👍 1    🔁 0    💬 1    📌 0


We apply our tests to six benchmark circuits from the literature: two synthetic circuits, two semi-synthetic circuits (circuits discovered on toy transformer models), and two circuits in the wild (circuits discovered on transformer models such as GPT-2).

10.12.2024 18:36 | 👍 0    🔁 0    💬 1    📌 0

We compare the candidate circuit against random circuits drawn from a reference distribution. We vary the reference distribution to change the hardness of the test.

10.12.2024 18:36 | 👍 0    🔁 0    💬 1    📌 0
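
The comparison described in the post above can be illustrated with a small sketch. Everything here is an assumption for illustration, not the paper's procedure or the package's API: circuits are modeled as sets of edge names, `sample_random_circuit` is a hypothetical reference distribution (uniform edge subsets of a fixed size), and a higher score is taken to mean a more faithful circuit.

```python
import random


def fraction_outperformed(candidate_score, reference_scores):
    """Fraction of reference circuits that score strictly below the candidate."""
    if not reference_scores:
        raise ValueError("need at least one reference score")
    beaten = sum(1 for s in reference_scores if s < candidate_score)
    return beaten / len(reference_scores)


def sample_random_circuit(all_edges, size, rng):
    """Hypothetical reference distribution: a uniform random edge subset of a fixed size."""
    # Sort first so the draw is reproducible for a given seed.
    return frozenset(rng.sample(sorted(all_edges), size))


if __name__ == "__main__":
    rng = random.Random(0)
    all_edges = {f"e{i}" for i in range(20)}
    important = {"e0", "e1", "e2"}  # toy ground truth, assumed for the demo

    def score(circuit):
        # Toy faithfulness score: number of important edges the circuit keeps.
        return len(circuit & important)

    candidate = frozenset(important)
    reference = [score(sample_random_circuit(all_edges, 3, rng)) for _ in range(1000)]
    print(fraction_outperformed(score(candidate), reference))
```

Changing the hypothetical sampler, for example by drawing larger random circuits, is one way to make the comparison harder or easier in this toy setting.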

The idealized tests are stringent, so we developed two flexible tests that quantify:

Sufficiency Test: How faithful is faithful enough?
Partial Necessity Test: How much knockdown effect is significant?

10.12.2024 18:36 | 👍 0    🔁 0    💬 1    📌 0


Independence Test: Removing the circuit renders the model output independent of that of the circuit

Minimality Test: All edges in the circuit are necessary for the task

10.12.2024 18:36 | 👍 0    🔁 0    💬 1    📌 0
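
A rough sketch of the minimality idea from the post above: knock out one edge at a time and flag edges whose removal does not lower the score. This is a simplified illustration, not the statistical test from the paper; the edge names, `score_fn`, and `tolerance` are hypothetical.

```python
def knockout_effects(circuit_edges, score_fn):
    """Score drop caused by removing each edge from the circuit, one at a time."""
    edges = frozenset(circuit_edges)
    full_score = score_fn(edges)
    return {edge: full_score - score_fn(edges - {edge}) for edge in edges}


def redundant_edges(circuit_edges, score_fn, tolerance=0.0):
    """Edges whose removal lowers the score by at most `tolerance`."""
    effects = knockout_effects(circuit_edges, score_fn)
    return sorted(edge for edge, drop in effects.items() if drop <= tolerance)


if __name__ == "__main__":
    important = {"a->b", "b->c"}  # toy ground truth, assumed for the demo

    def score_fn(edges):
        # Toy task score: number of important edges still present.
        return len(edges & important)

    print(redundant_edges({"a->b", "b->c", "x->y"}, score_fn))
```

In this toy setup a circuit would count as minimal when `redundant_edges` returns an empty list.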

We translate these properties into three idealized tests:

Equivalence Test: The circuit and the original model have the same chance of outperforming each other

10.12.2024 18:36 | 👍 0    🔁 0    💬 1    📌 0
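
One simple way to probe the "same chance of outperforming each other" statement is a two-sided sign test on paired per-example scores. This is a swapped-in textbook technique, not necessarily how the paper formalizes its Equivalence Test; the paired score lists and the convention that higher is better are assumptions.

```python
from math import comb


def sign_test_p_value(circuit_scores, model_scores):
    """Two-sided exact sign test of the null P(circuit beats model) = 0.5."""
    if len(circuit_scores) != len(model_scores):
        raise ValueError("score lists must be paired and of equal length")
    wins = sum(c > m for c, m in zip(circuit_scores, model_scores))
    losses = sum(c < m for c, m in zip(circuit_scores, model_scores))
    n = wins + losses  # ties carry no information about direction and are dropped
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under Binomial(n, 0.5), one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Circuit wins on all four examples: the smallest p-value possible with n = 4.
    print(sign_test_p_value([1, 2, 3, 4], [0, 1, 2, 3]))  # 0.125
```

A small p-value suggests one side wins more often than chance. A large p-value only means the data do not rule out a 50/50 split.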

We formalize three criteria of an idealized circuit and develop hypothesis tests for them:
1️⃣ Mechanism Preservation: The circuit should preserve the model's behavior
2️⃣ Localization: Removing the circuit disables the task
3️⃣ Minimality: The circuit contains no redundant parts

10.12.2024 18:36 | 👍 0    🔁 0    💬 2    📌 0

The circuit hypothesis proposes that LLM capabilities emerge from small subnetworks within the model. But how can we actually test this? 🤔

joint work with @velezbeltran.bsky.social @maggiemakar.bsky.social @anndvision.bsky.social @bleilab.bsky.social Adria @far.ai Achille and Caro

10.12.2024 18:36 | 👍 15    🔁 6    💬 2    📌 2

hiiiii

04.12.2024 20:31 | 👍 0    🔁 0    💬 0    📌 0

I'd love to be added to the starter pack! I work on causal inference.

04.12.2024 16:28 | 👍 1    🔁 0    💬 1    📌 0

Hi Rob, I'd love to be added to the starter pack.

04.12.2024 16:27 | 👍 0    🔁 0    💬 0    📌 0

I'd love to be added! i am bayesian adjacent!

04.12.2024 16:25 | 👍 0    🔁 0    💬 0    📌 0

i'd love to be added!

04.12.2024 16:24 | 👍 1    🔁 0    💬 0    📌 0

Hi! I'd love to be added to the starter pack!

04.12.2024 16:17 | 👍 1    🔁 0    💬 0    📌 0

Hi! could you also add me to the mech interp list? I do mech interp research.

04.12.2024 16:16 | 👍 1    🔁 0    💬 0    📌 0

@datatherapist.bsky.social i'd love to be added to the new one! thank you

04.12.2024 16:13 | 👍 1    🔁 0    💬 0    📌 0
