
Nora Belrose

@norabelrose.bsky.social

AI, philosophy, spirituality. Head of interpretability research at EleutherAI, but posts are my own views, not Eleuther’s.

1,013 Followers  |  15 Following  |  38 Posts  |  Joined: 20.11.2024

Posts by Nora Belrose (@norabelrose.bsky.social)

why don't more people become zoroastrian?


it's where judaism and christianity got the idea of ethical monotheism, afterlife, and final judgment but without any of their baggage


(no eternal hell, no historically questionable dogmas, etc.)

24.10.2025 03:32 — 👍 3    🔁 0    💬 0    📌 0

If we care only about appearances, outcomes, and results then AI will outcompete humans at everything


If we care about the process used to create things then humans can still have jobs and meaningful lives


The idea that ends can be detached from means is the root of many evils

11.10.2025 01:10 — 👍 4    🔁 0    💬 0    📌 0

Strongly agree with this bill https://www.usatoday.com/story/news/politics/2025/09/29/ohio-state-legislator-ban-people-marrying-ai/86427987007/

30.09.2025 01:35 — 👍 1    🔁 0    💬 0    📌 0

if the laws of physics are fundamentally probabilistic, as they seem to be, that makes it easier to see how they can smoothly change over time

13.06.2025 07:48 — 👍 2    🔁 0    💬 0    📌 0

data attribution is a special case of data causality:


estimating the causal effect of either learning or unlearning one datapoint (or set of datapoints) on the neural network's behavior on other datapoints
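A toy sketch of that framing (my own illustration, not Eleuther code): for a least-squares model, the causal effect of unlearning one training point on the model's behavior at a query point can be computed exactly by leave-one-out retraining.

```python
import numpy as np

# Toy sketch (my illustration): the causal effect of unlearning training
# point i = the model's prediction at a query point after retraining
# without i, minus its prediction when trained on everything.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
x_query = np.array([1.0, 1.0, 1.0])

def fit_predict(X, y, x):
    # "Training" here is just solving least squares exactly.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x @ w

full_pred = fit_predict(X, y, x_query)

# effects[i] is the (exact) unlearning effect of datapoint i on x_query.
effects = np.array([
    fit_predict(np.delete(X, i, axis=0), np.delete(y, i), x_query) - full_pred
    for i in range(len(X))
])
```

Practical attribution methods (influence functions, TracIn, and the like) can be read as approximations of this retraining counterfactual, since exact leave-one-out retraining is intractable for neural networks.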

12.06.2025 04:02 — 👍 3    🔁 0    💬 0    📌 0

Neural networks don't have organs.


They aren't made of fixed mechanisms.


They have flows of information and intensities of neural activity. They can't be organized into a set of parts with fixed functions.


In the words of Gilles Deleuze, they're bodies without organs (BwO).

27.03.2025 19:11 — 👍 6    🔁 0    💬 1    📌 0
Mixture-of-Depths: Dynamically allocating compute in... Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to...

This seems like a cool way to use an adaptive amount of compute per token. I speculate that models like these will have more faithful CoT since they don't get to do "extra" reasoning on easy tokens https://arxiv.org/abs/2404.02258

13.03.2025 23:55 — 👍 4    🔁 0    💬 1    📌 0

Also chapter 10 where he discards the notion of the Soul but maintains the distinction between mind and brain

24.02.2025 18:35 — 👍 0    🔁 0    💬 0    📌 0

William James did a lot of good philosophy of mind in chapters 1, 5, and 6 of The Principles of Psychology; we've barely made any progress in 135 years 😂

24.02.2025 18:35 — 👍 3    🔁 0    💬 0    📌 0
Post image

I love this meme

22.02.2025 05:33 — 👍 6    🔁 0    💬 0    📌 0

might interest @nabla_theta

07.02.2025 00:32 — 👍 2    🔁 0    💬 0    📌 0

Pro tip: if you want to implement TopK SAEs efficiently, and don't want to deal with Triton, just use this function for the decoder, it's much faster than the naive dense matmul implementation
https://pytorch.org/docs/stable/generated/torch.nn.functional.embedding_bag.html
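A minimal sketch of the trick (shapes and variable names are my own): since a TopK latent code has exactly k nonzeros per example, decoding reduces to a weighted sum of k decoder rows, which is exactly what `embedding_bag` with `mode="sum"` and `per_sample_weights` computes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: batch of 8, 512 latents, k=32 active, d_model=64.
B, n_latents, k, d_model = 8, 512, 32, 64
W_dec = torch.randn(n_latents, d_model)

# TopK SAE code: per example, the k most active latents and their values.
pre_acts = torch.randn(B, n_latents)
top_vals, top_idx = pre_acts.topk(k, dim=-1)

# Sparse decode: sum the k selected decoder rows, scaled by the activations.
recon_sparse = F.embedding_bag(
    top_idx, W_dec, per_sample_weights=top_vals, mode="sum"
)

# Equivalent dense decode: scatter into a dense code, then matmul.
dense_code = torch.zeros(B, n_latents).scatter_(1, top_idx, top_vals)
recon_dense = dense_code @ W_dec
```

The two reconstructions agree, but the sparse path touches only `B * k` rows of the decoder instead of all `n_latents`, which is where the speedup comes from when `k << n_latents`.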

06.02.2025 19:32 — 👍 8    🔁 0    💬 0    📌 0
GitHub - EleutherAI/basin-volume: Precisely estimating the volume of basins in neural net parameter space corresponding to interpretable behaviors

And here's the code we used to generate the results: github.com/EleutherAI/b...

03.02.2025 22:01 — 👍 3    🔁 0    💬 0    📌 0
Estimating the Probability of Sampling a Trained Neural Network at Random We present an algorithm for estimating the probability mass, under a Gaussian or uniform prior, of a region in neural network parameter space corresponding to a particular behavior, such as achieving ...

Here's the paper link: arxiv.org/abs/2501.18812

03.02.2025 22:01 — 👍 1    🔁 0    💬 1    📌 0

Second, we speculate that complexity measures like this could be useful for detecting undesired "extra reasoning" in deep nets. We want networks to be aligned with our values instinctively, without scheming about whether this would be consistent with some ulterior motive arxiv.org/abs/2311.08379

03.02.2025 22:01 — 👍 1    🔁 0    💬 1    📌 0

We're interested in this line of work for two reasons:

First, it sheds light on how deep learning works. The "volume hypothesis" says DL is similar to randomly sampling a network from weight space that gets low training loss. But this can't be tested if we can't measure volume.

03.02.2025 22:01 — 👍 4    🔁 0    💬 1    📌 0
Post image

We find that the probability of sampling a network at random (or "local volume" for short) decreases exponentially as the network is trained.

And networks which memorize their training data without generalizing have lower local volume (higher complexity) than generalizing ones.

03.02.2025 22:01 — 👍 2    🔁 0    💬 1    📌 0
Post image

But the total volume can be strongly influenced by a small number of outlier directions, which are hard to sample in high dimensions: think of a big, flat pancake.

Importance sampling using gradient info helps address this issue by making us more likely to sample outliers.
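A generic importance-sampling sketch of why this helps (my illustration, not the paper's gradient-informed proposal): a rare "outlier" event is badly estimated by naive sampling from the target distribution p, but sampling from a heavier-tailed proposal q and reweighting each sample by p(x)/q(x) keeps the estimate unbiased while hitting the outliers far more often.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return (np.abs(x) > 3.0).astype(float)  # rare event under N(0, 1)

def p_pdf(x):  # target density: standard normal
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def q_pdf(x, s=3.0):  # proposal density: 3x wider normal, covers the tails
    return np.exp(-x ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

n = 200_000
x_q = 3.0 * rng.normal(size=n)  # samples drawn from q

# Unbiased estimate of E_p[f] via reweighting; true value is
# P(|X| > 3) = 2 * (1 - Phi(3)) ≈ 0.0027.
is_estimate = np.mean(f(x_q) * p_pdf(x_q) / q_pdf(x_q))
```

The reweighted estimate has far lower variance than the naive Monte Carlo estimate at the same sample budget, because q concentrates samples where f is nonzero.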

03.02.2025 22:01 — 👍 2    🔁 0    💬 1    📌 0
Post image

It works by exploring random directions in weight space, starting from an "anchor" network.

The distance from the anchor to the edge of the region, along the random direction, gives us an estimate of how big (or how probable) the region is as a whole.
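A toy sketch of that geometry (my simplification, not the paper's estimator): for a region that is star-shaped around the anchor, the volume equals E over random unit directions u of r(u)^d, times the volume of the unit d-ball, where r(u) is the distance from the anchor to the region's boundary along u.

```python
import numpy as np
from math import gamma, pi

# Toy sketch (my simplification): estimate the volume of a region around
# an "anchor" point by measuring, along random directions, the distance
# from the anchor to the region's edge.
rng = np.random.default_rng(0)
d = 2
anchor = np.zeros(d)

def in_region(w):
    # Stand-in for a behavioral criterion, e.g. loss(w) below a cutoff.
    # Here the region is the unit disk, so the true volume is pi.
    return np.linalg.norm(w) < 1.0

def boundary_distance(u, hi=10.0, iters=40):
    lo = 0.0
    for _ in range(iters):  # bisect along the ray anchor + t * u
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if in_region(anchor + mid * u) else (lo, mid)
    return 0.5 * (lo + hi)

dirs = rng.normal(size=(2_000, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
radii = np.array([boundary_distance(u) for u in dirs])

# Volume of a star-shaped region = E[r(u)^d] * vol(unit d-ball).
unit_ball_vol = pi ** (d / 2) / gamma(d / 2 + 1)
vol_estimate = unit_ball_vol * np.mean(radii ** d)
```

In realistic dimensions this naive version breaks down, since a handful of outlier directions can dominate r(u)^d, which is the pancake problem the thread describes and the motivation for importance sampling.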

03.02.2025 22:01 — 👍 2    🔁 0    💬 1    📌 0

My colleague Adam Scherlis and I developed a method for estimating the probability of sampling a neural network in a behaviorally-defined region from a Gaussian or uniform prior.

You can think of this as a measure of complexity: less probable means more complex.

03.02.2025 22:01 — 👍 3    🔁 0    💬 1    📌 0
Post image

What are the chances you'd get a fully functional language model by randomly guessing the weights?

We crunched the numbers and here's the answer:

03.02.2025 22:01 — 👍 12    🔁 0    💬 2    📌 2
Post image

we have seven (!) papers lined up for release next week


you know you're on a roll when arxiv throttles you

02.02.2025 03:35 — 👍 4    🔁 0    💬 0    📌 0

deepseek now largely replacing chatgpt for me

24.01.2025 01:33 — 👍 6    🔁 0    💬 0    📌 0

Evolutionary biology can learn things from machine learning.


Natural selection alone doesn't explain "train-test" or "sim-to-real" generalization, which clearly happens.


At every level of organization, life can zero-shot adapt to novel situations. https://www.youtube.com/watch?v=jJ9O5H2AlWg

29.12.2024 22:29 — 👍 9    🔁 0    💬 2    📌 1

Truth is relative, when it comes to the physical state of the universe.


But we should accept the existence of perspective-neutral facts about how perspectives relate to one another, to avoid vicious skeptical paradoxes. https://arxiv.org/abs/2410.13819

28.12.2024 21:56 — 👍 6    🔁 0    💬 0    📌 0
There's Plenty of Room Right Here: Biological Systems as Evolved, Overloaded, Multi-scale Machines The applicability of computational models to the biological world is an active topic of debate. We argue that a useful path forward results from abandoning hard boundaries between categories and adopt...

Neural networks are polycomputers in @drmichaellevin.bsky.social's sense.

Depending on your perspective, you can interpret them as performing many different computations on different types of features. No perspective is uniquely correct. arxiv.org/abs/2212.10675

28.12.2024 19:38 — 👍 10    🔁 0    💬 0    📌 0

If OpenAI's new o3 model is "successfully aligned," then it could probably be trusted to supervise more powerful models, allowing us to bootstrap to benevolent superintelligence.

20.12.2024 21:20 — 👍 2    🔁 1    💬 0    📌 0
My Week Without Cosmic Hope

Interesting to see @philipgoff.bsky.social go back and forth on the fine-tuning argument.

I think the multiverse definitely can't explain fine-tuning, but it's also unclear we need an explanation at all. And God may be a more "complex" hypothesis than the physical constants themselves.

20.12.2024 20:58 — 👍 1    🔁 0    💬 1    📌 0
GitHub - EleutherAI/training-jacobian

This is the first in a series of upcoming papers on neural network training dynamics and loss landscape geometry. Please check out the interp-across-time channel in the @eleutherai.bsky.social Discord if you'd like to get more involved.

Code: github.com/EleutherAI/t...

11.12.2024 20:32 — 👍 6    🔁 0    💬 1    📌 0
Post image

Unfortunately, computing the entire training Jacobian and performing SVD on it is computationally intractable for all but the smallest networks.

We focused on a tiny 5K parameter MLP for most experiments, but we did find a similar SV spectrum in a 62K param image classifier.

11.12.2024 20:32 — 👍 3    🔁 0    💬 1    📌 0