
Chandler Squires

@chandlersquires.bsky.social

CMU postdoc, previously MIT PhD. Causality, pragmatism, representation learning, and AI for biology / science more broadly. Proud rat dad.

139 Followers  |  79 Following  |  46 Posts  |  Joined: 13.11.2024

Latest posts by chandlersquires.bsky.social on Bluesky


I'm in Lausanne for CLeaR! www.cclear.cc/2025

Looking forward to an excellent program, seeing old friends, and making some new ones - feel free to message me if you'll be here.

05.05.2025 09:46 — 👍 0    🔁 0    💬 0    📌 0

If I were a high school senior deciding between undergrad programs, or an undergraduate senior deciding between graduate programs, the recent conduct of places like Harvard would weigh heavily in their favor. Long-term gain for short-term pain.

16.04.2025 03:52 — 👍 1    🔁 0    💬 0    📌 0

A massive, self-inflicted wound on American higher education.

21.02.2025 23:19 — 👍 582    🔁 153    💬 16    📌 4

Tons of interesting questions related to these topics, and tons of technical perspectives to explore. I'm keen to see where this line of thinking might lead; please link to any references that might be related 😀

02.02.2025 22:00 — 👍 0    🔁 0    💬 0    📌 0

Ultimately, I think the distinction between "interpolation" and "extrapolation" needs to be made in terms of these levels. In some sense, it seems like being able to "extrapolate at level L" requires some machinery for moving up to level L+1 (or higher), interpolating, and moving back down.

02.02.2025 22:00 — 👍 2    🔁 0    💬 1    📌 0

This hierarchical structure appears over and over again.

In Bayesian statistics, there are priors, hyperpriors, hyperhyperpriors, and so on. In topology, there are paths, homotopies (paths between paths), and so on. In geometric algebra, we have vectors, bivectors, trivectors, and so on...

02.02.2025 22:00 — 👍 1    🔁 0    💬 1    📌 0

In the LLM example, the base level might be concepts like "pizza" and "Beowulf", the next higher level might be relationships between these concepts, then we can think about relationships between these relationships, and so on...

02.02.2025 22:00 — 👍 1    🔁 0    💬 1    📌 0

Ultimately, we need some inductive biases to favor certain parameterizations over others. We can't escape the need for "domain knowledge"; we just outsource it to a higher level and might call it something else.

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0

Now, parameterization-sensitivity is not inherently bad; it's just important to know what we're committing to: to identify our implicit assumptions and make them explicit.

If we know the "right" parameterization, then we should use it. But what if we don't? And how do we even define "right"?

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0

This instability of convexity-based definitions has been discussed a lot in the philosophical and cognitive science literature on natural kinds and categorization.

I'm currently reading "Conceptual Spaces: The Geometry of Thought" which covers this in great detail, would recommend.

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0

However, beyond 1D, our "nice" transformations T don't preserve convexity. So this distinction between interpolation and extrapolation is not parameterization-invariant.

Thus, a convexity-based distinction presupposes that we already (to some extent) know the "right" parameterization.

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0
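The convexity-breaking claim above can be checked numerically. A minimal sketch (my own example, not from the thread): the map T(x, y) = (x, y + x^2) is continuous and invertible, yet it moves the midpoint of two seen points out of their convex hull.

```python
def T(p):
    # T(x, y) = (x, y + x^2): continuous, with inverse (x, y) -> (x, y - x^2).
    x, y = p
    return (x, y + x * x)

def in_segment_hull(p, a, b, tol=1e-9):
    # The convex hull of two points {a, b} is the segment between them;
    # p lies in it iff p = a + t*(b - a) for some t in [0, 1].
    ab = (b[0] - a[0], b[1] - a[1])
    ap = (p[0] - a[0], p[1] - a[1])
    cross = ab[0] * ap[1] - ab[1] * ap[0]
    if abs(cross) > tol:                      # p is not collinear with a, b
        return False
    dot = ap[0] * ab[0] + ap[1] * ab[1]
    return -tol <= dot <= ab[0] ** 2 + ab[1] ** 2 + tol

a, b = (-1.0, 0.0), (1.0, 0.0)   # the x's we've already seen
p = (0.0, 0.0)                   # their midpoint

print(in_segment_hull(p, a, b))           # True: "interpolation"
print(in_segment_hull(T(p), T(a), T(b)))  # False: "extrapolation" after T
```

So a point classified as interpolation in one parameterization is classified as extrapolation in an equally "nice" one.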

How does this extend to higher dimensions? A natural generalization of the interval is the convex hull of x's that we've already seen, call this set S. Then we might think that predicting for a new x* in S is "interpolating", and predicting x* outside of S is "extrapolating".

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0

What if we reparameterize, letting x'=T(x) for some "nice" (continuous and invertible) transformation T? In 1D, such a transformation T must be monotonic, so intervals map to intervals, and the distinction between interpolation/extrapolation remains the same.

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0
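The 1D case can be sketched the same way (again my own toy example, not from the thread): any continuous, invertible scalar map is monotonic, so membership in the seen interval is preserved under reparameterization.

```python
T = lambda x: x ** 3                 # continuous, invertible, monotonic

seen = [-2.0, 0.5, 1.0]              # x's we've already observed
x_star = 0.0                         # inside [min(seen), max(seen)]

def interpolating(x, xs):
    # "Interpolation" in 1D: x lies within the interval of seen values.
    return min(xs) <= x <= max(xs)

print(interpolating(x_star, seen))                      # True
print(interpolating(T(x_star), [T(x) for x in seen]))   # True: preserved
print(interpolating(2.0, seen))                         # False
print(interpolating(T(2.0), [T(x) for x in seen]))      # False: preserved
```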

On the mathematical side, I think that 1D intuitions lead us astray.

For y=f(x), with x and y both scalars, we typically think of "interpolation" as predicting f(x) within the interval of x's that we've already seen, and "extrapolation" as predicting f(x) outside of that interval.

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0

How does an LLM "know" how to combine two concepts which it hasn’t seen combined before? Well, maybe it’s seen similar pairs of concepts combined.

Then, the LLM is "extrapolating" at the base level of these concepts, but "interpolating" at the level of relationships between concepts.

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0

LLMs highlight the difficulty of making these distinctions. If I prompt an LLM to give me a recipe for deep dish pizza in the style of Beowulf, then is it extrapolating (since that’s not in its training set) or interpolating (since pizza recipes and Beowulf are both in its training set)?

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0

I think “extrapolation” is in that dangerous category of words that make us feel like we know what we’re talking about, even when we aren’t.

The distinction between interpolation and extrapolation isn’t a given; it’s heavily theory-laden.

02.02.2025 22:00 — 👍 0    🔁 0    💬 1    📌 0

Some thoughts on interpolation vs. extrapolation:

I have a soft spot for the word “extrapolation” in the context of machine learning, using it as a broad term to capture ideas like compositional generalization and various forms of distributional robustness.

But it can be a major linguistic crutch.

02.02.2025 22:00 — 👍 7    🔁 0    💬 1    📌 0
Opinion | Don’t Believe Him: Look closely at the first two weeks of Donald Trump’s second term and you’ll see something very different than what he wants you to see.

www.nytimes.com/2025/02/02/o...

02.02.2025 12:17 — 👍 3    🔁 3    💬 0    📌 0

I think a ton of others would have interesting thoughts on this - @jhartford.bsky.social, @moberst.bsky.social, @smaglia.bsky.social, to name a few

29.01.2025 00:00 — 👍 1    🔁 0    💬 0    📌 0

Not sure whether this all fits better into the “variable-centric” or “mechanism-centric” perspective. It reminds me of a lot of other conceptual dualities, e.g. the literal duality between a vector space and its dual, or the categorical distinction between objects and morphisms.

29.01.2025 00:00 — 👍 0    🔁 0    💬 1    📌 0

We usually talk about interventions in black and white: a mechanism was changed, or it wasn’t. I think the grey area (how much it was changed) is woefully unexplored, and is going to be key to many applications.

29.01.2025 00:00 — 👍 0    🔁 0    💬 1    📌 0

That metadata can be thought of as a variable/vector, e.g. a molecular embedding when the interventions are drugs. Then we can encode priors, like similar drugs should have similar intervention targets.

29.01.2025 00:00 — 👍 0    🔁 0    💬 1    📌 0

In my opinion, this should be one of the first things we teach. It also naturally suggests a lot of extensions which are perhaps less obvious from other perspectives. For example, in the unknown-target interventions setting, we might have some metadata about each intervention.

29.01.2025 00:00 — 👍 1    🔁 0    💬 1    📌 0

You can find this kind of trick in a lot of works: Joint Causal Inference, the selection diagrams of Bareinboim and his collaborators, the decision-theoretic framework of Dawid and his collaborators, and my own work on structure learning and experimental design. Super useful!

29.01.2025 00:00 — 👍 1    🔁 0    💬 1    📌 0

That’s really nice, because it lets you re-use existing causal discovery algorithms with minor changes (e.g., adding constraints that the intervention variables come before the “system” variables and switching conditional independence tests to conditional “invariance” tests).

29.01.2025 00:00 — 👍 0    🔁 0    💬 1    📌 0

That perspective is really useful for structure learning from interventions with unknown targets. Then learning the intervention targets just becomes learning the children of the associated indicator variable.

29.01.2025 00:00 — 👍 0    🔁 0    💬 1    📌 0

In particular, an “intervention” is an operation which takes a model and returns a related model, à la Jonas Peters’s textbook. Then hard/perfect and do interventions are special cases. BUT these two models can always be embedded into a bigger model with an indicator variable for the intervention.

29.01.2025 00:00 — 👍 0    🔁 0    💬 1    📌 0

I’ve pretty much completely switched over to defining a causal Bayesian network as a collection of “causal mechanisms” (conditional distributions). Then it’s very natural to define a general (i.e., soft or imperfect) intervention as a change to one or more of these mechanisms.

29.01.2025 00:00 — 👍 0    🔁 0    💬 1    📌 0
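This mechanism-centric view is easy to make concrete. A toy sketch (my own code with made-up distributions, not from any of the referenced works): a two-variable network X -> Y stored as a dict of mechanisms, a soft intervention as a mechanism swap, and the indicator-variable embedding where the regime variable's children are exactly the intervention targets.

```python
import random

# Each mechanism maps the values of its parents to a sampled value.
# Toy binary chain X -> Y.
obs_mech = {
    "X": lambda pa: random.random() < 0.5,                        # P(X=1) = 0.5
    "Y": lambda pa: random.random() < (0.9 if pa["X"] else 0.1),  # P(Y=1 | X)
}

# Soft intervention on Y: replace its mechanism, keeping the X -> Y edge.
int_mech_Y = lambda pa: random.random() < (0.6 if pa["X"] else 0.4)

# Indicator trick: one larger model with a regime variable I whose
# children are exactly the intervention targets (here, just Y).
augmented = {
    "X": obs_mech["X"],
    "Y": lambda pa: (int_mech_Y if pa["I"] else obs_mech["Y"])(pa),
}

def sample(model, order, context):
    # Ancestral sampling in a fixed topological order, given exogenous
    # context (here, the experimenter-set regime indicator I).
    vals = dict(context)
    for v in order:
        vals[v] = model[v](vals)
    return vals

random.seed(0)
draws = [sample(augmented, ["X", "Y"], {"I": True}) for _ in range(10_000)]
rate = sum(d["Y"] for d in draws if d["X"]) / sum(1 for d in draws if d["X"])
print(round(rate, 1))  # close to 0.6: Y follows the intervened mechanism when I=1
```

Setting I=0 recovers the observational conditional P(Y=1 | X=1) = 0.9, so learning the targets of an unknown intervention reduces to learning the children of I.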

I use the term “mechanism” instead of “parameter” to emphasize that the conditional distributions/structural functions can be nonparametric, but the story is basically the same.

29.01.2025 00:00 — 👍 1    🔁 0    💬 1    📌 0
