@chandlersquires.bsky.social
CMU postdoc, previously MIT PhD. Causality, pragmatism, representation learning, and AI for biology / science more broadly. Proud rat dad.
I'm in Lausanne for CLeaR! www.cclear.cc/2025
Looking forward to an excellent program, seeing old friends, and making some new ones - feel free to message me if you'll be here.
If I were a high school senior deciding between undergrad programs, or an undergraduate senior deciding between graduate programs, the recent conduct of places like Harvard would weigh heavily in their favor. Long-term gain for short-term pain.
A massive, self-inflicted wound on American higher education.
Tons of interesting questions related to these topics, and tons of technical perspectives to explore. I'm keen to see where this line of thinking might lead; please link to any references that might be related.
Ultimately, I think the distinction between "interpolation" and "extrapolation" needs to be made in terms of these levels. In some sense, it seems like being able to "extrapolate at level L" requires some machinery for moving up to level L+1 (or higher), interpolating, and moving back down.
This hierarchical structure appears over and over again.
In Bayesian statistics, there are priors, hyperpriors, hyperhyperpriors, and so on. In topology, there are paths, homotopies (paths between paths), and so on. In geometric algebra, we have vectors, bivectors, trivectors, and so on...
In the LLM example, the base level might be concepts like "pizza" and "Beowulf", the next higher level might be relationships between these concepts, then we can think about relationships between these relationships, and so on...
Ultimately, we need some inductive biases to favor certain parameterizations over others. We can't escape the need for "domain knowledge"; we just outsource it to a higher level and might call it something else.
Now, parameterization-sensitivity is not inherently bad; it's just important to know what we're committing to - to identify our implicit assumptions and make them explicit.
If we know the "right" parameterization, then we should use it, but what if we don't, and how do we even define "right"?
This instability of convexity-based definitions has been discussed a lot in the philosophical and cognitive science literature on natural kinds and categorization.
I'm currently reading "Conceptual Spaces: The Geometry of Thought", which covers this in great detail - would recommend.
However, beyond 1D, our "nice" transformations T don't preserve convexity. So this distinction between interpolation and extrapolation is not parameterization-invariant.
Thus, a convexity-based distinction presupposes that we already (to some extent) know the "right" parameterization.
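Here's a minimal numpy/scipy sketch of that claim (my own toy construction - the specific points and the map T are illustrative choices, not anything canonical): a continuous, invertible T moves a test point from inside the convex hull of the training inputs to outside it.

    import numpy as np
    from scipy.spatial import Delaunay

    def in_hull(points, queries):
        # True where a query point lies inside (or on) the convex hull of `points`.
        return Delaunay(points).find_simplex(np.atleast_2d(queries)) >= 0

    X = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])  # "training" inputs
    x_star = np.array([0.5, 0.1])                                    # "test" input

    def T(p):
        # A "nice" map: continuous, with continuous inverse (x, y) -> (x, y - x^2).
        p = np.atleast_2d(p)
        return np.column_stack([p[:, 0], p[:, 1] + p[:, 0] ** 2])

    print(in_hull(X, x_star))        # [ True]: "interpolation" in the original coordinates
    print(in_hull(T(X), T(x_star)))  # [False]: "extrapolation" after reparameterizing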
How does this extend to higher dimensions? A natural generalization of the interval is the convex hull of x's that we've already seen, call this set S. Then we might think that predicting for a new x* in S is "interpolating", and predicting x* outside of S is "extrapolating".
What if we reparameterize, letting x' = T(x) for some "nice" (continuous and invertible) transformation T? In 1D, such a transformation T must be monotonic, so intervals map to intervals, and the distinction between interpolation/extrapolation remains the same.
On the mathematical side, I think that 1D intuitions lead us astray.
For y=f(x), with x and y both scalars, we typically think of "interpolation" as predicting f(x) within the interval of x's that we've already seen, and "extrapolation" as predicting f(x) outside of that interval.
How does an LLM "know" how to combine two concepts which it hasn't seen combined before? Well, maybe it's seen similar pairs of concepts combined.
Then, the LLM is "extrapolating" at the base level of these concepts, but "interpolating" at the level of relationships between concepts.
LLMs highlight the difficulty of making these distinctions. If I prompt an LLM to give me a recipe for deep dish pizza in the style of Beowulf, then is it extrapolating (since that's not in its training set) or interpolating (since pizza recipes and Beowulf are both in its training set)?
I think "extrapolation" is in that dangerous category of words that make us feel like we know what we're talking about, even when we aren't.
The distinction between interpolation and extrapolation isn't a given; it's heavily theory-laden.
Some thoughts on interpolation vs. extrapolation:
I have a soft spot for the word "extrapolation" in the context of machine learning, using it as a broad term to capture ideas like compositional generalization and various forms of distributional robustness.
But it can be a major linguistic crutch.
I think a ton of others would have interesting thoughts on this - @jhartford.bsky.social, @moberst.bsky.social, @smaglia.bsky.social, to name a few
Not sure whether this all fits better into the "variable-centric" or "mechanism-centric" perspective. It reminds me of a lot of other conceptual dualities, e.g. the literal duality between a vector space and its dual, or the categorical distinction between objects and morphisms.
We usually talk about interventions in black and white - a mechanism was changed, or it wasn't. I think the grey area (how much it was changed) is woefully unexplored, and is going to be key to many applications.
That metadata can be thought of as a variable/vector, e.g. a molecular embedding when the interventions are drugs. Then we can encode priors, like similar drugs should have similar intervention targets.
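As a rough sketch of what encoding such a prior could look like (all toy choices on my part: random embeddings, an RBF kernel, a logistic link), we could smooth per-drug target logits with a similarity kernel, so drugs with nearby embeddings get correlated target probabilities:

    import numpy as np

    rng = np.random.default_rng(0)
    n_drugs, n_genes, d = 5, 10, 8
    emb = rng.normal(size=(n_drugs, d))  # molecular embeddings (the metadata)

    # RBF similarity between drugs in embedding space.
    sq_dists = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2 * d))

    # Kernel-smoothed logits: similar embeddings induce correlated beliefs
    # about which genes each drug perturbs.
    base_logits = rng.normal(size=(n_drugs, n_genes))
    smoothed = K @ base_logits / K.sum(1, keepdims=True)
    prior_probs = 1 / (1 + np.exp(-smoothed))  # P(drug i targets gene j)
    print(prior_probs.round(2))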
In my opinion, this should be one of the first things we teach. It also naturally suggests a lot of extensions which are perhaps less obvious from other perspectives. For example, in the unknown-target interventions setting, we might have some metadata about each intervention.
You can find this kind of trick in a lot of works: Joint Causal Inference, the selection diagrams of Bareinboim and his collaborators, the decision-theoretic framework of Dawid and his collaborators, my work on structure learning and experimental design. Super useful!
That's really nice, because it lets you re-use existing causal discovery algorithms with minor changes (e.g., adding constraints that the intervention variables come before the "system" variables and switching conditional independence tests to conditional "invariance" tests).
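A crude sketch of what a conditional "invariance" test can look like in a linear-Gaussian toy model (my simplification, not a specific algorithm from any of the works above): regress Y on X in the pooled data, then test whether the residual distribution shifts with the regime.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(1)
    n = 2000

    # Regime I=0: Y = 2X + noise.  Regime I=1: Y = 2X + 3 + noise (a soft shift on Y).
    X0, X1 = rng.normal(size=n), rng.normal(size=n)
    Y0 = 2 * X0 + rng.normal(size=n)
    Y1 = 2 * X1 + 3 + rng.normal(size=n)

    # Fit Y ~ X on the pooled data, then compare residuals across regimes.
    X, Y = np.concatenate([X0, X1]), np.concatenate([Y0, Y1])
    slope, intercept = np.polyfit(X, Y, 1)
    res0 = Y0 - (slope * X0 + intercept)
    res1 = Y1 - (slope * X1 + intercept)

    # A tiny p-value suggests Y's mechanism changed with the regime,
    # i.e. Y is a child of the intervention indicator.
    print(ks_2samp(res0, res1).pvalue)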
That perspective is really useful for structure learning from interventions with unknown targets. Then learning the intervention targets just becomes learning the children of the associated indicator variable.
In particular, an "intervention" is an operation which takes a model and returns a related model, a la Jonas Peters's textbook. Then hard/perfect and do interventions are special cases. BUT these two models can always be embedded into a bigger model with an indicator variable for the intervention.
I've pretty much completely switched over to defining a causal Bayesian network as a collection of "causal mechanisms" (conditional distributions). Then it's very natural to define a general (i.e., soft or imperfect) intervention as a change to one or more of these mechanisms.
I use the term "mechanism" instead of "parameter" to emphasize that the conditional distributions/structural functions can be nonparametric, but the story is basically the same.
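Here's a minimal sketch of that encoding (my own toy code, not from any particular textbook): a model is a dict from each variable to its mechanism, an intervention swaps out one entry, and both regimes embed into a bigger model whose indicator variable I has the intervention targets as its children.

    import numpy as np

    rng = np.random.default_rng(0)

    # Observational model over (X, Y): X ~ N(0, 1), Y | X ~ N(2X, 1).
    mechanisms = {
        "X": lambda parents: rng.normal(0.0, 1.0),
        "Y": lambda parents: rng.normal(2.0 * parents["X"], 1.0),
    }

    def intervene(model, var, new_mechanism):
        # An intervention takes a model and returns a related model:
        # copy the mechanism dict and swap out one conditional.
        new_model = dict(model)
        new_model[var] = new_mechanism
        return new_model

    # A soft intervention on Y: shift its mean, keep its dependence on X.
    shifted = intervene(mechanisms, "Y",
                        lambda parents: rng.normal(2.0 * parents["X"] + 3.0, 1.0))

    # Embed both models into one bigger model with an indicator I:
    # I is a root variable, Y's mechanism dispatches on it, so the
    # intervention targets are exactly the children of I.
    embedded = {
        "I": lambda parents: 1,  # 1 = intervened regime, 0 = observational
        "X": mechanisms["X"],
        "Y": lambda parents: (shifted if parents["I"] else mechanisms)["Y"](parents),
    }

    def sample(model, order):
        values = {}
        for var in order:
            values[var] = model[var](values)
        return values

    print(sample(embedded, ["I", "X", "Y"]))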