We will be presenting this work at #ICML2025 and are happy to discuss it further.
🗓️: Tue 15 Jul, 4:30 p.m. PDT
📍: East Exhibition Hall A-B #E-1912
Joint 1st author: @ruby-sedgwick.bsky.social.
With: Avinash Kori, Ben Glocker, @mvdw.bsky.social.
🧵 14/14
Finally, we note that the flexibility of our model comes at the cost of more difficult optimisation. However, random restarts followed by choosing the model with the highest score, a common practice with GPs, reliably improve the structure recovery metrics.
🧵 13/14
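A minimal sketch of that restart strategy (the `fit` callable and all names are our own placeholders, not the paper's API):

```python
def best_of_restarts(fit, n_restarts: int = 10):
    """Random restarts: fit(seed) trains the model from a random
    initialisation and returns (model, score); keep the run with the
    highest score, e.g. the marginal-likelihood bound."""
    runs = [fit(seed) for seed in range(n_restarts)]
    return max(runs, key=lambda run: run[1])
```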
We also test our method on semi-synthetic data generated from the SynTReN gene regulatory network simulator.
🧵 12/14
When data are generated from an identifiable model (an additive noise model, ANM), our more flexible model performs as well as an ANM-restricted Bayesian model (CGP). Both Bayesian models again outperform the non-Bayesian approaches, even those that make the correct ANM assumption.
🧵 11/14
With a larger number of variables (50), where the discrete search blows up, and with complex data, our approach performs well. SDCD uses the same acyclicity regulariser but fits neural networks by maximum likelihood; the gap between the two methods shows the advantage of the Bayesian approach.
🧵 10/14
We first test on data generated from our model itself, in a setting where discrete model selection is tractable (3 variables). Here we show that while the discrete model (DGP-CDE) recovers the true structure reliably, our continuous approximation (CGP-CDE) incurs higher error.
🧵 9/14
We enforce acyclicity of the learnt adjacency matrix by adding an acyclicity constraint to the optimisation; variational inference trains the remaining parameters.
The final objective returns the adjacency matrix of the causal structure that maximises the posterior.
🧵 8/14
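The thread does not say which constraint is used, but a common differentiable choice, the NOTEARS trace-exponential penalty (SDCD, mentioned above, uses a regulariser in the same spirit), can be sketched in a few lines. The function name is ours:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(A: np.ndarray) -> float:
    """NOTEARS-style penalty h(A) = tr(exp(A * A)) - d, where * is the
    elementwise product. h(A) >= 0, with equality exactly when the weighted
    adjacency matrix A encodes a DAG, so adding h(A) to the objective pushes
    the optimiser towards acyclic structures."""
    d = A.shape[0]
    return float(np.trace(expm(A * A)) - d)
```

In practice such a penalty is typically enforced with an augmented Lagrangian or an annealed weight so that h(A) reaches zero at convergence.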
Therefore, we can construct an adjacency matrix from the kernel hyperparameters. This amounts to automatic relevance determination (ARD): maximising the marginal likelihood uncovers the dependency structure among the variables. However, the learnt adjacency must be constrained to be acyclic.
🧵 7/14
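A minimal sketch of the ARD idea, assuming one GP per variable with a separate lengthscale for each candidate parent (the names and the 1/lengthscale convention are ours):

```python
import numpy as np

def adjacency_from_lengthscales(lengthscales: np.ndarray) -> np.ndarray:
    """lengthscales[j, i]: ARD lengthscale of input j in the GP for variable i.
    A large lengthscale flattens the kernel in that input, i.e. no dependence,
    so the inverse lengthscale acts as a soft weight for the edge j -> i."""
    A = 1.0 / lengthscales
    np.fill_diagonal(A, 0.0)  # a variable is never its own parent
    return A
```

Thresholding such a soft adjacency then gives a binary candidate structure.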
Next, we construct a latent-variable Gaussian process model that can capture non-Gaussian densities, with inputs determined by a causal graph. To parametrise the space of graphs continuously, we note that the kernel hyperparameters control the dependence on each input.
🧵 6/14
We first show that the guarantees of Bayesian model selection (BMS) hold in the multivariate case: 1) when the underlying model is identifiable, BMS identifies the true DAG; 2) for more flexible models, graphs remain distinguishable.
🧵 5/14
However, naive Bayesian model selection scales poorly because the number of DAGs grows super-exponentially with the number of variables.
We propose a continuous Bayesian model selection approach that scales and allows for more flexible model assumptions.
🧵 4/14
While current causal discovery methods impose unrealistic model restrictions to ensure identifiability, Bayesian models relax strict identifiability and instead allow for more realistic causal assumptions, yielding performance gains: arxiv.org/abs/2306.02931
🧵 3/14
Bayesian models encode soft restrictions in the form of priors. These priors can also encode causal assumptions, chiefly that causal mechanisms do not inform each other. This is achieved simply by ensuring that the prior factorises over the mechanisms.
🧵 2/14
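In symbols (standard notation, not necessarily the paper's): with mechanisms f_1, ..., f_d and graph G, the independent-mechanisms assumption is a prior that factorises over mechanisms, and Bayesian model selection then picks the graph with the highest marginal likelihood:

```latex
p(f_1,\dots,f_d \mid G) = \prod_{i=1}^{d} p(f_i \mid G),
\qquad
\hat{G} = \arg\max_{G} \int p(D \mid f, G)\, p(f \mid G)\, \mathrm{d}f .
```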
📢 New #ICML2025 paper: "Continuous Bayesian Model Selection for Multivariate Causal Discovery".
We propose a Bayesian causal model that allows for scalable causal discovery without restrictive model assumptions.
Paper: arxiv.org/abs/2411.10154
Code: github.com/Anish144/Con...
🧵 1/14
Posted 10.07.2025 18:06
A Meta-Learning Approach to Bayesian Causal Discovery
Discovering a unique causal structure is difficult due to both inherent identifiability issues and the consequences of finite data.
As such, uncertainty over causal structures, such as those...
Excited to be presenting this work at #ICLR2025. Please do reach out if you work in a similar space!
📍: Hall 3 + Hall 2B #471
🗓️: Fri 25 Apr, 3 p.m.
🔗: openreview.net/forum?id=eeJ...
This was a great collaboration w/ @mashtro.bsky.social, James Requeima, @mvdw.bsky.social
Why did that work? We are approximating the posterior of a causal model (from which the data are generated), which may differ from the true data-generating process. Making the causal model more flexible, widening the prior, and increasing the capacity of the neural process can all help. 14/15
What if we don't know the data distribution? Our approach here is to encode a "wide prior" by training on mixtures of all possible models (that we can think of). We show that this approach leads to good performance on datasets whose generation process was unknown at training time. 13/15
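A minimal sketch of that training distribution (everything here, including the two toy families, is our own illustration):

```python
import numpy as np

def sample_task(rng: np.random.Generator):
    """Sketch of a "wide prior": each training task draws its generative
    model from a mixture of families. The two toy families below stand in
    for whatever simulators are available."""
    n = 500
    x = rng.normal(size=n)
    if rng.integers(2) == 0:                 # linear mechanism
        y = 2.0 * x + 0.1 * rng.normal(size=n)
    else:                                    # nonlinear mechanism
        y = np.tanh(x) + 0.1 * rng.normal(size=n)
    data = np.stack([x, y], axis=1)
    true_graph = np.array([[0, 1], [0, 0]])  # single edge x -> y
    return data, true_graph
```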
Next, we test with more nodes (20), denser graphs, and more complicated functions. Here, we show that our model outperforms the other baselines. Notably, a single model trained on all the data (labelled BCNP All Data) does not lose performance on specific datasets. 12/15
We first show that our model outputs reasonable posterior samples in a 2-node, single-edge setting, where the underlying data are not identifiable. Here we can see that the AVICI model, which does not correlate the entries of the adjacency matrix, fails to output reasonable samples. 11/15
We test against two types of baselines: 1) posterior approximation via the marginal likelihood (DiBS, BayesDAG); 2) NP-like methods that find single structures, which can be used to obtain posterior samples but miss key properties of the posterior (AVICI, CSIvA). 10/15
The loss, targeting the KL divergence, simplifies to maximising the log probability of the true causal graph under our model. The final scheme: a model that efficiently outputs samples of causal structures approximating the true posterior, with just a forward pass! 9/15
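A hedged sketch of that training step (all names are our own placeholders, not the paper's API; compare the sample_task sketch above):

```python
import torch

def training_step(model, sample_task, optimizer):
    """One meta-training step. sample_task() simulates a dataset and its
    ground-truth DAG from the generative "prior"; model(data) returns a
    distribution over adjacency matrices. Maximising the log-probability
    of the true graph is the KL-based objective described above."""
    data, true_graph = sample_task()
    graph_dist = model(data)                        # amortised q(G | D)
    loss = -graph_dist.log_prob(true_graph).mean()  # log prob of the true DAG
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```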
Our decoder uses lower-triangular and permutation matrices (A, Q) to construct DAGs. A Gumbel-Sinkhorn distribution is parameterised, from which permutations (Q) are sampled. The representation is further processed to parameterise the lower-triangular matrix (A). 8/15
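A minimal sketch of the two ingredients, with implementation choices (iteration count, temperature) as our own placeholders:

```python
import torch

def gumbel_sinkhorn(log_alpha: torch.Tensor, n_iters: int = 20, tau: float = 1.0) -> torch.Tensor:
    """Perturb scores with Gumbel noise, then alternately normalise rows and
    columns in log space. The result is approximately doubly stochastic and
    approaches a hard permutation matrix Q as tau -> 0."""
    u = torch.rand_like(log_alpha).clamp_min(1e-10)
    gumbel = -torch.log(-torch.log(u))
    log_p = (log_alpha + gumbel) / tau
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)  # rows
        log_p = log_p - torch.logsumexp(log_p, dim=-2, keepdim=True)  # columns
    return log_p.exp()

def dag_adjacency(A: torch.Tensor, Q: torch.Tensor) -> torch.Tensor:
    """For strictly lower-triangular A, permuting by Q only relabels the node
    ordering and cannot create cycles, so Q A Q^T is always a DAG."""
    return Q @ A @ Q.T
```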
We embed each node-sample pair and append a query vector of zeros along the sample axis. Our encoder alternates between attention over samples and attention over nodes to preserve equivariance. We then perform cross-attention with the query vector to encode permutation invariance over samples. 7/15
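A minimal sketch of one alternating block (dimensions and names are ours): attending over one axis while folding the other into the batch is what preserves equivariance along the untouched axis.

```python
import torch
import torch.nn as nn

class AlternatingAttentionBlock(nn.Module):
    """Input x: (batch, samples, nodes, dim). Attention runs first across the
    sample axis, then across the node axis, treating the other axis as part
    of the batch each time."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.sample_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.node_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, n, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * n, s, d)  # attend over samples
        h = h + self.sample_attn(h, h, h)[0]
        h = h.reshape(b, n, s, d).permute(0, 2, 1, 3).reshape(b * s, n, d)
        h = h + self.node_attn(h, h, h)[0]              # attend over nodes
        return h.reshape(b, s, n, d)
```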
What does our model look like? We encode key properties of the posterior: 1) permutation invariance with respect to the samples, 2) permutation equivariance with respect to the nodes, 3) correlation between the elements of the adjacency matrix. We do this with a transformer encoder-decoder architecture. 6/15
Our training objective reflects this: we minimise the KL divergence between the true posterior and the neural process. The key property is that we only require samples of the data and the true causal graph. These samples form the "prior", which can be synthetic or drawn from real examples. 5/15
So how do we solve this? We bypass these integrals by using the neural process paradigm: a neural network that learns to map data directly to the target posterior. We sample data from the causal model of interest and train a network to reconstruct the underlying causal structure. 4/15
Bayes infers a distribution over plausible causal structures instead of a single structure. However, it requires solving a complicated integral, or equivalently finding a posterior over functions. The space of DAGs also grows super-exponentially with the number of nodes. 3/15
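Concretely, the object being approximated is the graph posterior (standard notation, ours rather than the paper's):

```latex
p(G \mid D) = \frac{p(G) \int p(D \mid f, G)\, p(f \mid G)\, \mathrm{d}f}{\sum_{G'} p(G')\, p(D \mid G')}
```

The integral over mechanisms f is intractable for flexible model classes, and the normalising sum ranges over a super-exponential number of DAGs; this is what the neural process sidesteps.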
Causal discovery from observational data is a challenging task. Identifiability is only guaranteed under strong assumptions about model classes, which are often violated in practice. This issue is made worse by finite data. 2/15
Understanding causes is key to science. Finite observational data alone isn't enough. While Bayes offers a framework to deal with this, the calculations are often intractable. We introduce a method to accurately approximate the posterior over causal structures.
#ICLR2025 🧵 1/15
Posted 19.04.2025 17:39