@laurentjacob.bsky.social
Researcher in statistics and machine learning for genomics https://laurent-jacob.github.io/
#TalentCNRS | Flora Jay, between synthetic genomes and evolutionary narratives, receives the CNRS bronze medal.
www.ins2i.cnrs.fr/fr/cnrsinfo/...
@lisnlab.bsky.social @cnrs-paris-saclay.bsky.social
Preprint alert!
Our new abundance index, REINDEER2, is out!
It's cheap to build and update, offers tunable abundance precision at the k-mer level, and delivers very high query throughput.
Short thread!
www.biorxiv.org/content/10.1...
github.com/Yohan-Hernan...
Registration is now open!
The €580 fee includes housing and all meals.
Registration will close on October 17th, or earlier if we reach 80 participants.
The 2026 Probabilistic Modeling in Genomics (ProbGen) meeting will be held at UC Berkeley, March 25-28, 2026. We have an amazing list of keynote speakers and session chairs:
probgen2026.github.io
Please help spread the news.
Thank you to @cnrs-rhoneauvergne.bsky.social and @astropierre.com for this interview about my work on AI for evolutionary genomics!
There is a nice example in @stephaneguindon.bsky.social's PhD thesis, p. 55:
theses.hal.science/tel-00843343...
The design matrix of the regression should be nPairs x nBranches, with a 1 at coordinate (i,j) if branch j belongs to the path between the leaves of pair i in the tree, and 0 otherwise.
I think one way to do this is the least squares method, which gives you the set of branch lengths on your given topology such that the sum of squared differences between your given distances and the distances on the tree is minimal.
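A minimal sketch of that setup, assuming a fixed 4-leaf unrooted topology ((A,B),(C,D)) and made-up pairwise distances (names and numbers are illustrative, not from the thread):

```python
# Least-squares branch lengths on a fixed 4-leaf unrooted topology ((A,B),(C,D)).
# The distances below are made up; the point is the shape of the design matrix.
import numpy as np
from itertools import combinations

leaves = ["A", "B", "C", "D"]
branches = ["A", "B", "C", "D", "internal"]      # four external branches + one internal
# For each pair of leaves, the branches on the path connecting them.
paths = {
    ("A", "B"): {"A", "B"},
    ("A", "C"): {"A", "internal", "C"},
    ("A", "D"): {"A", "internal", "D"},
    ("B", "C"): {"B", "internal", "C"},
    ("B", "D"): {"B", "internal", "D"},
    ("C", "D"): {"C", "D"},
}

pairs = list(combinations(leaves, 2))            # nPairs = 6
X = np.zeros((len(pairs), len(branches)))        # nPairs x nBranches design matrix
for i, pair in enumerate(pairs):
    for j, branch in enumerate(branches):
        X[i, j] = branch in paths[pair]          # 1 iff branch j is on path i

d = np.array([0.30, 0.55, 0.60, 0.50, 0.58, 0.25])   # given pairwise distances
b, *_ = np.linalg.lstsq(X, d, rcond=None)             # minimizes ||X b - d||^2
print(dict(zip(branches, b.round(3))))
```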
Phyloformer is finally published in MBE!
academic.oup.com/mbe/advance-...
The thread below provides a summary of our neural network for likelihood-free phylogenetic reconstruction.
People having breakfast in front of the Alps at the Centre Paul Langevin.
Come hear about the latest advances in the field and discuss your own work at Centre Paul Langevin in beautiful Aussois.
A headshot of Dr Burak Yelmen.
Burak Yelmen from the University of Tartu will give a keynote presentation on "A perspective on generative neural networks in genomics with applications in synthetic data generation".
A headshot of Dr Claudia Solís-Lemus.
Claudia Solís-Lemus from the University of Wisconsin-Madison will give a keynote presentation on "The good, the bad and the ugly of deep learning in phylogenetic inference".
A headshot of Dr Anne-Florence Bitbol.
Anne-Florence Bitbol from EPFL will give a keynote presentation on "Coevolution-aware language models".
A legendary being holds a phylogenetic tree in the palm of their hand, with snowy mountains in the background.
The next LEGEND conference on machine learning for evolutionary genomics will be in Aussois (French Alps) between December 8th and 12th.
Mark your calendars and make sure your best work is ready next September when the call for abstracts opens.
legend2025.sciencesconf.org
Excited to share our latest work, MUSET, a new tool for creating abundance unitig matrices from sequencing data. It was published yesterday in Oxford Bioinformatics if you want to have a look:
academic.oup.com/bioinformati...
Let's break it down:
My book is (at last) out, just in time for Christmas!
A blog post to celebrate and present it: francisbach.com/my-book-is-o...
Ok, I tried to create my own list of people working on developing statistical or machine learning models applied to omics data. I am sure I missed a lot of cool people. If you'd like to be added, let me know. #Stats #ML #Omics
go.bsky.app/73rcuJn
Hi Raphael, thanks for putting this together. I'll be happy to be on the list if you think it makes sense :)
A sketch summarizing the entire Phyloformer process.
All this work was done by Luca Nesterenko and @lblassel.bsky.social, assisted by P. Veber, Bastien Boussau, and myself.
The code and data are available at github.com/lucanest/Phy...
Please share if you find this interesting, and we welcome your feedback :)
A plot comparing the speed of all methods.
In all these experiments, and regardless of model complexity, Phyloformer run on a GPU was the fastest method.
It was about two orders of magnitude faster than IQTree, and even twice as fast as FastME.
A plot comparing the error of different methods under a more complex probabilistic model of sequence evolution. Phyloformer outperforms all other methods under all metrics.
We then trained Phyloformer under a more realistic model, accounting for co-evolution.
It outperformed all other methods, including IQTree/FastTree, on all metrics.
A plot stratifying the error into two terms: we perform very well for estimating evolutionary distances, less well for the topology.
More precisely, Phyloformer was very good at predicting distances, and also on the Kuhner-Felsenstein metric, which accounts for both topology and branch lengths.
Looking at the topology only (Robinson-Foulds metric), it performed less well than IQTree/FastTree, but better than FastME.
A plot comparing the error made by different methods of phylogenetic inference. The distance method FastME underperforms, our method is on par with likelihood methods.
We first trained Phyloformer to perform inference under LG, a common model under which likelihood computation is possible.
It performed much better than FastME (a distance method), and on par with maximum likelihood approaches (IQTree, FastTree).
A visual justification of permutation invariance: two sequence alignments that are identical up to a permutation must lead to the same phylogeny.
Phyloformer uses self-attention to progressively share information among and between sequences.
This choice makes our function invariant to the order of the input sequences (any order yields the same output phylogeny).
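As a rough, hedged illustration of this property (a toy set-up, not the actual Phyloformer architecture): plain self-attention over a set of sequence embeddings is permutation equivariant, so any symmetric readout of its output is invariant to the order of the input sequences.

```python
# Toy demonstration of permutation equivariance/invariance with self-attention.
# This is NOT the Phyloformer architecture, just the underlying principle.
import numpy as np

rng = np.random.default_rng(0)
n_seq, dim = 5, 8
X = rng.normal(size=(n_seq, dim))                # one embedding per sequence
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

def self_attention(X):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(X.shape[1])
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)            # softmax over the set of sequences
    return A @ V

perm = rng.permutation(n_seq)
out, out_perm = self_attention(X), self_attention(X[perm])
assert np.allclose(out[perm], out_perm)          # outputs are permuted the same way
assert np.allclose(out.mean(axis=0), out_perm.mean(axis=0))  # symmetric readout unchanged
```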
The inference process of Phyloformer. We use the trained network to estimate evolutionary distances from related sequences, and pass these estimates to FastME to build a phylogeny.
Once trained, Phyloformer provides estimates of all evolutionary distances given the sequences.
But each of these distance estimates is informed by the entire set of sequences, not just the corresponding pair!
We then pass them to FastME, a distance method, to obtain a tree.
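A hedged sketch of that hand-off (not the actual Phyloformer code): predicted distances can be written in PHYLIP square format, the input format accepted by distance programs such as FastME. The taxon names and values below are made up.

```python
# Hypothetical hand-off to a distance method: write a predicted distance matrix
# in PHYLIP square format, which programs such as FastME accept as input.
import numpy as np

taxa = ["seqA", "seqB", "seqC", "seqD"]          # made-up names
D = np.array([[0.00, 0.30, 0.55, 0.60],          # stand-in for predicted distances
              [0.30, 0.00, 0.50, 0.58],
              [0.55, 0.50, 0.00, 0.25],
              [0.60, 0.58, 0.25, 0.00]])

with open("distances.phy", "w") as f:
    f.write(f"{len(taxa)}\n")
    for name, row in zip(taxa, D):
        f.write(f"{name:<10}" + " ".join(f"{x:.5f}" for x in row) + "\n")
# distances.phy can then be given to FastME (see its documentation for the exact
# command line) to build the tree.
```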
The process for training Phyloformer. We sample trees and sequences evolved along these trees from the model under which we want to do inference. We use these examples to train a network that predicts the parameters (evolutionary distances, equivalent to the tree) from an observation (aligned related sequences).
Phyloformer is a learnable function. Its input is a set of sequences, its output is their phylogeny, represented by evolutionary distances between all pairs of sequences.
We optimize this function on a large number of (phylogeny, sequences) pairs sampled from the probabilistic model.
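A drastically simplified, hypothetical stand-in for that training idea (a toy regression, nothing like the real architecture or simulator): sample (sequence pair, distance) examples under Jukes-Cantor and fit a small network mapping a summary of the pair back to the distance.

```python
# A drastically simplified stand-in for the training idea (not Phyloformer):
# sample (sequence pair, distance) examples from a Jukes-Cantor simulator and
# train a tiny network to map a summary of the pair back to the distance.
import numpy as np
import torch
from torch import nn

rng = np.random.default_rng(0)
L = 500                                            # sites per simulated pair

def sample_pair():
    d = rng.uniform(0.01, 1.0)                     # true evolutionary distance
    p_sub = 0.75 * (1.0 - np.exp(-4.0 * d / 3.0))  # JC69 prob. that a site differs
    anc = rng.integers(0, 4, L)
    other = np.where(rng.random(L) < p_sub, (anc + rng.integers(1, 4, L)) % 4, anc)
    return np.mean(anc != other), d                # (observed p-distance, target)

X, y = map(np.array, zip(*[sample_pair() for _ in range(5000)]))
X = torch.tensor(X, dtype=torch.float32).unsqueeze(1)
y = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(200):                               # plain MSE regression
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y)
    loss.backward()
    opt.step()
print("final training MSE:", float(loss))
```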
A visual justification of simulation-based and likelihood-free inference: under some probabilistic models, computing likelihoods is hard but sampling data is easy.
This is where likelihood-free/simulation-based inference comes into play.
Sampling trees and sequences under a probabilistic model is possible under much more complex models, for which likelihood computations would be prohibitive.
It's an alternative way to access the model.
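A minimal, hedged example of "sampling is easy": simulating an alignment along a fixed toy tree under Jukes-Cantor takes a few lines, even when likelihood computation under a richer model would be painful. The topology and branch lengths below are invented.

```python
# Sampling is easy: simulate an alignment along a fixed toy tree under the
# Jukes-Cantor model. Topology ((A,B),(C,D)) and branch lengths are invented.
import numpy as np

rng = np.random.default_rng(1)
L = 200                                           # number of sites

def evolve(parent, t):
    """Each site substitutes with probability 3/4 (1 - exp(-4t/3)) under JC69."""
    p = 0.75 * (1.0 - np.exp(-4.0 * t / 3.0))
    child = parent.copy()
    hit = rng.random(L) < p
    child[hit] = (child[hit] + rng.integers(1, 4, hit.sum())) % 4   # pick another base
    return child

root = rng.integers(0, 4, L)                      # ancestral sequence, bases coded 0..3
ab, cd = evolve(root, 0.20), evolve(root, 0.05)   # internal branches
leaves = {"A": evolve(ab, 0.10), "B": evolve(ab, 0.10),
          "C": evolve(cd, 0.30), "D": evolve(cd, 0.30)}

bases = np.array(list("ACGT"))
for name, seq in leaves.items():
    print(name, "".join(bases[seq[:40]]), "...")
```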
A visual summary of maximum likelihood approaches for phylogenetic inference. We explore the space of phylogenetic trees to find the one making a given set of related sequences as likely as possible under a chosen probabilistic model of sequence evolution.
Maximum likelihood approaches, on the other hand, search for the tree under which all sequences jointly are most likely.
This makes them accurate but slow. It also restricts these approaches to simplistic models under which likelihood computations are fast enough.
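For contrast, a hedged sketch of the likelihood computation those methods rely on: Felsenstein's pruning algorithm under Jukes-Cantor on a fixed toy topology, with random placeholder data. Tree search repeats this kind of computation for many candidate trees.

```python
# Likelihood of a fixed toy tree under Jukes-Cantor via Felsenstein's pruning
# algorithm; the data are random placeholders.
import numpy as np

def jc69(t):
    """JC69 transition matrix: stay with prob 1/4 + 3/4 e^{-4t/3}, change otherwise."""
    e = np.exp(-4.0 * t / 3.0)
    return np.full((4, 4), 0.25 * (1.0 - e)) + np.eye(4) * e

def conditional(node, seqs):
    """Per-site conditional likelihoods (n_sites x 4) of the subtree below `node`."""
    if isinstance(node, str):                     # leaf: one-hot of the observed base
        return np.eye(4)[seqs[node]]
    left, right, t_l, t_r = node
    Ll, Lr = conditional(left, seqs), conditional(right, seqs)
    return (Ll @ jc69(t_l).T) * (Lr @ jc69(t_r).T)

rng = np.random.default_rng(2)
seqs = {name: rng.integers(0, 4, 100) for name in "ABCD"}       # fake alignment
tree = (("A", "B", 0.1, 0.1), ("C", "D", 0.3, 0.3), 0.2, 0.05)  # ((A,B),(C,D))
site_lik = conditional(tree, seqs) @ np.full(4, 0.25)           # uniform root frequencies
print("log-likelihood:", np.log(site_lik).sum())
```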
A visual summary of so-called distance methods for phylogenetic inference. We start from an estimate of evolutionary distances between all pairs of sequences (sums of branch lengths between leaves in the true tree) and build a tree by hierarchical clustering.
Knowing the evolutionary distances (sum of branch lengths) between all pairs of sequences is enough to recover the tree, by hierarchical clustering.
Distance methods rely on this idea, with distances estimated from each pair of sequences taken separately. This makes them fast but inaccurate.
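A minimal illustration of the "tree from distances by clustering" idea, using UPGMA (average linkage) on a made-up distance matrix; real distance methods such as neighbor joining or FastME use more refined criteria, but the principle is the same.

```python
# Tree from distances by hierarchical clustering: UPGMA (average linkage) on a
# made-up distance matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform

taxa = ["A", "B", "C", "D"]
D = np.array([[0.00, 0.30, 0.55, 0.60],
              [0.30, 0.00, 0.50, 0.58],
              [0.55, 0.50, 0.00, 0.25],
              [0.60, 0.58, 0.25, 0.00]])

Z = linkage(squareform(D), method="average")      # UPGMA-style agglomeration
root = to_tree(Z)

def newick(node):
    """Topology-only Newick string for the scipy cluster tree."""
    if node.is_leaf():
        return taxa[node.id]
    return f"({newick(node.left)},{newick(node.right)})"

print(newick(root) + ";")                         # e.g. ((A,B),(C,D));
```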
A diagram of phylogenetic inference: we build a tree summarizing how a given set of related sequences evolved from a common ancestor.
Phylogenetic trees describe how related sequences (at the leaves) evolved from a common ancestor. Internal nodes are successive ancestral sequences.
In probabilistic models, the length of a branch represents the expected number of substitutions per site between the sequences at its two ends.
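As a small worked example under the simple Jukes-Cantor model (an assumption for illustration, not something stated in the thread), the observed proportion of differing sites p and the expected number of substitutions per site d are linked by p = 3/4 (1 - exp(-4d/3)), which can be inverted:

```python
# Under Jukes-Cantor (a simplifying assumption), convert between the observed
# fraction of differing sites and the expected number of substitutions per site.
import numpy as np

def jc69_distance(p):
    """Expected substitutions per site given an observed p-distance (p < 0.75)."""
    return -0.75 * np.log(1.0 - 4.0 * p / 3.0)

def jc69_pdiff(d):
    """Expected fraction of differing sites after d substitutions per site."""
    return 0.75 * (1.0 - np.exp(-4.0 * d / 3.0))

print(round(jc69_distance(0.30), 3))   # ~0.383 substitutions per site
print(round(jc69_pdiff(0.383), 3))     # ~0.300, back to the observed fraction
```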