Dmitry Kobak

@hippopedoid.bsky.social

Researcher at Tübingen University. Manifold learning, contrastive learning, scRNAseq data. Excess mortality. Born but to die and reas'ning but to err.

257 Followers  |  22 Following  |  2 Posts  |  Joined: 28.11.2024  |  1.6145

Latest posts by hippopedoid.bsky.social on Bluesky

We spent a year writing this review of low-dim embeddings and arguing about things like epistemic roles and best practices :-) 20+ authors are all participants of the Dagstuhl seminar we held last year: www.dagstuhl.de/24122. Led by @alexandr.bsky.social and Cyril de Bodt.

arxiv.org/abs/2508.15929

27.08.2025 15:14 — 👍 26    🔁 9    💬 1    📌 0
Recreated PCA figure from: P. Jolicoeur and J. E. Mosimann. Size and shape variation in the painted turtle. A principal component analysis. Growth, 24:339–354, 1960.

The figures show PC1 vs PC2 and PC2 vs PC3, with colours and symbols reflecting the sex of the turtle.

One of the more fun topics stemmed from a discussion with @hippopedoid.bsky.social over what the oldest 2D PCA visualization we could find was. After some scouring, we settled on a 1960 paper from researchers at the Université de Montréal about turtle carapaces, which we recreated.

27.08.2025 13:32 — 👍 7    🔁 2    💬 1    📌 0
The participants of Dagstuhl Seminar 24122 standing on steps outside (from https://www.dagstuhl.de/24122)

Multiple types of embeddings (UMAP, t-SNE, Laplacian Eigenmaps, PHATE, PCA, MDS) of Wikipedia text data labelled by text summaries generated by an LLM. Methods like UMAP and t-SNE show cluster structure that reflects shared subject matter in text, while other methods show more continuous structure.

Multiple embedding methods (PCA, Laplacian Eigenmaps, t-SNE, MDS, PHATE, UMAP) of primate brain organoids at different time periods. Different methods highlight different aspects of development, such as clusters of similar cell types or time courses of cell development.

Multiple embedding methods (PCA, Laplacian Eigenmaps, t-SNE, MDS, PHATE, UMAP) of 1000 Genomes Project genotypes. Different methods reflect different aspects of demographic history of populations.

Last year I met a bunch of great researchers who work with high-dimensional data at a Dagstuhl seminar. This week we put out a preprint about the history and philosophy of low-dimensional embedding methods, their applications, their challenges, and their possible future arxiv.org/abs/2508.15929

27.08.2025 13:25 — 👍 14    🔁 7    💬 1    📌 1

Hi Ben, I've been trying to contact you by email but keep failing. Wanted to DM you here but your DMs are closed. If you either follow me or open your DMs, I'll message you! Cheers.

24.04.2025 11:11 — 👍 0    🔁 0    💬 0    📌 0
