Leland McInnes

@lelandmcinnes.bsky.social

A Mathematician dabbling in Data Science, especially unsupervised learning and data exploration. UMAP, HDBSCAN, PyNNDescent, DataMapPlot. (He/Him)

2,692 Followers  |  298 Following  |  43 Posts  |  Joined: 25.03.2024

Posts by Leland McInnes (@lelandmcinnes.bsky.social)

playing around with umap-js today.

27.02.2026 17:59 - 👍 3    🔁 1    💬 0    📌 0
State of Neuroscience 2025: Trends & Breakthroughs | The Transmitter A comprehensive look at major trends shaping the neuroscience landscape in 2025

Excited to share a new #interactive #dataviz, putting 50 years of #neuroscience research on the map:

The State of Neuroscience 2025
stateofneuroscience.thetransmitter.org

#StateOfNeuro

17.11.2025 15:41 - 👍 50    🔁 15    💬 4    📌 2

Any sufficiently large k-nn is indistinguishable from magic
🧙‍♂️

28.02.2026 06:43 - 👍 22    🔁 4    💬 0    📌 0
screenshot of map.sky.boo with a circle around a dot way off the main clustered area, a modal at the bottom right shows it as "Justin Kyle" daddysgaygarden13.bsky.social

become unclusterable

10.02.2026 08:03 - 👍 76    🔁 4    💬 2    📌 1

Some very nice ideas in there. Thanks for sharing!

10.02.2026 15:24 - 👍 4    🔁 0    💬 1    📌 0
Bluesky Map Interactive map of 3.4 million Bluesky users, visualised by their follower pattern.

I made a map of 3.4 million Bluesky users - see if you can find yourself!

bluesky-map.theo.io

I've seen some similar projects, but IMO this seems to better capture some of the fine-grained detail

08.02.2026 22:59 - 👍 7197    🔁 2158    💬 658    📌 4582
A) Dendrogram of the development dataset showing the clustering structure and optimal cut points, and spectrograms of representative calls extracted from cluster 0 and cluster 1. Within the main clusters, we observed further branching; B) UMAP projection divided into K = 2 clusters using HAC.

Our new pre-print shows how unsupervised clustering methods can identify biologically meaningful differences in early vocal production, with no human feedback. @antorrisi.bsky.social has led this interdisciplinary collaboration based on computational methods + #chicks 🐣 arxiv.org/abs/2601.12203

24.01.2026 13:15 - 👍 20    🔁 8    💬 1    📌 0

here's a fun side project i've been working on: i compiled a joint text<>audio embedding model to a fast coreml pipeline, and built a very fast (~400ms for 50k samples, can scale to millions) UMAP dimensionality reduction GPU impl in mlx. using it to browse music libraries and do sample sim search

26.01.2026 06:25 - 👍 64    🔁 4    💬 3    📌 0

Xiaobin Li, Run Zhang: Understanding and Improving UMAP with Geometric and Topological Priors: The JORC-UMAP Algorithm https://arxiv.org/abs/2601.16552 https://arxiv.org/pdf/2601.16552 https://arxiv.org/html/2601.16552

27.01.2026 06:33 - 👍 1    🔁 1    💬 0    📌 0

I miss the days where you'd see blogposts with clever analyses on datasets, maths and data science tricks.

That's why, as an experiment, we're starting a new moderated subreddit. People can share/promote their notebooks and you can use RSS to subscribe.

Please join and share!

25.01.2026 23:00 - 👍 10    🔁 2    💬 0    📌 0
3 by 3 grid of networks color-mapped with "plasma" and with edge-bundling. Lung looking structure.

UMAP connectivity plots of 3,627 chess openings from the @lichess.org datasets (huggingface.co/datasets/Lic...)

20.01.2026 12:21 - 👍 6    🔁 1    💬 1    📌 0

If I had to guess a direction that could be taken that would ameliorate this, it would be "small models". It feels like it should be possible to have small enough models to run locally that are either "capable enough", or specialize in a domain. In that case training is the only compute bottleneck.

05.01.2026 15:00 - 👍 3    🔁 0    💬 0    📌 0

I think it's important to note though that in spite of those incentives, the direction of the last two years has been more fungibility, *not* lock-in. And open source is the wrong fight here: when lock-in comes it will look more like the lock-in that Amazon or Uber have than Microsoft Office…

05.01.2026 13:37 - 👍 8    🔁 1    💬 2    📌 1
Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tools from alg...

New preprint! Have you ever wondered, what are these fuzzy simplicial sets, the theoretical framework behind e.g. UMAP? Here we show that you may simply see them as marginal distributions over simplicial sets. This provides a generative model for UMAP. (1/2)

arxiv.org/abs/2512.03899

04.12.2025 12:31 - 👍 14    🔁 7    💬 1    📌 0

Space DJ turns genre embeddings into a playable galaxy: pilot a ship, the music follows. 🚀

Key stats
768→128 PCA compression; 3D UMAP projection; three.js rendering; autopilot drift; high-dim neighbors surfacing hidden similarities.

11.11.2025 15:03 - 👍 29    🔁 8    💬 3    📌 2

Since 2018 if I recall correctly.

04.11.2025 03:00 - 👍 1    🔁 0    💬 1    📌 0

via the magic of laion_clap embeddings and umap, my live coding thingy has a sample browser at last!

31.10.2025 18:27 - 👍 51    🔁 11    💬 7    📌 0
Scatterplot of a document corpus with cluster information in the form of colors and cluster labels. Includes labels like computing, mental health, religion, etc.

I made this annotated scatter plot of 1 million FineWeb-Edu documents for @sashamtl.bsky.social's new TED talk.

31.10.2025 14:52 - 👍 4    🔁 1    💬 1    📌 0

Also really love how organic the plot looks with "inferno" (left) and "viridis" (right).

27.10.2025 10:42 - 👍 4    🔁 1    💬 0    📌 0
organic looking graph of the BGP nodes of the internet. black and white

Map of the internet: 1.3M nodes (BGP)

26.10.2025 13:39 - 👍 30    🔁 6    💬 4    📌 2
Leland McInnes - DataMapPlot: Rich Tools for UMAP | SciPy 2025
YouTube video by SciPy

The video of my talk at SciPy on DataMapPlot is up at last. If you make t-SNE or UMAP plots the talk provides some guidance on how to make plots most effective, and introduces a library to help make that easier.

www.youtube.com/watch?v=-iBh...

17.10.2025 13:56 - 👍 8    🔁 1    💬 0    📌 0
A line chart captioned "The big humanities majors were mostly still falling in 2024", showing drops since 2008 for most humanities fields between 10% (Study of the Arts) to 68% (religion) with history, english, and foreign languages all clustered around 50-55%

Despite the gutting of the National Center for Educational Statistics, the dept of Ed *did* manage to release 2024 college major counts in the usual format, so I can run it through the same code I do every year. First off, the change since peak of the largest fields -- another year of drops.

28.09.2025 02:20 - 👍 55    🔁 13    💬 3    📌 4
A Bluffer's Guide to Dimension Reduction - Leland McInnes
YouTube video by PyData

I'm very much a learner, but you're maybe asking if aspects of matrix factorisation approaches to dimensionality reduction apply here. But LocalMAP is a KNN approach, with a matrix factorisation initialisation. h/t @lelandmcinnes.bsky.social for his attempts to describe these youtu.be/9iol3Lk6kyU

26.09.2025 14:42 - 👍 5    🔁 2    💬 1    📌 0
📢 Save the date!
Join us for the next @ellis.eu x UniReps Speaker Series!
📅 27th August – 16:00 CEST
📍 https://ethz.zoom.us/j/66426188160
🎙️ Speakers: Keynote by @lelandmcinnes.bsky.social & Flash Talk by Yu (Demi) Qin
🔔 Stay updated by joining our Google group: groups.google.com/u/2/g/ellis-...

14.08.2025 07:58 - 👍 10    🔁 6    💬 0    📌 3
Screenshot of embedding atlas showing the embedding view on the left, a table at the bottom and charts on the right.

🚀 We've just open-sourced Embedding Atlas – a tool for exploring large embedding spaces through rich, interactive visualizations 📊.

01.08.2025 08:24 - 👍 117    🔁 33    💬 4    📌 4
Figure 1, Figure 2, Figure 3, Figure 4

Meteoroid stream identification with HDBSCAN unsupervised clustering algorithm. Eloy Peña-Asensio et al. https://arxiv.org/abs/2507.01501

03.07.2025 07:46 - 👍 1    🔁 1    💬 0    📌 0

Ever wanted to pan through the latent🌌 space of TikTok videos? Made using the amazing toponymy and datamapplot from @lelandmcinnes.bsky.social and data from mine and @jurgenpfeffer.bsky.social's first complete TikTok slice. link below

11.07.2025 16:45 - 👍 11    🔁 6    💬 2    📌 0

🎤 Speaker Spotlight: Leland McInnes
Join Leland at #SciPy2025 for his talk "DataMapPlot: Rich Tools for UMAP Visualizations." 📊

Discover powerful new ways to explore high-dimensional data!
🔗 scipy2025.scipy.org

05.07.2025 19:46 - 👍 8    🔁 2    💬 0    📌 0

Explore Wikipedia through a data map. Pages are grouped by semantic similarity, for topic clusters.
Hover to see details, zoom to explore more fine-grained topics, click to go to a page. Search by page
name to find interesting starting points for exploration.

lmcinnes.github.io/datamapplot_...

22.06.2025 15:36 - 👍 116    🔁 49    💬 7    📌 8