playing around with umap-js today.
27.02.2026 17:59
Excited to share a new #interactive #dataviz, putting 50 years of #neuroscience research on the map:
The State of Neuroscience 2025
stateofneuroscience.thetransmitter.org
#StateOfNeuro
Any sufficiently large k-nn is indistinguishable from magic
screenshot of map.sky.boo with a circle around a dot way off the main clustered area, a modal at the bottom right shows it as "Justin Kyle" daddysgaygarden13.bsky.social
become unclusterable
10.02.2026 08:03
Some very nice ideas in there. Thanks for sharing!
10.02.2026 15:24
I made a map of 3.4 million Bluesky users - see if you can find yourself!
bluesky-map.theo.io
I've seen some similar projects, but IMO this seems to better capture some of the fine-grained detail
A) Dendrogram of the development dataset showing the clustering structure and optimal cut points, and spectrograms of representative calls extracted from cluster 0 and cluster 1. Within the main clusters, we observed further branching; B) UMAP projection divided into K = 2 clusters using HAC.
Our new pre-print shows how unsupervised clustering methods can identify biologically meaningful differences in early vocal production, with no human feedback. @antorrisi.bsky.social has led this interdisciplinary collaboration based on computational methods + #chicks
arxiv.org/abs/2601.12203
here's a fun side project i've been working on: i compiled a joint text<>audio embedding model to a fast coreml pipeline, and built a very fast (~400ms for 50k samples, can scale to millions) UMAP dimensionality reduction GPU impl in mlx. using it to browse music libraries and do sample sim search
26.01.2026 06:25
Xiaobin Li, Run Zhang: Understanding and Improving UMAP with Geometric and Topological Priors: The JORC-UMAP Algorithm https://arxiv.org/abs/2601.16552 https://arxiv.org/pdf/2601.16552 https://arxiv.org/html/2601.16552
27.01.2026 06:33
I miss the days where you'd see blogposts with clever analyses on datasets, maths and data science tricks.
That's why, as an experiment, we're starting a new moderated subreddit. People can share/promote their notebooks and you can use RSS to subscribe.
Please join and share!
3 by 3 grid of networks color-mapped with "plasma" and with edge-bundling. Lung looking structure.
UMAP connectivity plots of 3,627 chess openings from the @lichess.org datasets (huggingface.co/datasets/Lic...)
20.01.2026 12:21
If I had to guess a direction that could be taken that would ameliorate this, it would be "small models". It feels like it should be possible to have small enough models to run locally that are either "capable enough", or specialize in a domain. In that case training is the only compute bottleneck.
05.01.2026 15:00
I think it's important to note though that in spite of those incentives, the direction of the last two years has been more fungibility, *not* lock-in. And open source is the wrong fight here: when lock-in comes it will look more like the lock-in that Amazon or Uber have than Microsoft Office…
05.01.2026 13:37
New preprint! Have you ever wondered, what are these fuzzy simplicial sets, the theoretical framework behind e.g. UMAP? Here we show that you may simply see them as marginal distributions over simplicial sets. This provides a generative model for UMAP. (1/2)
arxiv.org/abs/2512.03899
Space DJ turns genre embeddings into a playable galaxy: pilot a ship, the music follows.
Key stats
768→128 PCA compression; 3D UMAP projection; three.js rendering; autopilot drift; high-dim neighbors surfacing hidden similarities.
Since 2018 if I recall correctly.
04.11.2025 03:00
via the magic of laion_clap embeddings and umap, my live coding thingy has a sample browser at last!
31.10.2025 18:27
Scatterplot of a document corpus with cluster information in the form of colors and cluster labels. Includes labels like computing, mental health, religion, etc.
I made this annotated scatter plot of 1 million FineWeb-Edu documents for @sashamtl.bsky.social's new TED talk.
31.10.2025 14:52
Also really love how organic the plot looks with "inferno" (left) and "viridis" (right).
27.10.2025 10:42
organic looking graph of the BGP nodes of the internet. black and white
Map of the internet: 1.3M nodes (BGP)
26.10.2025 13:39
The video of my talk at SciPy on DataMapPlot is up at last. If you make t-SNE or UMAP plots the talk provides some guidance on how to make plots most effective, and introduces a library to help make that easier.
www.youtube.com/watch?v=-iBh...
A line chart captioned "The big humanities majors were mostly still falling in 2024", showing drops since 2008 for most humanities fields between 10% (Study of the Arts) to 68% (religion) with history, english, and foreign languages all clustered around 50-55%
Despite the gutting of the National Center for Educational Statistics, the dept of Ed *did* manage to release 2024 college major counts in the usual format, so I can run it through the same code I do every year. First off, the change since peak of the largest fields -- another year of drops.
28.09.2025 02:20
I'm very much a learner, but you're maybe asking if aspects of matrix factorisation approaches to dimensionality reduction apply here. But LocalMAP is a KNN approach, with a matrix factorisation initialisation. h/t @lelandmcinnes.bsky.social for his attempts to describe these youtu.be/9iol3Lk6kyU
26.09.2025 14:42
Save the date!
Join us for the next @ellis.eu x UniReps Speaker Series!
27th August, 16:00 CEST
https://ethz.zoom.us/j/66426188160
Speakers: Keynote by @lelandmcinnes.bsky.social & Flash Talk by Yu (Demi) Qin
Stay updated by joining our Google group: groups.google.com/u/2/g/ellis-...
Screenshot of embedding atlas showing the embedding view on the left, a table at the bottom and charts on the right.
We've just open-sourced Embedding Atlas, a tool for exploring large embedding spaces through rich, interactive visualizations.
01.08.2025 08:24
Figure 1
Figure 2
Figure 3
Figure 4
Meteoroid stream identification with HDBSCAN unsupervised clustering algorithm. Eloy Peña-Asensio et al. https://arxiv.org/abs/2507.01501
03.07.2025 07:46
Ever wanted to pan through the latent space of TikTok videos? Made using the amazing toponymy and datamapplot from @lelandmcinnes.bsky.social and data from my and @jurgenpfeffer.bsky.social's first complete TikTok slice. link below
Speaker Spotlight: Leland McInnes
Join Leland at #SciPy2025 for his talk "DataMapPlot: Rich Tools for UMAP Visualizations."
Discover powerful new ways to explore high-dimensional data!
scipy2025.scipy.org
Explore Wikipedia through a data map. Pages are grouped by semantic similarity into topic clusters.
Hover to see details, zoom to explore more fine-grained topics, click to go to a page. Search by page name to find interesting starting points for exploration.
lmcinnes.github.io/datamapplot_...