alternative view
18.03.2025 13:59
@enjalot.bsky.social
Data Visualization and Machine Learning
Building Latent Scope to visualize unstructured data through the lens of ML
github.com/enjalot/latent-scope
mean pooling
18.03.2025 13:59
I'm interested in chatting about some data vis work!
03.02.2025 13:52
implemented a new rendering component for latent scope's scatter plot. had to replace regl-scatterplot with d3-zoom + regl shaders so we could support mobile
23.01.2025 00:37
She is interested in policy, and it sounds like the potential for adopting tech is both exciting and overwhelming. perhaps good examples where tech intervention has had clear benefits. I think plant health would be a great place to start!
18.01.2025 13:00
hi Gabriel, do you have any resources/reading to recommend? I have a friend working in Taiwan to improve local agriculture who's interested in learning about tech potential to help
09.01.2025 16:19
I'll be at @unireps.bsky.social this Saturday presenting a new experimental pipeline to visually explore structured neural network representations. The core idea is to take thousands of prompts that activate a concept, and then cluster and draw them using MultiDiffusion.
11.12.2024 23:18
is it the overhead of running / opening / managing notebook files via browser?
I found using notebooks in vscode (and cursor) got me over the hump of "just getting started" since I'm already in the ide so much
cool! what are you mapping exactly?
30.12.2024 06:44
yes, i want to use it for storage but in order to do so i need to do this inefficient conversion. i'm wondering if there is a better choice to store with
10.12.2024 20:27
I've been operating under the assumption that parquet is the best way to store intermediate data, but now that I'm trying to handle incoming image data it feels a bit wasteful, especially since converting to bytes only runs at about 40 it/s
10.12.2024 19:51
am i missing something for handling image data in parquet files?
I can load a dataset from HF like:
dataset = load_dataset("Marqo/marqo-ge-sample", split='google_shopping')
df = pd.DataFrame(dataset)
but i need to convert the images to bytes if I want to do:
df.to_parquet("sample.parquet")
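For reference, the bytes conversion this post describes could look roughly like the sketch below. It is not the author's code: it assumes each cell holds an object with a PIL-style `save(fp, format=...)` method (as `PIL.Image.Image` has), that the column is named `"image"`, and the helper names are made up for illustration.

```python
import io


def image_to_png_bytes(image) -> bytes:
    """Serialize one image to PNG bytes.

    Assumes `image` has a PIL-style `save(fp, format=...)` method.
    """
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def encode_image_column(df, column="image"):
    """Return a copy of `df` with `column` replaced by raw PNG bytes.

    `column="image"` is an assumed name; check the dataset's actual schema.
    """
    out = df.copy()
    out[column] = out[column].map(image_to_png_bytes)
    return out
```

With helpers like these, `encode_image_column(df).to_parquet("sample.parquet")` would store the images as a binary column. The per-image PNG encode is the slow step the thread complains about, so this shows the conversion without making it any faster.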
“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissibly licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two trillion tokens set from Common Corpus.
05.12.2024 16:39
CLIP search for 562K maps in Lib of Congress github.com/j-mahowald/c... Paper: 2024.computational-humanities-research.org/papers/paper... #chr2024
05.12.2024 15:15
the algorithm is not some deity but a landscape, the feed is an uber ride across the manifold, only the windows are blacked out. what if you had a map of the algorithm? what if the UX of the feed let you look out of the window?
musing with @infowetrust.com
image from distill.pub/2017/aia/
Spent the day playing with this. I'm absolutely blown away @enjalot.bsky.social!
- Choose any embedding from HF
- Project with UMAP, cluster with HDBSCAN
- Use Ollama to label the clusters (Works incredibly well!)
03.12.2024 02:22
what do you think about having cut out templates like this cool cone ornament coloring thing:
picklebums.com/paper-cone-c...
what's crazy to me is that so many of these can be run very efficiently on an M1 MacBook pro, and just fine on a VM with only CPU.
crazy how much value you can pull out of text without billions of parameters
If you're interested in embedding models for retrieval (search), clustering, classification, paraphrase mining, etc., then there's now 10,000 fully free and open source options on @hf.co via Sentence Transformers.
Check out the most popular ones here: huggingface.co/models?libra...
I've organized and participated in many unconferences in the past, and they are always the most intense exchange of ideas and information that I've experienced. Given the energy we're seeing in the registration this one is poised to be no different!
register today!
hiddenstates.org
After the morning keynotes we will have a short voting session where topics get put on the board and everyone gets a few votes. Then the most popular topics get assigned to different session times. We will have parallel tracks and breakout rooms for the niche topics with dedicated interest too.
26.11.2024 18:23
There is lots of interest in steering and alignment by leveraging latest interpretability techniques like SAEs. Many people also brought up dimensionality reduction and visualization as well as better ways to extract structure from models.
So how will everyone get to talk about these topics?
The beauty of the unconf format is the self-organizing nature, people find each other based on common curiosities. We have noticed some themes in the topics shared during registration:
Lots of people want to go beyond the chat interface, and there appear to be lots of ideas for how to do that.
First we've got 2 amazing keynote speakers to kick off the day: @lelandmcinnes.bsky.social and @thesephist.com
Leland has built indispensable tools for working with model internals, namely UMAP, HDBSCAN and DataMapPlot.
Linus has published inspiring design research interfacing with hidden states.
We've hit a critical mass of registrations! The caliber of attendees is exciting: we've got researchers from companies big and small, academic and indie. We've got prototypers and UXers who have worked on bleeding-edge interfaces as well as household names.
let's talk about the unconf experience:
Hidden States is happening next week in SF!
It's a one-day unconference gathering researchers, designers, prototypers and engineers interested in pushing the boundaries of AI interfaces, going below the API and working with the hidden states.
hiddenstates.org
I've also made another tool for exploring unstructured text data (e.g. tweets) via a map of sorts:
enjalot.github.io/latent-scope...
If you do this with enough data you start to get a map of the patterns found in your dataset.
When you embed new data, like the question for a RAG query, you can see where on the map it lands.
You can map more and more points; a less similar point will show up a little further away.
As you add more points a map starts to form, with clusters of similar data spread out before you
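A rough way to picture "where a new point lands" is to ask which existing points it is most similar to. The sketch below does that with cosine similarity over toy vectors. It is only a stand-in: the vectors and labels are invented, real embeddings have hundreds of dimensions, and the maps in this thread come from a 2D projection rather than a raw nearest-neighbor lookup.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_points(query, points, k=2):
    """Return the labels of the k points most similar to `query`."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in points.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]


# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
points = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "taxes": [0.0, 0.1, 0.9],
}

# A new query vector that points in roughly the same direction as the pet posts.
query = [0.85, 0.15, 0.05]
print(nearest_points(query, points))
```

The query lands next to the two pet-related points and far from "taxes", which is the same intuition as seeing a RAG question show up inside a particular cluster on the map.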