"How I, a non-developer, read the tutorial you, a developer, wrote for me, a beginner" by Annie Mueller
anniemueller.com/posts/how-i-...
@nisanharamati.bsky.social
Data Systems for Infinite Scale, Math, Physics, Croissants. Founder.
For the first time: @honeycomb.io is hiring open roles in Australia!!! We have this senior role open as well as a mid-level role. job-boards.greenhouse.io/honeycomb/jo...
Once we fill these, we will have a thriving APAC team of 5 people: Field CTO, account exec, customer architect, and 2 support.
The legacy observability vendors' obsession with "cardinality control" is so backwards.
Why *control* cardinality instead of *embracing* it? High-cardinality data isn't a bug; it's the entire point. Your complex systems generate complex data.
Stop building tools that fight reality.
#SemanticSearch #DataInfrastructure #SearchArchitecture
Our latest post, www.graphiumlabs.com/blog/end-of-..., discusses how current search systems and tools are falling apart as they are required to handle an ever-growing mass of data and an increasing level of nuance and complexity.
I think they might be coming back. Have you seen www.graphiumlabs.com?
06.06.2025 02:16
This post breaks down why understanding precision and recall is essential when building search and information retrieval systems for high-stakes decision making:
www.graphiumlabs.com/blog/precisi...
In high-stakes environments, like medical diagnostics, legal research, and threat detection, the trade-off between high recall and high precision isn't just a theoretical optimization problem. The choice has real-world consequences.
24.07.2025 18:44
Ideally, users want both at 100%: all (good) signal and zero noise. But the way search works under the hood often forces a trade-off: higher recall requires looser filters to bring in more results and, consequently, more irrelevant results or noise, which brings down precision.
24.07.2025 18:44
Precision means: "Of the results that were returned, how many were relevant (correct)?"
And recall says: "Of all the correct results, how many were returned?"
In search and information retrieval systems, precision and recall are more than just evaluation metrics: they reflect how well a system aligns with the user's needs and expectations of relevance and completeness.
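To make the two definitions concrete, here is a minimal sketch in Python. The function name `precision_recall` and the document IDs are made up for illustration; they do not come from the linked post.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query from two collections of IDs."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # results that were both returned and relevant
    # Precision: of the results that were returned, how many were relevant?
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: of all the relevant results, how many were returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 documents actually relevant.
returned = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5", "d6", "d7"]
precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.40
```

Returning more documents can only keep recall the same or raise it, but every extra irrelevant result lowers precision, which is the trade-off described in this thread.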
24.07.2025 18:44
New Graphium Labs blog post!
www.graphiumlabs.com/blog/precisi...
#precision #recall #relevance #search #informationretrieval #searchengineering #searchsystems #searchquality #mlmetrics
So when nuance is important, semantic search built on vector similarity tends to miss the mark by a really, really wide margin.
01.07.2025 18:37
I'll start: vector embeddings don't encode semantics, they encode substitutability. It _looks right_ if you squint at it, or if the use case is pretty trivial (e.g. "brown" vs. "chocolate" when describing a sofa).
But opposites also have high substitutability (good/bad, dark/light, rich/poor, etc.).
Still frame from the movie The Princess Bride of Inigo Montoya saying "I do not think that word means what you think it means" to Vizzini, overlaid with the text "I do not think these words mean what you think they mean" near Inigo Montoya's head, and the text "Semantic search is just vector embeddings cosine similarity" near Vizzini's head.
It's been bothering me for years how "Semantic" in "Semantic search", the way it's built these days, is semantically wrong.
So on this quite lovely Canada day, let's argue semantics about "Semantic".
I once failed the "check the checkbox" test by checking it... Wrong? I guess?
30.04.2025 16:28
Super excited to share this! I've known Saem for many years, and once we started talking about what we're building at Graphium Labs, having him join us as CEO felt inevitable.
17.04.2025 16:11
I found the most incredible graph on the other site
13.04.2025 17:50
This was a really fun talk to give. Thanks Kir Shatrov and Cameron Morgan for organizing, and @tavis.damnsimple.com for recording!
Video: m.youtube.com/watch?v=D4ZL...
Slides: www.graphiumlabs.com/vancouver-sy...
I love this paper!
07.04.2025 19:13
New Change, Technically episode is out: WHO'S AFRAID OF MATH?
We tackle *math anxiety*, @analog-ashley.bsky.social teaches me about vulnerable circuits in the brain and being vulnerable about teaching, and I read a HECK of a lot of science to bring you this episode
Why "geometric" is bad:
Geometric refers to a geometric sequence in math, of the form a, ar, ar^2, ar^3, ..., ar^n.
If r>1 and the scale of something grows by the power, you lose control FAST. Nuclear meltdown fast. 99.9999% of the increase occurs in the last microsecond.
Fine -> BAD happens fast
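A small worked example of how back-loaded a geometric sequence is. This is only an illustrative sketch: the values a=1, r=2, n=20 and the helper name `geometric_terms` are arbitrary choices, not figures from the post.

```python
def geometric_terms(a, r, n):
    """Return the terms a, ar, ar^2, ..., ar^n of a geometric sequence."""
    return [a * r**k for k in range(n + 1)]


# Illustrative values only: start at 1 and double at every step, 20 times.
terms = geometric_terms(a=1, r=2, n=20)
total = sum(terms)

# Share of the whole sum contributed by just the last three terms.
last_three_share = sum(terms[-3:]) / total
print(f"last 3 of {len(terms)} terms: {last_three_share:.1%} of the total")
```

With doubling, the last three of 21 terms account for roughly 87.5% of the sum, so almost all of the growth arrives at the very end.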
Hey Tim let's talk.
18.03.2025 03:15
This was a really fun talk to write and present!
11.03.2025 20:23
Co-founder @nisanharamati.bsky.social gave a talk at last night's Vancouver.systems, "The Limits of Scaling and the Physical Properties of Data", going over how to predict the size limit where distributed systems stop scaling and start losing throughput.
slides: www.graphiumlabs.com/vancouver-sy...
We don't talk enough about Scaling to Catastrophe in distributed systems. Today's post, part 2 in the Physical Properties of Data series, explores the different scaling phases through the lens and math of the Universal Scalability Law. www.graphiumlabs.com/blog/part2-g... #databs #dataengineering
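For readers who have not met the Universal Scalability Law, here is a rough sketch of its usual textbook form, X(N) = λN / (1 + α(N−1) + βN(N−1)), where α models contention and β models coherency cost. The parameter values below are invented for illustration, and the blog post may use a different notation or parameterization.

```python
import math


def usl_throughput(n, alpha, beta, lam=1.0):
    """Throughput at n nodes under the USL; lam is the single-node throughput."""
    return lam * n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha, beta):
    """Node count where throughput peaks: sqrt((1 - alpha) / beta)."""
    return math.sqrt((1 - alpha) / beta)


# Hypothetical coefficients, chosen only to make the phases visible.
alpha, beta = 0.05, 0.001
print(f"throughput peaks near N = {usl_peak(alpha, beta):.1f}")
for n in (1, 8, 31, 100):
    print(n, round(usl_throughput(n, alpha, beta), 2))
```

Past the peak, the βN(N−1) term dominates and adding nodes reduces total throughput, which is the "scaling to catastrophe" phase.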
25.02.2025 20:04
bc i haven't done so yet, i decided to burn any remaining bridge to the land of statistics. it wasn't statisticians nor statistics but it was me. i am simply not good enough to do statistics myself.
so, @peyrardmax.bsky.social and i decided to turn statistical estimation into supervised learning.
Hey #PlatformEngineering folks (especially with Kafka experience!) - how would you like to be the new Terra at @honeycomb.io? They are hiring a Staff Platform Engineer to backfill for me (my last day is Friday) and you couldn't ask for a better group of folks.
jobs.lever.co/honeycomb/4f...
There's a few tickets left for the distributed systems class coming up in just over a week. If you'd like to join, now's the time. :-)
https://www.eventbrite.com/e/distributed-systems-fundamentals-registration-1060426286569?aff=mastodon
"GiganticDataStore |>" is like 99.999% of the engineering effort for this
04.12.2024 17:56
Similarity measurement is the key element in recommendation systems: knowing which entities or objects in your dataset are similar to others, and by how much, is the engine that drives recommendations.
Read more in our latest blog post at www.graphiumlabs.com/blog/similar...
#databs #dataengineering
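As a toy illustration of "which entities are similar, and by how much", the sketch below uses cosine similarity over rating vectors. Cosine similarity is just one common choice, picked here for brevity; the post does not say which measure the blog recommends, and the names and ratings are invented.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical ratings of the same four items by three users.
ratings = {
    "alice": [5, 4, 0, 1],
    "bob": [4, 5, 0, 0],
    "carol": [0, 1, 5, 4],
}


def most_similar(name, ratings):
    """Return the other user whose rating vector is closest to `name`'s."""
    scores = {
        other: cosine_similarity(ratings[name], vec)
        for other, vec in ratings.items()
        if other != name
    }
    return max(scores, key=scores.get)


print(most_similar("alice", ratings))  # bob
```

A recommender built this way would suggest to alice the items that bob rated highly and she has not yet seen.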