Same for me. I am finding more vibrant communities on the data lake side of things; Iceberg/DataFusion/Arrow and all the Postgres extensions have been interesting me more these days.
08.11.2025 22:34 — 👍 0 🔁 0 💬 1 📌 0
@otavioccc.bsky.social
Yet another software bricklayer.
It's true. Every company eventually becomes a database company.
05.08.2025 21:24 — 👍 10 🔁 1 💬 1 📌 0
Intro to @apachedatafusion.bsky.social: Technology, Community and Not Quite Enough Time: www.youtube.com/watch?v=3per...
15.06.2025 09:31 — 👍 6 🔁 4 💬 0 📌 0
Last month I worked on Apache Iceberg integration for Vortex, the results of which we presented earlier this month at Iceberg Summit.
I wrote a post about my experience bridging our Rust-based system to Iceberg and Spark:
spiraldb.com/post/vortex-...
The recording of my keynote at Data Council 2025 is now available.
If you missed my best talk so far, you can now catch up.
The Deconstructed Database and the Advent of the Open Data Lake
www.youtube.com/watch?v=Cqhk...
It’s not pretty, but it’s been working for a few years now. We added some ugly logical sharding on top: a poor man’s version of consistent hashing that I designed and that I’m definitely not proud of (but that works and "scales").
09.06.2025 08:33 — 👍 1 🔁 0 💬 0 📌 0
Neo4j is arguably the leading product in the graph database ecosystem, but it faces challenges with high-throughput write workloads. For our use case (metadata in an observability company), we ultimately chose Postgres with recursive queries and restricted the set of traversal queries we support.
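For readers unfamiliar with the approach, here is a minimal sketch of a depth-bounded graph traversal using a recursive query. It is an illustration only, not the schema or queries described in the post: the `edges` table, the `svc-*` node names, and the `reachable` helper are all hypothetical, and SQLite stands in for Postgres so the example runs without a server (Postgres accepts the same `WITH RECURSIVE` form, though driver placeholders differ).

```python
import sqlite3


def reachable(conn, start, max_depth):
    """Return (node, hops) pairs reachable from `start` within `max_depth` hops."""
    return conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0                      -- anchor row: the starting node
            UNION                            -- UNION drops duplicate (node, depth) rows
            SELECT e.dst, w.depth + 1
            FROM edges AS e
            JOIN walk AS w ON e.src = w.node
            WHERE w.depth < ?                -- the depth bound keeps cycles finite
        )
        SELECT node, MIN(depth) FROM walk
        GROUP BY node
        ORDER BY MIN(depth), node
        """,
        (start, max_depth),
    ).fetchall()


# Hypothetical metadata graph with a cycle (svc-c points back to svc-a).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("svc-a", "svc-b"), ("svc-b", "svc-c"), ("svc-c", "svc-a"), ("svc-c", "svc-d")],
)

print(reachable(conn, "svc-a", 2))
```

Bounding the depth is one simple way to restrict which traversals are supported: every query has a predictable upper cost, which matters when the same tables are also taking heavy writes.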
09.06.2025 08:27 — 👍 1 🔁 0 💬 0 📌 0
Complex traversal queries don’t play well with high-throughput writes, and sharding or partitioning is also non-trivial. In my view (and personal suffering), it’s a difficult domain for a one-size-fits-all database, so we naturally ended up with a universe of purpose-specific graph databases.
09.06.2025 08:20 — 👍 3 🔁 0 💬 2 📌 0
Sunday morning read
jepsen.io/analyses/t...
A parody of John Waters' "Serial Mom", except it's "Serializable Mom". I'm holding scissors (to partition the network) and trying to channel my best homage to Kathleen Turner.
Systems Distributed. June 19-20, Amsterdam.
https://systemsdistributed.com/
This post from @stephanewen.bsky.social is *really* thought-provoking. Their architecture seems to unify database and stream processing architectures into one thing.
I'm not totally sure what the implications of this are yet, but it seems important.
Is this where we currently are?
13.03.2025 14:40 — 👍 5 🔁 2 💬 0 📌 0
It's widely known that sharing a queue across multiple servers, rather than queue-per-server, often helps reduce latency and improve utilization. But when is one queue better? In my new blog post, I look at one case: different classes of work. Read it here: brooker.co.za/blog/2025/03...
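The baseline claim (one shared queue versus a queue per server) can be illustrated with a small simulation. This is a rough sketch under assumed parameters and is not taken from the linked blog post: the arrival rate, service rate, server count, and random assignment policy are all assumptions, and it models a single class of work only.

```python
import heapq
import random


def simulate(num_jobs=20000, servers=4, arrival_rate=3.2, service_rate=1.0, seed=1):
    """Return (mean wait with one shared queue, mean wait with per-server queues)."""
    rng = random.Random(seed)
    shared_free = [0.0] * servers  # min-heap of times at which each server frees up
    split_free = [0.0] * servers   # per-server free times, indexed by server
    shared_wait = 0.0
    split_wait = 0.0
    now = 0.0
    for _ in range(num_jobs):
        now += rng.expovariate(arrival_rate)     # next arrival (Poisson process)
        service = rng.expovariate(service_rate)  # same job fed to both systems

        # Shared FIFO queue: the job takes whichever server frees up first.
        start = max(now, heapq.heappop(shared_free))
        heapq.heappush(shared_free, start + service)
        shared_wait += start - now

        # Queue per server: the job is pinned to a randomly chosen server.
        i = rng.randrange(servers)
        start = max(now, split_free[i])
        split_free[i] = start + service
        split_wait += start - now
    return shared_wait / num_jobs, split_wait / num_jobs


shared, split = simulate()
print(f"mean wait, shared queue:     {shared:.2f}")
print(f"mean wait, queue per server: {split:.2f}")
```

With per-server queues a job can sit behind a long job while another server is idle; the shared queue avoids that, which is why its mean wait comes out lower at the same utilization.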
26.03.2025 16:34 — 👍 26 🔁 2 💬 0 📌 4
Kinda related: research.google/blog/deciphe...
26.03.2025 17:03 — 👍 0 🔁 0 💬 0 📌 0
The interesting part to me is not how they answer questions. The interesting part is that maybe they will end up proving we are not very complex machines ourselves.
Have you thought about it? Maybe we are just probabilistic dummy generation machines too.
en.m.wikipedia.org/wiki/The_Lib...
There is a whole field of Neuro-symbolic AI that will likely give us more insights in the future. To me this is like Shannon's work, which was already being used before he was able to properly formalize it.
25.03.2025 09:49 — 👍 1 🔁 0 💬 0 📌 0
I was trying to reach out to blog.acolyer.org and realized Adrian retired and it is now offline. Although I am happy for him (and he totally deserves to pass the torch on), what a great piece of content we just lost...
02.02.2025 20:01 — 👍 0 🔁 0 💬 0 📌 0
I was recently on a panel on "What are important data systems problems, ignored by research?" with @andypavlo.bsky.social and Allison Lee, moderated by Viktor Leis. Here is the write-up of the discussion: databasearchitects.blogspot.com/2024/12/what...
13.12.2024 07:58 — 👍 64 🔁 12 💬 2 📌 2
Serious question: Are we moving towards a Tuple Spaces revival in Distributed Systems practice?
I see everyone building on top of S3/Object Storage and nobody discussing it at the research level (or mentioning any potential links between the two topics).
inria.hal.science/hal-01631715...
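For context, a tuple space in the Linda tradition is a shared bag of tuples that processes write to and read from by pattern rather than by address. The sketch below is a toy, in-memory illustration of those primitives and nothing more: the `TupleSpace` class is hypothetical, the operations are non-blocking variants, and it makes no claim about how any object-storage-based system is built.

```python
class TupleSpace:
    """Toy in-memory tuple space with non-blocking Linda-style operations."""

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Publish a tuple into the space."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(pattern, tup):
        # None in a pattern acts as a wildcard for that position.
        return len(pattern) == len(tup) and all(
            p is None or p == v for p, v in zip(pattern, tup)
        )

    def rd(self, pattern):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def in_(self, pattern):
        """Remove and return the first matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(pattern, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out(("job", 1, "pending"))
space.out(("job", 2, "pending"))

peeked = space.rd(("job", None, "pending"))  # read without consuming
claimed = space.in_(("job", 1, None))        # consume job 1
remaining = space.rd(("job", None, None))    # only job 2 is left
```

Producers and consumers never reference each other, only the shared space, which is the property that invites the comparison with systems coordinating through a shared object store.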
Great talk from Justin Cormack - Object Storage Is All You Need.
It’s widespread at Grafana Labs - Mimir, Loki, Tempo and Pyroscope all follow the pattern.
youtu.be/ei0wwTy6_G4
While the Lakehouse hype and overall Stream Processing adoption are great to watch (after all those years of doing and discussing it), I'm increasingly concerned about engineering collaboration models and potential overuse.
Anyone else? What are you seeing in your organizations?
My #SRECon talk is now out! Watch me destroy some earbuds on stage and learn how software engineering compares to other branches of engineering
www.youtube.com/watch?v=xNeS...
A short analysis of what might have contributed to the recent rise of Bluesky. Now in the Blogs section of Communications of the ACM: cacm.acm.org/blogcacm/the...
27.11.2024 18:42 — 👍 20 🔁 6 💬 1 📌 1
Do you want to learn how @restatedev.bsky.social works under the hood and how its distributed architecture is designed? Then this video is for you 🫵 It also includes a first demo of how it runs distributed and can cope with node failures 💪
🎥 www.youtube.com/watch?v=Lgu3...
wrote up a reply to @dustyweb.bsky.social's "How decentralized is Bluesky really" blog post
27.11.2024 00:28 — 👍 578 🔁 171 💬 37 📌 35
The story of how SlateDB came to be. Couldn’t have happened without reponsive.dev. And now they’re launching RS3, a streaming storage backend, built on SlateDB. 🤩
26.11.2024 17:46 — 👍 16 🔁 4 💬 0 📌 0
I’m fortunate to have a call with the Bluesky backend team for an hour a week, and I’m continually blown away at how good they are at engineering their systems, how thoughtful they are about protocol design, and how smoothly they’ve been handling a huge influx of new users. Seriously impressive.
13.11.2024 22:23 — 👍 1251 🔁 127 💬 26 📌 18
New blog post on caching in DataFusion! See how my research is advancing DataFusion’s capabilities and what’s next:
blog.haoxp.xyz/posts/cachin...
I wrote a short tutorial on using Porcupine to check for linearizability (without needing to deal with the JVM).
notes.eatonphil.com/2024-10-31-c...