
Jack Vanlightly

@vanlightly.bsky.social

Researcher, advisor, writer, formal verification eng @ Confluent. Everything data (dist sys, databases, messaging, data eng/analytics). https://jack-vanlightly.com, https://www.hotds.dev Credit: ESO/B. Tafresh

3,804 Followers  |  111 Following  |  282 Posts  |  Joined: 24.10.2024  |  2.371

Latest posts by vanlightly.bsky.social on Bluesky

The Three Durable Function Forms — Jack Vanlightly Durable execution engines (DEEs) talk about “workflows”, “activities”, “virtual objects”, “handlers”, and “functions”, but they’re often describing the same underlying execution patterns. This post pr...

Durable functions have many names across frameworks, but it reduces to 3 forms:
stateless functions, sessions, actors.

Explainer here:
jack-vanlightly.com/blog/2025/12...

10.12.2025 14:06 — 👍 11    🔁 0    💬 0    📌 0
The Durable Function Tree - Part 1 — Jack Vanlightly In my last post I wrote about why and where determinism is needed in durable execution (DE). In this post I'm going to explore how workflows can be formed from trees of durable function calls ba...

New post: The Durable Function Tree.
Durable execution engines all end up building some form of function tree with suspension points shaped by local vs remote side effects. I look at why, the trade-offs, and where orchestration should (and shouldn’t) be used.
jack-vanlightly.com/blog/2025/12...

04.12.2025 14:48 — 👍 14    🔁 1    💬 0    📌 0

📝 Blogged: "On Idempotency Keys"

Discussing several options for ensuring exactly-once processing in distributed systems using idempotency keys, from UUIDs to monotonically increasing sequences.

👉 www.morling.dev/blog/on-idem...

25.11.2025 16:38 — 👍 38    🔁 9    💬 2    📌 0
Demystifying Determinism in Durable Execution — Jack Vanlightly Determinism is a key concept to understand when writing code using durable execution frameworks such as Temporal, Restate, DBOS, and Resonate. If you read the docs you see that some parts of your code...

New blog post: Demystifying Determinism in Durable Execution

Why do durable execution frameworks care so much about determinism? I unpack the underlying mechanics.

Post: jack-vanlightly.com/blog/2025/11...

24.11.2025 14:02 — 👍 9    🔁 0    💬 0    📌 1
Have your Iceberg Cubed, Not Sorted: Meet Qbeast, the OTree Spatial Index — Jack Vanlightly In today’s post I want to walk through a fascinating indexing technique for data lakehouses which flips the role of the index in open table formats like Apache Iceberg and Delta Lake. We are going to...

New blog post about Qbeast and how it brings a multidimensional spatial index to Iceberg/Delta.

🔹 Hypercube-based layout
🔹 Index used by writers, invisible to engines
🔹 Better locality + pruning, adaptive layout

Lots of innovation ahead in lakehouses.

jack-vanlightly.com/blog/2025/11...

19.11.2025 13:55 — 👍 5    🔁 0    💬 1    📌 0
How Would You Like Your Iceberg Sir? Stream or Batch Ordered? — Jack Vanlightly Today I want to talk about stream analytics, batch analytics and Apache Iceberg. Stream and batch analytics work differently but both can be built on top of Iceberg, but due to their differences there...

Stream-order vs batch-order in Iceberg:
* Flink wants temporal locality.
* Spark wants value locality.

Same table, conflicting physics.

New post: jack-vanlightly.com/blog/2025/11...

05.11.2025 14:50 — 👍 13    🔁 2    💬 0    📌 0
A Fork in the Road: Deciding Kafka’s Diskless Future — Jack Vanlightly “The Kafka community is currently seeing an unprecedented situation with three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the same challenge of high replica...

Three KIPs (1150, 1176, 1183) all target Kafka’s cross-AZ replication costs but there is a wider question at stake.

My new post explains the KIPs and the trade-offs between reusing old abstractions and embracing stateless compute over S3.

jack-vanlightly.com/blog/2025/10...

22.10.2025 12:50 — 👍 8    🔁 0    💬 0    📌 0

But Vortex and the other new formats are on my radar

15.10.2025 15:45 — 👍 1    🔁 0    💬 0    📌 0

I don't know much about it, but even faster random reads won't help Kafka if the shared table is organized by some business dimensions rather than by Kafka offset.

15.10.2025 15:44 — 👍 1    🔁 0    💬 1    📌 0
Why I’m not a fan of zero-copy Apache Kafka-Apache Iceberg — Jack Vanlightly Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics cou...

New post: why I’m not a fan of “zero-copy” Iceberg tables for Apache Kafka.
From a systems design view, it trades storage savings for coupling and complexity.
Sometimes, duplication is cheaper than coupling.
jack-vanlightly.com/blog/2025/10...

15.10.2025 13:37 — 👍 17    🔁 5    💬 1    📌 0

Which will be proprietary platform stuff I assume, so not going into the Iceberg spec? Is there any chance of caching layers also being open and standardised?

09.10.2025 05:34 — 👍 0    🔁 0    💬 0    📌 0
Beyond Indexes: How Open Table Formats Optimize Query Performance — Jack Vanlightly My career in data started as a SQL Server performance specialist, which meant I was deep into the nuances of indexes, locking and blocking, execution plan analysis and query design. These days I’m mor...

Why don’t Iceberg or Delta Lake have secondary indexes?
Because analytics workloads and OLTP workloads optimize for opposite I/O patterns.

See my dive into data layout, pruning, and what “indexing” really means in open table formats: jack-vanlightly.com/blog/2025/10...

08.10.2025 12:59 — 👍 16    🔁 3    💬 2    📌 0
Understanding Apache Fluss — Jack Vanlightly This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy thi...

New deep dive: Understanding Apache Fluss

I spent August reverse-engineering Fluss, Alibaba’s new table storage engine for Flink (partially forked from Kafka). This post covers its architecture, tiering, and how it tackles changelogs & low-latency state.

jack-vanlightly.com/blog/2025/9/...

02.09.2025 12:55 — 👍 16    🔁 2    💬 0    📌 0
A Conceptual Model for Storage Unification — Jack Vanlightly Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and form...

New blog post: A Conceptual Model for Storage Unification.

The post defines what storage unification means, establishes terminology, and evaluates the different building blocks and approaches for achieving it.

jack-vanlightly.com/blog/2025/8/...

21.08.2025 13:15 — 👍 7    🔁 2    💬 0    📌 0
Remediation: What happens after AI goes wrong? — Jack Vanlightly If you’re following the world of AI right now, no doubt you saw Jason Lemkin’s post on social media reporting how Replit’s AI deleted his production database, despite it being told not to touch an...

In a future of autonomous AI agents, we can't limit ourselves to error prevention and error detection; we must also include remediation.

jack-vanlightly.com/blog/2025/7/...

28.07.2025 12:16 — 👍 2    🔁 0    💬 0    📌 0

Ha! The Blondlot example is fascinating. Sometimes you can fail so spectacularly that they have to invent new math controls just to prevent future people from making a mistake as bad as yours. That's a special kind of immortality.

22.07.2025 17:53 — 👍 1    🔁 0    💬 0    📌 0
The Cost of Being Wrong — Jack Vanlightly A recent LinkedIn post by Nick Lebesis caught my attention with this brutal take on the difference between good startup founders and coward startup founders. I recommend you read the entire thing ...

Science moves slowly because wrong theories waste decades. Engineering is careful because failures kill people. Software moves fast because mistakes are cheap; the expensive error isn't making the wrong choice, it's taking too long to make any choice. jack-vanlightly.com/blog/2025/7/...

22.07.2025 15:08 — 👍 4    🔁 0    💬 0    📌 0

But no, I was not offended 😄 Keep on writing it how you see it!

16.07.2025 11:57 — 👍 0    🔁 0    💬 0    📌 0

He could easily have framed it in a more positive way. But either way, I don't think it matters too much, your readers are intelligent and can benefit from both, and see the criticism from a more positive angle (picking the bits they like from both).

16.07.2025 11:57 — 👍 0    🔁 0    💬 1    📌 0

Your post was valid from that scale. Winston's response took issue with it because he's been thinking very deeply at the macro-scale of power systems that run our entire planet. And I appreciate his views there. But I think it was an unfair take and needlessly combative.

16.07.2025 11:57 — 👍 0    🔁 0    💬 1    📌 0

I think it's a matter of scale. For me, your post is from the position where you are at day-to-day, your subjective experience, helping run a company, pushing back on some of the BS you see in the industry (which we love!), all inextricably linked to your past.

16.07.2025 11:57 — 👍 2    🔁 0    💬 1    📌 0
Responsibility Boundaries in the Coordinated Progress model — Jack Vanlightly Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable tri...

Where does reliability begin, and where does it end? In distributed business architectures, the answer is responsibility boundaries. New post: jack-vanlightly.com/blog/2025/7/...

15.07.2025 14:15 — 👍 12    🔁 5    💬 0    📌 0

Next time, I'll agree it's Wednesday 😆

03.07.2025 20:16 — 👍 2    🔁 0    💬 0    📌 0

ChatGPT thought it was Tuesday, so I made fun of it and it admitted it was Wednesday. So I made fun of it again, and it admitted it was...Wednesday. But sure, AI agents are gonna steal my job 🤔

03.07.2025 16:20 — 👍 3    🔁 0    💬 1    📌 0

Like how to write an if statement or loop in bash 😄 I swear my brain is incapable of remembering that.

01.07.2025 08:28 — 👍 8    🔁 0    💬 3    📌 1

It really makes me question how ready it is for autonomous agents. I'm still in the "I'll believe it when I see it" camp for AI agents.

24.06.2025 18:30 — 👍 6    🔁 0    💬 1    📌 0

ChatGPT has hallucinated so many times for me today. It's invented scientific terms that don't exist, and has been quite liberal with plausible answers based on what sounds reasonable, but without any real-world justification. When challenged, it admits its mistake.

24.06.2025 18:30 — 👍 2    🔁 0    💬 1    📌 0

At this point, I can't tell if coffee makes me feel better in the morning because I am in withdrawal, or it actually picks me up.

20.06.2025 08:48 — 👍 3    🔁 0    💬 1    📌 0

My musical evolution continues, discovered deep hypnotic drone music today. No drugs required 😄 The Hypnus Records label is great.

13.06.2025 14:33 — 👍 3    🔁 0    💬 1    📌 0

The abstraction seems to make sense to people. And it isn't complicated either which is nice.

12.06.2025 06:21 — 👍 0    🔁 0    💬 0    📌 0
