
Jack Vanlightly

@vanlightly.bsky.social

Researcher, advisor, writer, formal verification eng @ Confluent. Everything data (dist sys, databases, messaging, data eng/analytics). https://jack-vanlightly.com, https://www.hotds.dev Credit: ESO/B. Tafresh

3,765 Followers  |  111 Following  |  273 Posts  |  Joined: 24.10.2024  |  2.0011

Latest posts by vanlightly.bsky.social on Bluesky

Which will be proprietary platform stuff I assume, so not going into the Iceberg spec? Is there any chance of caching layers also being open and standardised?

09.10.2025 05:34 - 👍 0    🔁 0    💬 0    📌 0
Beyond Indexes: How Open Table Formats Optimize Query Performance - Jack Vanlightly My career in data started as a SQL Server performance specialist, which meant I was deep into the nuances of indexes, locking and blocking, execution plan analysis and query design. These days I’m mor...

Why don’t Iceberg or Delta Lake have secondary indexes?
Because analytics workloads and OLTP workloads optimize for opposite I/O patterns.

See my dive into data layout, pruning, and what “indexing” really means in open table formats: jack-vanlightly.com/blog/2025/10...

08.10.2025 12:59 - 👍 15    🔁 2    💬 2    📌 0
Understanding Apache Fluss - Jack Vanlightly This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy thi...

New deep dive: Understanding Apache Fluss

I spent August reverse-engineering Fluss, Alibaba’s new table storage engine for Flink (partially forked from Kafka). This post covers its architecture, tiering, and how it tackles changelogs & low-latency state.

jack-vanlightly.com/blog/2025/9/...

02.09.2025 12:55 - 👍 15    🔁 2    💬 0    📌 0
A Conceptual Model for Storage Unification - Jack Vanlightly Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and form...

New blog post: A Conceptual Model for Storage Unification.

The post defines what storage unification means, defines terminology and evaluates different building blocks and approaches to doing it.

jack-vanlightly.com/blog/2025/8/...

21.08.2025 13:15 - 👍 8    🔁 2    💬 0    📌 0
Remediation: What happens after AI goes wrong? - Jack Vanlightly If you’re following the world of AI right now, no doubt you saw Jason Lemkin’s post on social media reporting how Replit’s AI deleted his production database, despite it being told not to touch an...

In a future of autonomous AI agents, we can't limit ourselves to error prevention and error detection, we must also include remediation.

jack-vanlightly.com/blog/2025/7/...

28.07.2025 12:16 - 👍 2    🔁 0    💬 0    📌 0

Ha! The Blondlot example is fascinating. Sometimes you can fail so spectacularly that they have to invent new math controls just to prevent future people from making a mistake as bad as yours. That's a special kind of immortality.

22.07.2025 17:53 - 👍 2    🔁 0    💬 1    📌 0
The Cost of Being Wrong - Jack Vanlightly A recent LinkedIn post by Nick Lebesis caught my attention with this brutal take on the difference between good startup founders and coward startup founders. I recommend you read the entire thing ...

Science moves slowly because wrong theories waste decades. Engineering is careful because failures kill people. Software moves fast because mistakes are cheap; the expensive error isn't making the wrong choice, it's taking too long to make any choice. jack-vanlightly.com/blog/2025/7/...

22.07.2025 15:08 - 👍 4    🔁 0    💬 1    📌 0

But no, I was not offended 😄 Keep on writing it how you see it!

16.07.2025 11:57 - 👍 0    🔁 0    💬 0    📌 0

He could easily have framed it in a more positive way. But either way, I don't think it matters too much, your readers are intelligent and can benefit from both, and see the criticism from a more positive angle (picking the bits they like from both).

16.07.2025 11:57 - 👍 0    🔁 0    💬 1    📌 0

Your post was valid from that scale. Winston's response took issue with it because he's been thinking very deeply at the macro-scale of power systems that run our entire planet. And I appreciate his views there. But, I think it was an unfair take out and needlessly combative.

16.07.2025 11:57 - 👍 0    🔁 0    💬 1    📌 0

I think it's a matter of scale. For me, your post is from the position where you are at day-to-day, your subjective experience, helping run a company, pushing back on some of the BS you see in the industry (which we love!), all inextricably linked to your past.

16.07.2025 11:57 - 👍 2    🔁 0    💬 1    📌 0
Responsibility Boundaries in the Coordinated Progress model - Jack Vanlightly Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable tri...

Where does reliability begin, and where does it end? In distributed business architectures, the answer is responsibility boundaries. New post: jack-vanlightly.com/blog/2025/7/...

15.07.2025 14:15 - 👍 12    🔁 5    💬 0    📌 0

Next time, I'll agree it's Wednesday 😆

03.07.2025 20:16 - 👍 2    🔁 0    💬 0    📌 0

ChatGPT thought it was Tuesday, so I made fun of it and it admitted it was Wednesday. So I made fun of it again, and it admitted it was...Wednesday. But sure, AI agents are gonna steal my job 🤔

03.07.2025 16:20 - 👍 3    🔁 0    💬 1    📌 0

Like how to write an if statement or loop in bash 😄 I swear my brain is incapable of remembering that.

01.07.2025 08:28 - 👍 9    🔁 0    💬 3    📌 1

It really makes me question how ready it is for autonomous agents. I'm still in the "I'll believe it when I see it" camp for AI agents.

24.06.2025 18:30 - 👍 6    🔁 0    💬 1    📌 0

ChatGPT has hallucinated so many times for me today. It's invented scientific terms that don't exist, and has been quite liberal with plausible answers based on what sounds reasonable, but without any real-world justification. When challenged, it admits its mistake.

24.06.2025 18:30 - 👍 2    🔁 0    💬 1    📌 0

At this point, I can't tell if coffee makes me feel better in the morning because I am in withdrawal, or it actually picks me up.

20.06.2025 08:48 - 👍 3    🔁 0    💬 2    📌 0

My musical evolution continues, discovered deep hypnotic drone music today. No drugs required 😄 The Hypnus Records label is great.

13.06.2025 14:33 - 👍 3    🔁 0    💬 1    📌 0

The abstraction seems to make sense to people. And it isn't complicated either which is nice.

12.06.2025 06:21 - 👍 0    🔁 0    💬 0    📌 0
Coordinated Progress – Part 1 – Seeing the System: The Graph - Jack Vanlightly At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it ...

How to reliably distribute work across microservices, stream processors, durable execution, event-driven, orchestration and now AI agents?

Coordinated Progress is a 4 part series that explores the common structure behind reliable distributed systems.

jack-vanlightly.com/blog/2025/6/...

11.06.2025 14:28 - 👍 33    🔁 8    💬 3    📌 0

I took a break from social media and my blog for a couple of months. ND burnout. But I'm tentatively back, probably just to post my writing here for now. HOTDS is on pause. Getting back to writing is therapeutic though. I'll post something this week that I've been working on.

09.06.2025 11:23 - 👍 7    🔁 0    💬 0    📌 0

Agree, it's not the Hz; any kind of calming sound, like pouring rain or these slow atmospheric noises, is what does it for me. It quietens my brain. In fact, since I discovered this, I haven't listened to music once. I only listen to these calming tracks now.

09.06.2025 11:20 - 👍 0    🔁 0    💬 0    📌 0

My pleasure 😄

04.04.2025 18:29 - 👍 0    🔁 0    💬 0    📌 0
Humans of the Data Sphere Issue #10 April 4th 2025 Your biweekly dose of insights, observations, commentary and opinions from interesting people from the world of databases, AI, streaming, distributed systems and the data engineering/analytics space.

Another Humans of the Data Sphere is out, with issue 10! In this issue people are talking fsyncs, tips for running ClickHouse at scale, the problems with MCP and more. Plus I dig up a classic paper from 1962. www.hotds.dev/p/humans-of-...

04.04.2025 16:14 - 👍 5    🔁 4    💬 1    📌 0

Just an oversight I guess.

03.04.2025 17:54 - 👍 0    🔁 0    💬 0    📌 0
Apache Kafka Apache Kafka: A Distributed Streaming Platform.

kafka.apache.org/blog#apache_...

03.04.2025 15:59 - 👍 0    🔁 0    💬 0    📌 0

Proud to have contributed formal verification (TLA+) for three key improvements in Kafka 4.0:

✅ KIP-966: Strengthens the replication protocol.
✅ KIP-996: Introduces PreVote for more stable KRaft leadership.
✅ KIP-848: Delivers more efficient, predictable rebalancing.

03.04.2025 15:59 - 👍 18    🔁 1    💬 3    📌 0

I just selected the Spotify Gamma Waves 40Hz playlist. Listen to it on good headphones for the immersion.

25.03.2025 14:49 - 👍 0    🔁 0    💬 1    📌 0

Wow, I just discovered gamma wave music. Wrote non-stop for three hours.

25.03.2025 13:00 - 👍 3    🔁 0    💬 1    📌 0

@vanlightly is following 19 prominent accounts