Frank McSherry

@frankmcsherry.bsky.social

http://github.com/frankmcsherry/blog

323 Followers  |  5 Following  |  126 Posts  |  Joined: 28.10.2024

Latest posts by frankmcsherry.bsky.social on Bluesky

Looking forward to this!

25.10.2025 13:22 - 👍 2    🔁 0    💬 0    📌 0
Introducing New Materialize Cloud M.1 Clusters Introducing a new Materialize Cloud cluster type. M.1 Clusters provide customers with more capacity, leading to better economics and performance, while maintaining the same low latency requirements th...

New from Materialize: Cloud M.1 Clusters
Run 3x larger workloads with the same low latency and predictable performance, thanks to intelligent data spilling and expanded capacity.
Learn more: bit.ly/3L12oH2

22.10.2025 19:52 - 👍 0    🔁 1    💬 0    📌 0

One thing I like is that I stopped being a bad researcher and .. did some reading rather than only writing. There are some papers about columnar Datalog already (links at the end)! However, I don't think I fully understand them yet. More reading required!

09.10.2025 18:33 - 👍 1    🔁 0    💬 0    📌 0

Datatoad check-in: this time including some recent progress on columnar joins (good news: faster). Though, it also tries to roll up a bit of the sprawl of content I've scribbled, which increasingly feels like it needs some more careful curation to be helpful.
github.com/frankmcsherr...

09.10.2025 18:29 - 👍 3    🔁 0    💬 1    📌 0

Good news on the Datalog front: v1 of "columnar joins" seems to work, and resulted in a 20% improvement (from 9.5s to 7.5s, for the joins of a reference workload). Still more gains from tightening it up, and potentially from columnar sorting, but I'll take a swing at writing things up tomorrow!

08.10.2025 22:35 - 👍 3    🔁 0    💬 0    📌 0
Runtimes of the same application with two different allocators; mimalloc is nearly 100x faster.

What a difference an allocator makes!

This is the same Rust program first using the system allocator, and then using mimalloc. About 100MB of working set in both cases, just .. apparently it pilots the system allocator to some horrible behavior.

Obviously going to start using mimalloc from now on.

30.09.2025 10:19 - 👍 14    🔁 0    💬 0    📌 0
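
For readers who have not swapped allocators in Rust before: the mechanism is the `#[global_allocator]` attribute on a static whose type implements `GlobalAlloc`. The sketch below is not the program from the post. It uses only the standard library and a hypothetical `CountingAllocator` wrapper around the system allocator, so it builds without extra dependencies. With the `mimalloc` crate added as a dependency, the same attribute would instead sit on a `mimalloc::MiMalloc` static.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests observed so far.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper (not from the post): forwards every request to the
/// system allocator and counts how many allocations were made.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This one attribute is the whole "swap": every Box, Vec, and String in the
// program now allocates through `CountingAllocator`.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `work` and reports how many allocations happened while it ran.
fn allocations_during(work: impl FnOnce()) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    work();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}
```

Counting allocations this way can be a cheap first check on whether a workload is allocation-heavy enough for the choice of allocator to matter.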

Welcome Frank McSherry @frankmcsherry.bsky.social to Sync Conf 2025. Pioneer of sync technology, inventor of Differential Dataflow, and founder of @materialize.com, Frank will trace the evolution of sync and stream processing.

19.09.2025 14:30 - 👍 9    🔁 3    💬 0    📌 0

Highlighting some of my team's recent work: We've changed Materialize to use swap instead of memory-mapped files, with nice performance and efficiency improvements.

18.09.2025 14:24 - 👍 3    🔁 1    💬 0    📌 0

We've released a major improvement to our memory spilling infrastructure:

Materialize now uses swap to scale SQL workloads beyond RAM.

✅ Faster hydration

✅ Efficient memory utilization

✅ Bigger workloads supported

Full post from antiguru.bsky.social → bit.ly/46EF2iJ

18.09.2025 13:58 - 👍 3    🔁 1    💬 0    📌 1

Very excited to bring some column-orientation to timely and differential. At least, removing baked in row-orientation in timely, and actual column-orientation in differential, with a bunch of cool learnings from the datatoad work. I hope. We'll see. :D

15.09.2025 21:29 - 👍 10    🔁 1    💬 0    📌 0
Release timely-v0.24.0 · TimelyDataflow/timely-dataflow This version of Timely has some exciting new features. The Distributor trait offers a generalization of the Exchange type. It allows users to define custom distribution strategies for routing data...

We just released Timely Dataflow 0.24! Here are some exciting changes from @frankmcsherry.bsky.social and myself.
The container abstractions got a complete rework, and we introduce a new pattern to distribute data. Details below.
github.com/TimelyDatafl...

15.09.2025 19:44 - 👍 5    🔁 1    💬 1    📌 2
GitHub - kparc/ksimple: k/simple is a bare minimum k interpreter for learning purposes by arthur whitney

The closest I've found is github.com/kparc/ksimple which I'll work through, but .. most of the write-ups seem to conflate the esoterica of monadic/dyadic glyphs with the implementation architecture you might use, and .. boy do I want to read about the latter without trudging through the former.

29.08.2025 17:27 - 👍 0    🔁 0    💬 0    📌 0

I have a trip coming up, and I'm hoping to find some content to read about the implementations of (ideally interpreted) array languages. I'm on an interpreter kick, and armed with a bunch of column-oriented libraries.

Any tips, drop a reply!

29.08.2025 17:25 - 👍 1    🔁 0    💬 2    📌 0

I wrote a bit about datatoad's columnar logic for relational operators. At least, for union, intersection, antijoins, and semijoins. It turns out the joins are all easy; it's projection that is hard, of all things. Go figure.

github.com/frankmcsherr...

24.08.2025 20:20 - 👍 6    🔁 0    💬 0    📌 0
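
To give a flavor of why union, semijoin, and antijoin are the easy cases, here is a minimal sketch over a single sorted, deduplicated key column. This is an illustration under stated assumptions, not datatoad's code: the function names and the `u32` key type are hypothetical, and the real write-up deals with multi-column relations. Each operator is one merge-style pass over the two columns.

```rust
use std::cmp::Ordering;

/// Keys of `left` that also appear in `right` (a semijoin, which for key-only
/// columns is also the intersection). Inputs must be sorted and deduplicated.
fn semijoin(left: &[u32], right: &[u32]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut r = 0;
    for &key in left {
        // Advance the cursor in `right` until it reaches or passes `key`.
        while r < right.len() && right[r] < key {
            r += 1;
        }
        if r < right.len() && right[r] == key {
            out.push(key);
        }
    }
    out
}

/// Keys of `left` that do NOT appear in `right` (an antijoin).
fn antijoin(left: &[u32], right: &[u32]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut r = 0;
    for &key in left {
        while r < right.len() && right[r] < key {
            r += 1;
        }
        if r == right.len() || right[r] != key {
            out.push(key);
        }
    }
    out
}

/// Sorted, deduplicated union of the two columns.
fn union(left: &[u32], right: &[u32]) -> Vec<u32> {
    let (mut l, mut r) = (0, 0);
    let mut out = Vec::with_capacity(left.len() + right.len());
    while l < left.len() && r < right.len() {
        match left[l].cmp(&right[r]) {
            Ordering::Less => { out.push(left[l]); l += 1; }
            Ordering::Greater => { out.push(right[r]); r += 1; }
            Ordering::Equal => { out.push(left[l]); l += 1; r += 1; }
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&left[l..]);
    out.extend_from_slice(&right[r..]);
    out
}
```

All three keep their output sorted for free, because they only ever emit keys in input order. Projection does not have that property: dropping or reordering columns can break the sort order and introduce duplicates, which is one plausible reading of why the post calls it the hard one.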
Ivan Zhao on X: "For those of local first nerds and @inkandswitch fans: This is the paper co-authored by @sliminality @geoffreylitt @pvh Martin Kleppmann https://t.co/FMhf4olmg4 Thank you for laying the technical foundation for block-based, rich text CRDT for the world."

Notion's new offline support is based on our rich text CRDT research x.com/ivanhzhao/st...

19.08.2025 19:55 - 👍 117    🔁 4    💬 2    📌 0
Sync Conf | Nov 12, 2025 in San Francisco. Sync Conf is a boutique conference on the future of real-time, collaborative, agentic software development. Happening Nov 12, 2025 in San Francisco.

If you are in SF in November, I'll be speaking at syncconf.dev (@syncconf.bsky.social)!

It's an excellent confluence of all things up-to-data. Architectures like MZ at the backend, connected via sync engines, and front ends that don't waste anyone's time waiting on database queries.

19.08.2025 23:09 - 👍 6    🔁 1    💬 0    📌 0

It turns out that datatoad seems to be *faster*, at least once the data are loaded in. The interpreted reading of the file input, using regexes, does seem to be a fair bit faster in datafrog. But then the evaluation gets done almost a second faster with the toad.

Weird times, folks.

14.08.2025 23:11 - 👍 3    🔁 0    💬 0    📌 0

The datafrog numbers, with the same plan (we think) hardcoded in, and explicit u32 types everywhere, give us ...

14.08.2025 23:10 - 👍 1    🔁 0    💬 1    📌 0

The datatoad numbers have slightly improved while we have been working on the tries for the Galen queries, now at around 10s (formerly, like .. ten minutes? oh those days of easy improvements).

14.08.2025 23:08 - 👍 0    🔁 0    💬 1    📌 0

In Datalog news: I had given up on getting (compiled) datafrog numbers for the "alias analysis" problem, because it is tedious to write. But thanks to an anonymous benefactor, it was coded up and we can now make a comparison between compiled datafrog and interpreted datatoad, on the same problem!

14.08.2025 23:07 - 👍 4    🔁 0    💬 1    📌 0

I wrote about the projects done at Materialize's recent hackathon. Many very cool projects, and also one that I worked on; take a read!

materialize.com/blog/spring_...

13.08.2025 21:59 - 👍 7    🔁 1    💬 0    📌 0
Speeding up Materialize CI How we slashed CI runtime for Materialize by up to 86% through smarter builds, caching, parallelization, and clever tooling.

A neat new Materialize post from our QA department on speeding up CI. materialize.com/blog/speedin...

08.08.2025 21:22 - 👍 3    🔁 0    💬 0    📌 0

Datalog weekend: we graduate to queries with cyclic rules, non-binary relations, and generally more interesting behavior. In particular, we're going to compare ourselves against interpreted Soufflé; a standard reference point!

How does interpreted datatoad compare?

github.com/frankmcsherr...

02.08.2025 17:37 - 👍 5    🔁 0    💬 0    📌 0

Hat tip to @bkirwi.bsky.social for writing it!

01.08.2025 01:39 - 👍 2    🔁 0    💬 0    📌 0
How filter pushdown works Using part statistics and abstract interpretation to push complex filters all the way down to the storage layer.

We have a new @materialize.com post up, this time about pushing selection predicates into our persistence layer.

Materialize hybridizes batch and streaming computation (it does both, regularly), and draws on the best optimizations of each (in this case, CDWs).

materialize.com/blog/how-fil...

31.07.2025 23:56 - 👍 8    🔁 0    💬 1    📌 0
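
The idea of pushing a filter down using part statistics can be sketched in a few lines. This is a toy model, not Materialize's persistence layer: `PartStats`, `may_match`, and `parts_to_fetch` are hypothetical names, and the only statistic assumed here is a per-part minimum and maximum for one integer column. Evaluating the predicate against that range instead of against rows is a very small instance of the abstract interpretation mentioned in the linked post's summary.

```rust
/// Hypothetical per-part statistics: the smallest and largest value of one
/// column among the rows stored in that part.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PartStats {
    min: i64,
    max: i64,
}

/// Could any row in the part satisfy `lower <= value && value < upper`?
/// Answering "no" is always safe to act on; "yes" only means "maybe".
fn may_match(stats: PartStats, lower: i64, upper: i64) -> bool {
    stats.max >= lower && stats.min < upper
}

/// Indices of the parts that still have to be fetched and filtered row by row.
fn parts_to_fetch(parts: &[PartStats], lower: i64, upper: i64) -> Vec<usize> {
    parts
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_match(**stats, lower, upper))
        .map(|(index, _)| index)
        .collect()
}
```

The check is deliberately conservative: a part is skipped only when its range proves that no row can pass, so the row-level filter still has to run on whatever is fetched.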

I found Jetstream to be very pleasant to use, as well! I recommend it for others with streaming social interests!

29.07.2025 13:30 - 👍 4    🔁 0    💬 0    📌 0

Nice that the Bluesky firehose is now becoming a live dataset on which to demo streaming databases

29.07.2025 11:25 - 👍 38    🔁 9    💬 2    📌 0

I'll likely noodle on the implementation for a bit, as I think there's another 2x to be had through tweaking the join implementation. But .. what's next?

On the menu: 1. spill to disk, 2. scaling horizontally, 3. worst-case optimal joins, and 4. taking the text to an mdbook because omfg can't read.

20.07.2025 19:41 - 👍 1    🔁 0    💬 0    📌 0

We have a weekend Datalog update: datatoad now comes with tries! I took the long route to get here, and have a bit of other work in the pipeline as part of all this. But I'm happy to report that memory went down by another 2x (expected), and runtime went down by ~1.6x.

github.com/frankmcsherr...

20.07.2025 19:39 - 👍 4    🔁 0    💬 1    📌 0
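
As a rough picture of what "tries" can mean for a binary relation, here is a minimal two-layer columnar trie. It is a sketch under assumptions, not datatoad's layout: the `Trie` type, its field names, and the `u32` columns are all hypothetical. Each distinct key is stored once, with a bounds column recording which slice of the values column belongs to it.

```rust
/// Hypothetical two-layer trie over a binary relation of (key, value) pairs.
/// The values for `keys[i]` are `vals[bounds[i]..bounds[i + 1]]`.
struct Trie {
    keys: Vec<u32>,
    bounds: Vec<usize>,
    vals: Vec<u32>,
}

impl Trie {
    /// Builds the trie from pairs that are already sorted by (key, value).
    fn from_sorted(pairs: &[(u32, u32)]) -> Self {
        let mut keys = Vec::new();
        let mut bounds = Vec::new();
        let mut vals = Vec::with_capacity(pairs.len());
        for &(key, val) in pairs {
            // A new key starts a new run of values at the current offset.
            if keys.last() != Some(&key) {
                keys.push(key);
                bounds.push(vals.len());
            }
            vals.push(val);
        }
        // Closing bound, so that `bounds.len() == keys.len() + 1`.
        bounds.push(vals.len());
        Trie { keys, bounds, vals }
    }

    /// All values associated with `key`, or an empty slice if it is absent.
    fn get(&self, key: u32) -> &[u32] {
        match self.keys.binary_search(&key) {
            Ok(i) => &self.vals[self.bounds[i]..self.bounds[i + 1]],
            Err(_) => &[],
        }
    }
}
```

Compared with a flat list of pairs, a repeated key costs one entry plus one offset rather than one entry per pair, and a join can look up a key once and then scan its values contiguously.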

Although it's a lot of data that comes in, Materialize's architecture lands the data in object storage in the cloud, and then brings to the compute resources only what they need. In this case, the views either window the data (with a temporal filter) or reduce it enough (to just urls) to fit in RAM.

17.07.2025 19:42 - 👍 0    🔁 0    💬 0    📌 0
