Looking forward to this!
25.10.2025 13:22
@frankmcsherry.bsky.social
http://github.com/frankmcsherry/blog
New from Materialize: Cloud M.1 Clusters
Run 3x larger workloads with the same low latency and predictable performance, thanks to intelligent data spilling and expanded capacity.
Learn more: bit.ly/3L12oH2
One thing I like is that I stopped being a bad researcher and .. did some reading rather than only writing. There are some papers about columnar Datalog already (links at the end)! However, I don't think I fully understand them yet. More reading required!
09.10.2025 18:33
Datatoad check-in: this time including some recent progress on columnar joins (good news: faster). Though, it also tries to roll up a bit of the sprawl of content I've scribbled, which increasingly feels like it needs some more careful curation to be helpful.
github.com/frankmcsherr...
Good news on the Datalog front: v1 of "columnar joins" seem to work, and resulted in a 20% improvement (from 9.5s to 7.5s, for the joins of a reference workload). Still more gains from tightening it up, and potentially from columnar sorting, but I'll take a swing at writing things up tomorrow!
08.10.2025 22:35
Runtimes of the same application with two different allocators; mimalloc is nearly 100x faster.
What a difference an allocator makes!
This is the same Rust program first using the system allocator, and then using mimalloc. About 100MB of working set in both cases, just .. apparently it pilots the system allocator to some horrible behavior.
Obviously going to start using mimalloc from now on.
Welcome Frank McSherry @frankmcsherry.bsky.social to Sync Conf 2025. Pioneer of sync technology, inventor of Differential Dataflow, and founder of @materialize.com, Frank will trace the evolution of sync and stream processing.
19.09.2025 14:30
Highlighting some of my team's recent work: We've changed Materialize to use swap instead of memory-mapped files, with nice performance and efficiency improvements.
18.09.2025 14:24
We've released a major improvement to our memory spilling infrastructure:
Materialize now uses swap to scale SQL workloads beyond RAM.
- Faster hydration
- Efficient memory utilization
- Bigger workloads supported
Full post from antiguru.bsky.social: bit.ly/46EF2iJ
Very excited to bring some column-orientation to timely and differential. At least, removing baked in row-orientation in timely, and actual column-orientation in differential, with a bunch of cool learnings from the datatoad work. I hope. We'll see. :D
15.09.2025 21:29
We just released Timely Dataflow 0.24! Here are some exciting changes from @frankmcsherry.bsky.social and myself.
The container abstractions got a complete rework, and we introduce a new pattern to distribute data. Details below.
github.com/TimelyDatafl...
The closest I've found is github.com/kparc/ksimple which I'll work through, but .. most of the write-ups seem to conflate the esoterica of monadic/dyadic glyphs with the implementation architecture you might use, and .. boy do I want to read about the latter without trudging through the former.
29.08.2025 17:27
I have a trip coming up, and I'm hoping to find some content to read about the implementations of (ideally interpreted) array languages. I'm on an interpreter kick, and armed with a bunch of column-oriented libraries.
Any tips, drop a reply!
I wrote a bit about datatoad's columnar logic for relational operators. At least, for union, intersection, antijoins, and semijoins. It turns out the joins are all easy; it's projection that is hard, of all things. Go figure.
github.com/frankmcsherr...
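To make the operators named above concrete, here is a minimal sketch of union, semijoin, and antijoin as merges over sorted, deduplicated columns. It is not datatoad's code: the function names, the single `u32` key column, and the merge strategy are all assumptions made for illustration. With a single column, the semijoin output is also the intersection.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: union of two sorted, deduplicated columns.
fn union(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => { out.push(a[i]); i += 1; }
            Ordering::Greater => { out.push(b[j]); j += 1; }
            // Equal values are emitted once and both cursors advance.
            Ordering::Equal => { out.push(a[i]); i += 1; j += 1; }
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

/// Hypothetical helper: splits `keys` into (semijoin, antijoin) against `filter`.
/// Both inputs are assumed sorted and deduplicated.
fn semi_anti(keys: &[u32], filter: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let mut semi = Vec::new();
    let mut anti = Vec::new();
    let mut j = 0;
    for &k in keys {
        // Advance the filter cursor past values smaller than `k`.
        while j < filter.len() && filter[j] < k {
            j += 1;
        }
        if j < filter.len() && filter[j] == k {
            semi.push(k); // `k` has a match in `filter`
        } else {
            anti.push(k); // `k` has no match in `filter`
        }
    }
    (semi, anti)
}
```

Calling `semi_anti(&[1, 2, 4, 7], &[2, 3, 7])` returns the matched keys and the unmatched keys as two separate columns. Each operator makes one linear pass over its inputs.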
Notion's new offline support is based on our rich text CRDT research x.com/ivanhzhao/st...
19.08.2025 19:55
If you are in SF in November, I'll be speaking at syncconf.dev (@syncconf.bsky.social)!
It's an excellent confluence of all things up-to-data. Architectures like MZ at the backend, connected via sync engines, and front ends that don't waste anyone's time waiting on database queries.
It turns out that datatoad seems to be *faster*, at least once the data are loaded in. The interpreted reading of the file input, using regexes, does seem to be a fair bit faster in datafrog. But then the evaluation gets done almost a second faster with the toad.
Weird times, folks.
The datafrog numbers, with the same plan (we think) hardcoded in, and explicit u32 types everywhere, give us ...
14.08.2025 23:10
The datatoad numbers have lightly improved while we have been working the tries for the Galen queries, now at around 10s (formerly, like .. ten minutes? oh those days of easy improvements).
14.08.2025 23:08
In Datalog news: I had given up on getting (compiled) datafrog numbers for the "alias analysis" problem, because it is tedious to write. But thanks to an anonymous benefactor, it was coded up and we can now make a comparison between compiled datafrog and interpreted datatoad, on the same problem!
14.08.2025 23:07
I wrote about the projects done at Materialize's recent hackathon. Many very cool projects, and also one that I worked on; take a read!
materialize.com/blog/spring_...
A neat new Materialize post from our QA department on speeding up CI. materialize.com/blog/speedin...
08.08.2025 21:22
Datalog weekend: we graduate to queries with cyclic rules, non-binary relations, and generally more interesting behavior. In particular, we're going to compare ourselves against interpreted Soufflé; a standard reference point!
How does interpreted datatoad compare?
github.com/frankmcsherr...
Hat tip to @bkirwi.bsky.social for writing it!
01.08.2025 01:39
We have a new @materialize.com post up, this time about pushing selection predicates into our persistence layer.
Materialize hybridizes batch and streaming computation (it does both, regularly), and draws on the best optimizations of each (in this case, CDWs).
materialize.com/blog/how-fil...
I found Jetstream to be very pleasant to use, as well! I recommend it for others with streaming social interests!
29.07.2025 13:30
Nice that the Bluesky firehose is now becoming a live dataset on which to demo streaming databases
29.07.2025 11:25
I'll likely noodle on the implementation for a bit, as I think there's another 2x to be had through tweaking the join implementation. But .. what's next?
On the menu: 1. spill to disk, 2. scaling horizontally, 3. worst-case optimal joins, and 4. taking the text to an mdbook because omfg can't read.
We have a weekend Datalog update: datatoad now comes with tries! I took the long route to get here, and have a bit of other work in the pipeline as part of all this. But I'm happy to report that memory went down by another 2x (expected), and runtime went down by ~1.6x.
github.com/frankmcsherr...
Although it's a lot of data that comes in, Materialize's architecture lands the data in object storage in the cloud, and then brings to the compute resources only what they need. In this case, the views either window the data (with a temporal filter) or reduce it enough (to just urls) to fit in RAM.
17.07.2025 19:42