F3: The Open-Source Data File Format for the Future
SIGMOD 2025
Our SIGMOD paper with our friends at Tsinghua + @wesmckinney.com + @pateljm.bsky.social on creating a next-generation open-source data file format is out. F3 is a future-proof file format that avoids the mistakes of Parquet.
Paper: db.cs.cmu.edu/papers/2025/...
Code: github.com/future-file-...
01.10.2025 13:49
Why, hello there!
Thanks @rbn.bsky.social :D
16.09.2025 19:07
I have an Aer one. You can get decent Tomtoc ones which mimic the designs of popular brands for less $ - unsure of the quality, but I imagine they are decent.
31.08.2025 23:11
Async/await was a developer experience regression compared with blocking APIs, but a huge advancement over the APIs it replaced, which were callback based.
I'm optimistic about virtual threads as a best of both worlds, if we can retain the benefits of futures & structured concurrency
04.08.2025 17:02
Ah, thanks. How much of it is attributable to BEAM itself, I wonder. Using AOT instead of the JIT cuts memory usage in .NET by about half, for ex. IIRC, the numbers below are steady state under load, not startup, but either way
04.08.2025 16:42
Thanks. Perhaps there is still low hanging fruit for elixir/phoenix/BEAM when it comes to reducing startup mem consumption. I'm sure it would still consume more than a Rust app given the nature of the runtimes
04.08.2025 16:30
What is MB RES?
04.08.2025 16:14
Pat Helland delivered a presentation on this paper: hpts.ws/papers/2024/...
09.04.2025 16:06
In the last week I probably explained Rateless Set Reconciliation to a dozen other scientists. What an amazing paper and result, and already one year old.
07.04.2025 18:55
It's just a toy / proof-of-concept, really, but it might be useful to refer to.
03.03.2025 17:51
Reuben Bond - Orleans under the hood (Dotnetos Conference 2021)
YouTube video by Dotnetos
Thank you, Jeremy! I gave a presentation about some of the design considerations in the serialization & RPC system: www.youtube.com/live/kgRag4E...
29.01.2025 04:32
All things agents. I'm interested in builder communities rather than per-product servers
04.01.2025 22:29
What's a good discord server for people building things in the AI/LLM space?
04.01.2025 22:08
Committing the first value is safe as-is, but by playing more with this I believe subsequent Fast Rounds require additional safety rules:
1. Proposers only propose updates to known-committed values
2. Acceptors require the value's version (akin to slot number) has increased or values are identical
03.01.2025 17:33
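The second rule in the post above (acceptors require an increased version or an identical value) could be sketched roughly as below. This is only an illustration of that one check, not a Fast Paxos implementation: the `Versioned` type and the `acceptor_accepts` function are hypothetical names, and reading "values are identical" as plain value equality is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A value tagged with a version (akin to a slot number). Hypothetical type."""
    version: int
    value: object


def acceptor_accepts(current, proposed):
    """Return True if an acceptor holding `current` may accept `proposed`.

    Assumed reading of the rule: accept when the version has increased,
    or when the proposed value equals the value already held.
    """
    if current is None:
        # Nothing accepted yet, so the first value is safe as-is.
        return True
    return proposed.version > current.version or proposed.value == current.value
```

Under this reading, a proposal carrying an older version and a different value is rejected, while a re-proposal of the same value is harmless and accepted.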
Did he have his onion juice with him or a couple of pocket eggs? If not, he's ngmi
03.01.2025 07:18
gotta max out those T levels somehow
03.01.2025 04:19
Feel free to dm
16.12.2024 20:38
Is there a doc describing it?
15.12.2024 18:54
Link?
15.12.2024 18:43
"nothing more" is a bit too far: CRDTs have limitations on their behavior. You can't implement just anything using CRDTs. Their utility is quite limited and hence they are not widely deployed within datacenter based apps (I know of none, and no one has provided examples yet)
15.12.2024 18:43
It depends on the CRDT, but that's not my area of expertise. The journaled grains use event sourcing (modified for geo distributed environments), and I believe you could implement CRDTs on top fairly easily, but the API also exposes coordination/sync operations for consistency.
15.12.2024 18:01
What I described is not a CRDT. I am talking about ACID database transactions using optimistic concurrency control
15.12.2024 17:10
You can version the data and check for conflicts at commit time, similar to a database transaction with optimistic concurrency control.
15.12.2024 16:48
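The "version the data and check for conflicts at commit time" idea in the post above can be sketched as follows. This is a minimal in-memory illustration, assuming a single integer version per key; `VersionedStore` and `ConflictError` are hypothetical names and not part of any system mentioned in the thread.

```python
class ConflictError(Exception):
    """Raised when a record changed between read and commit."""


class VersionedStore:
    """In-memory store where every key carries a version counter."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Returns (version, value); a missing key reads as version 0.
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        # Commit succeeds only if nobody else committed since our read.
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise ConflictError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, new_value)
        return current_version + 1


store = VersionedStore()
store.commit("balance", 0, 100)

# Two writers read the same version, then both try to commit.
v_a, bal_a = store.read("balance")
v_b, bal_b = store.read("balance")
store.commit("balance", v_a, bal_a - 30)  # first commit wins
try:
    store.commit("balance", v_b, bal_b - 50)  # stale version, rejected
    conflict = False
except ConflictError:
    conflict = True
```

The second writer has to re-read and retry, which is the usual trade-off with optimistic schemes: no locks are held while working, at the cost of redoing work when a conflict is detected.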
In the online case you can coordinate changes. In the offline case you cannot: you have divergent data replicas being updated independently. The changes need to be merged eventually and there is ambiguity as to how.
15.12.2024 16:41
That's CRDT territory, but the goalposts just shifted
15.12.2024 16:23
Interesting workshop and a lovely community, would very much recommend!
15.12.2024 16:04
The sources can change but each one is authoritative (eg for fx rates) so you can assign a sequence number to each update which can be used to ensure consistency
15.12.2024 16:14
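One way to read the per-source sequence numbers in the post above is sketched below. The post does not say how the numbers are consumed, so this is an assumption: each authoritative source stamps its updates with an increasing sequence number, and a consumer ignores anything at or below the last number it applied. `RateCache` and its methods are hypothetical names.

```python
class RateCache:
    """Keeps the latest update per authoritative source, ordered by sequence number."""

    def __init__(self):
        self._latest = {}  # source -> (seq, payload)

    def apply(self, source, seq, payload):
        # Ignore stale or duplicate updates from the same source.
        last_seq, _ = self._latest.get(source, (0, None))
        if seq <= last_seq:
            return False
        self._latest[source] = (seq, payload)
        return True

    def snapshot(self):
        # {source: seq} identifies exactly which updates a computation saw.
        return {source: seq for source, (seq, _) in self._latest.items()}


cache = RateCache()
first = cache.apply("fx", 1, {"EURUSD": 1.08})
third = cache.apply("fx", 3, {"EURUSD": 1.10})
late = cache.apply("fx", 2, {"EURUSD": 1.09})  # arrives out of order
```

Recording the snapshot alongside a result makes it possible to tell later which version of each source the result was computed against.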
One way would be to leave the import in a staging area until the user has completed their part of the workflow against a point in time snapshot. There is nothing you can do with a CRDT here which you cannot do without - there's no replication in this scenario
15.12.2024 15:42
What makes you say it's impossible without CRDTs? I don't see why it would be any more complex without CRDTs
15.12.2024 15:19
Hi everyone! I'm a co-founder @dbos.dev, where we're building a serverless platform for highly reliable applications. I love conversations about databases, distributed systems, and anything technical. Thanks @qianli.dev for introducing me to Bluesky, and looking forward to meeting people here!
13.12.2024 00:58
An event dedicated to #rustlang & its community. Taking place from Sept 8-11 in Montreal, Canada in 2026!
rustconf.com
Reporter @theregister.com covering enterprise applications, databases and analytics. Also, a bit of science here and there. Many former lives.
No free will, decent buoyancy control
Product lead for Azure SDK at Microsoft; Queen of Corner Cases; Purdue & Stanford alum; Space gEEk and Future Astronaut
PL fool and DB theory nerd. Working on the C++ Address Sanitizer for Windows these days (dynamic analysis for memory safety), still a Durable Functions nerd.
Building meshweaver.cloud, tech enthusiast, OG software architect. CEO / CTO
Co-founder @ http://dbos.dev • Stanford PhD • Database Geek • Building https://github.com/dbos-inc/dbos-transact-py
Software Engineer at MSFT. Opinions are my own.
I like to understand how things work
Technology-focused link-aggregation community.
Using /newest.rss feed. Not affiliated with lobste.rs.
Currently beta; by @mycroft.mkz.me
Husband, Father, Senior Program Manager on the #Copilot extensibility team @ #Microsoft. My posts and thoughts are my own.
Building distributed systems and data infra.
Previously co-creator of Apache Flink (https://flink.apache.org/),
now building Restate (https://restate.dev/) to make distributed apps more easily resilient and scalable.
The Proceedings of the VLDB Endowment (PVLDB)
https://vldb.org/pvldb/
RSS Feed: https://db.cs.cmu.edu/files/rss/pvldb-rss.xml
Automated by @andypavlo.bsky.social
Partner research manager in the data systems group at Microsoft Research. Interests: storage, caching, streaming, analytics, key-value stores, ML for systems.
Compiler/runtime engineer at Microsoft. Currently: CLR. Past: JS/Chakra, C#, DLR. Interests: Computer Games, AI, running, hiking
Co-founder and CEO of Hopsworks. Organizer of the feature store summit. I am writing a book on Building ML Systems for O'Reilly.