
Distributed Systems

@distribsystems.bsky.social

I tweet/retweet interesting stuff about #DistributedSystems and #compsci. Suggest links/papers/conversations via DM! Tag me for retweets. Run by @fponzi.me https://distsys.fponzi.me/

73 Followers  |  1 Following  |  44 Posts  |  Joined: 19.01.2025

Latest posts by distribsystems.bsky.social on Bluesky

Reproducing the AWS Outage Race Condition with a Model Checker (Waqas Younas' blog)
wyounas.github.io/aws/concurre...
We’ll use a model checker to see how such a race could happen. Formal verification can’t prevent every failure, but it helps us think more clearly about correctness and reason about subtle bugs.

10.11.2025 12:02

TLA+ Modeling of AWS outage DNS race condition
muratbuffalo.blogspot.com/2025/11/tla-...
On Oct 19–20, 2025, AWS's N. Virginia region suffered a major DynamoDB outage triggered by a DNS automation defect. This post focuses narrowly on the race condition at the core of the bug, which is best understood through TLA+ modeling.

06.11.2025 12:01

TernFS: an exabyte-scale, multi-region distributed filesystem
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.

03.11.2025 12:01

Just make it scale: An Aurora DSQL story
www.allthingsdistributed.com/2025/05/just...
AWS Senior Principal Engineers Niko Matsakis and Marc Bowes take us inside Aurora DSQL's development. Each component follows the Unix mantra (do one thing, and do it well), but working together they are able to offer all the features users expect from a database.

27.10.2025 12:04

Aurora DSQL: How authentication and authorization works
marc-bowes.com/dsql-auth.html
How connections to Aurora DSQL are authenticated and authorized. This information is meant to be supplemental to what is found in the official Amazon Aurora DSQL documentation.

20.10.2025 11:02

Dynamo, DynamoDB, and Aurora DSQL (Marc's Blog)
brooker.co.za/blog/2025/08...
People often ask me about the architectural relationship between Amazon Dynamo, Amazon DynamoDB and Aurora DSQL. I'll start off by comparing how the systems achieve a few key properties.

14.10.2025 18:52

Linearizability testing S2 with deterministic simulation
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos, or Porcupine.

30.09.2025 11:00
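To make the idea in the S2 post concrete, here is a minimal brute-force linearizability check for a single read/write register. It assumes a hypothetical history format in which every operation records its invocation and response times, and it searches for a sequential order that respects both real-time precedence and register semantics. This is not how Knossos or Porcupine are implemented (they use much more efficient searches); it is exponential and only suitable for tiny histories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str    # "write" or "read" (hypothetical history format)
    value: int   # value written, or value the read returned
    call: float  # invocation time
    ret: float   # response time


def is_linearizable(history: list[Op], initial: int = 0) -> bool:
    """Try every order consistent with real-time precedence, backtracking on failure."""

    def search(remaining: frozenset[int], state: int) -> bool:
        if not remaining:
            return True
        # An op may be linearized next only if no other remaining op
        # returned before it was invoked.
        min_ret = min(history[i].ret for i in remaining)
        for i in remaining:
            op = history[i]
            if op.call > min_ret:
                continue
            if op.kind == "write":
                if search(remaining - {i}, op.value):
                    return True
            elif op.value == state:  # a read must observe the current value
                if search(remaining - {i}, state):
                    return True
        return False

    return search(frozenset(range(len(history))), initial)


if __name__ == "__main__":
    stale = [Op("write", 1, 0.0, 1.0), Op("read", 0, 2.0, 3.0)]
    print(is_linearizable(stale))  # False: the read began after the write returned
```

The stale-read history fails because no legal order lets a read that starts after a completed write still observe the initial value; if the two operations overlapped in time, the same read would be accepted.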

How I solved a distributed queue problem after 15 years
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.

22.09.2025 18:27
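A minimal sketch of a durable queue in the spirit of the DBOS post, assuming SQLite (Python's built-in `sqlite3`) as a stand-in for Postgres and a hypothetical `tasks` table; this is not the DBOS API. Task status lives in the database, so a worker that crashes mid-task leaves a `RUNNING` row that a hypothetical `recover` step can re-enqueue on restart.

```python
import sqlite3


def open_queue(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'ENQUEUED')"  # ENQUEUED -> RUNNING -> DONE
    )
    conn.commit()
    return conn


def enqueue(conn, payload):
    with conn:  # commit on success, roll back on error
        cur = conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim(conn):
    """Move the oldest ENQUEUED task to RUNNING and return (id, payload), or None."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM tasks WHERE status = 'ENQUEUED' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            cur = conn.execute(
                "UPDATE tasks SET status = 'RUNNING' WHERE id = ? AND status = 'ENQUEUED'",
                (row[0],),
            )
        if cur.rowcount == 1:  # we won the claim; otherwise another worker did, so retry
            return row


def complete(conn, task_id):
    with conn:
        conn.execute("UPDATE tasks SET status = 'DONE' WHERE id = ?", (task_id,))


def recover(conn):
    """On restart, re-enqueue tasks a crashed worker left RUNNING.
    Assumes no other worker is alive while this runs."""
    with conn:
        cur = conn.execute("UPDATE tasks SET status = 'ENQUEUED' WHERE status = 'RUNNING'")
    return cur.rowcount
```

Because a recovered task may run again, this gives at-least-once execution, so task bodies should be idempotent. The conditional `UPDATE ... AND status = 'ENQUEUED'` is what keeps two workers from claiming the same row.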

Understanding Paxos the intuitive way
relentless-leader.com/dive-deep-in...

09.08.2025 14:07

Murat Demirbas and Aleksey Charapko read and discuss the HotOS paper "Real Life Is Uncertain. Consensus Should Be Too!"

31.07.2025 21:37

Learning about distributed systems: where to start?
muratbuffalo.blogspot.com/2020/06/lear...
A principled, from the foundations-up, studying of distributed systems, which will take a good three months in the first pass, and many more months to build competence after that.

30.05.2025 11:00

FLP Result: Impossibility of Distributed Consensus with One Faulty Process (1985)
groups.csail.mit.edu/tds/papers/L...

29.05.2025 11:00

Just make it scale: An Aurora DSQL story
www.allthingsdistributed.com/2025/05/just...
A few weeks ago, at our internal dev conference, I watched a talk from two of our PEs on building DSQL. I asked if they'd be willing to turn their insights into a deeper exploration of DSQL's development.

28.05.2025 11:01

Reasoning about Distributed Protocols with Smart Casual Verification
decentralizedthoughts.github.io/2025-05-23-s...
Reasoning about distributed algorithms is hard at the best of times, with state split across remote nodes, asynchrony, concurrency, and non-determinism in the order in which events occur.

27.05.2025 11:00

Apache Iceberg Internals Dive Deep On Performance
relentless-leader.com/apache-icebe...
Apache Iceberg is an ACID table format designed for large-scale analytics workloads.

15.05.2025 11:01

Concurrency bugs in Lucene: How to fix optimistic concurrency failures
www.elastic.co/search-labs/...
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework that turns flaky failures into reproducible ones.

12.05.2025 11:04
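Deterministic concurrency testing tools typically take control of thread scheduling so that a failing interleaving can be replayed. The toy explorer below illustrates that idea under heavy simplification: each hypothetical "thread" is a list of atomic steps, every interleaving is enumerated depth-first, and the invariant is checked on each final state. It is not Fray's API or algorithm, and the lost-update counter is an illustrative bug, not the Lucene one.

```python
from typing import Callable, Optional

# A step atomically mutates shared state and the running thread's local state.
Step = Callable[[dict, dict], None]


def explore(threads: list[list[Step]], init: dict,
            invariant: Callable[[dict], bool]) -> Optional[list[int]]:
    """Return the first schedule (list of thread ids) whose final state
    violates the invariant, or None if every interleaving passes."""

    def replay(schedule):
        # Re-execute from scratch so each schedule is fully reproducible.
        shared = dict(init)
        local = [{} for _ in threads]
        pc = [0] * len(threads)
        for t in schedule:
            threads[t][pc[t]](shared, local[t])
            pc[t] += 1
        return shared

    def dfs(schedule, pc):
        if all(p == len(th) for p, th in zip(pc, threads)):
            return None if invariant(replay(schedule)) else schedule
        for t, th in enumerate(threads):
            if pc[t] < len(th):
                next_pc = pc.copy()
                next_pc[t] += 1
                bad = dfs(schedule + [t], next_pc)
                if bad is not None:
                    return bad
        return None

    return dfs([], [0] * len(threads))


# Hypothetical buggy increment: the read and the write are separate steps.
def read_x(shared, local):
    local["tmp"] = shared["x"]


def write_x(shared, local):
    shared["x"] = local["tmp"] + 1


# Fixed version: the whole increment is a single atomic step.
def incr_x(shared, local):
    shared["x"] += 1
```

For two buggy threads starting from `x = 0` with the invariant `x == 2`, the explorer returns the schedule `[0, 1, 0, 1]`: both threads read 0 before either writes, so one increment is lost. Replaying that schedule reproduces the failure every time, which is the property that turns a flaky test into a deterministic one.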

Erlang’s not about lightweight processes and message passing…
stevana.github.io/erlangs_not_...
To me it's clear that the big idea there isn't lightweight processes and message passing, but rather the generic components which in Erlang are called behaviours.

09.05.2025 11:01
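The post's point is that a behaviour separates a generic, reusable part (the process loop, mailbox, and request/reply plumbing) from application-specific callbacks. A rough Python analogue, loosely modeled on the `init`/`handle_call` split of Erlang's `gen_server`, might look like the following; `GenServer` and `Counter` are hypothetical classes, not the OTP API, and the sketch omits supervision and asynchronous casts.

```python
import queue
import threading

_STOP = object()  # sentinel request that ends the server loop


class GenServer:
    """Generic part: owns the mailbox, the loop thread, and the state."""

    def __init__(self, callbacks, *args):
        self._callbacks = callbacks
        self._mailbox = queue.Queue()
        self._state = callbacks.init(*args)
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            request, reply_box = self._mailbox.get()
            if request is _STOP:
                reply_box.put(None)
                return
            # Only the callback knows what a request means; the loop just threads state.
            reply, self._state = self._callbacks.handle_call(request, self._state)
            reply_box.put(reply)

    def call(self, request, timeout=5.0):
        """Send a request and block until the server replies."""
        reply_box = queue.Queue(maxsize=1)
        self._mailbox.put((request, reply_box))
        return reply_box.get(timeout=timeout)

    def stop(self):
        self.call(_STOP)
        self._thread.join()


class Counter:
    """Specific part: business logic only, no concurrency code."""

    @staticmethod
    def init(start):
        return start

    @staticmethod
    def handle_call(request, state):
        op, *args = request
        if op == "add":
            return "ok", state + args[0]
        if op == "get":
            return state, state
        return ("error", "unknown request"), state
```

`Counter` contains no threading code at all, and the same `GenServer` could drive any other object that provides `init` and `handle_call`. If `handle_call` raises, this sketch's loop thread simply dies, which is exactly the kind of failure OTP supervisors exist to handle.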

So, You Want to Learn More About Deterministic Simulation Testing?
pierrezemb.fr/posts/learn-...
A curated collection of resources about deterministic simulation testing for distributed systems.

08.05.2025 11:01
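The core trick in deterministic simulation testing is to route every source of nondeterminism through one seeded random generator, so that any failing run can be replayed exactly from its seed. A toy example, assuming a hypothetical two-replica counter whose simulated network may reorder messages and deliver each one at most one extra time (real frameworks also simulate time, disks, and crashes):

```python
import random


class Replica:
    """Applies each message id at most once, so duplicate deliveries are harmless."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def on_message(self, msg_id, amount):
        if msg_id in self.seen:
            return
        self.seen.add(msg_id)
        self.total += amount


def simulate(seed, n_msgs=20, dup_prob=0.3):
    rng = random.Random(seed)  # the only source of randomness in the run
    amounts = [rng.randint(1, 10) for _ in range(n_msgs)]
    replicas = [Replica(), Replica()]
    # In-flight deliveries: (replica index, message id, may still be duplicated).
    inflight = [(r, m, True) for m in range(n_msgs) for r in range(len(replicas))]
    trace = []
    while inflight:
        # The simulator, not the OS or a real network, picks the delivery order.
        r, m, may_dup = inflight.pop(rng.randrange(len(inflight)))
        replicas[r].on_message(m, amounts[m])
        trace.append((r, m))
        if may_dup and rng.random() < dup_prob:
            inflight.append((r, m, False))  # schedule one duplicate delivery later
    return trace, [rep.total for rep in replicas], sum(amounts)
```

Sweeping many seeds explores many delivery orders; if an invariant such as "both replicas converge to the sum of all amounts" ever fails, calling `simulate` again with that seed reproduces the exact trace.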

May thy bits chip and shatter: Patterns for Building High-Performance Observability Pipelines at Scale
sumercip.com/posts/patter...

07.05.2025 11:02

Parallel, Concurrent and Distributed Programming
ilyasergey.net/YSC4231/
This course on basic concurrent and parallel algorithms has been taught by Ilya Sergey at Yale-NUS College in 2019-2024.

06.05.2025 11:01

Systems Correctness Practices at AWS: Leveraging Formal and Semi-formal Methods
dl.acm.org/doi/10.1145/...

05.05.2025 11:03

Distributed consensus
shachaf.net/w/consensus
This page is a relatively informal discussion of distributed consensus and Paxos, what it does, how it works, and some tricks and variants.

28.04.2025 11:02

Why is the raft consensus algorithm called "raft"?
groups.google.com/g/raft-dev/c...

25.04.2025 11:01

Three Clocks are Better than One
tigerbeetle.com/blog/2021-08...
CLOCK_MONOTONIC_RAW, CLOCK_MONOTONIC and CLOCK_BOOTTIME: all monotonic clock stopwatches provided by the Linux kernel through the clock_gettime(2) syscall to measure elapsed time.

22.04.2025 11:00
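On Linux, the three clocks named in the TigerBeetle post are reachable from Python's `time` module via `time.clock_gettime` and the matching `CLOCK_*` constants. Per the clock_gettime(2) man page, `CLOCK_MONOTONIC` may be slewed by NTP but never jumps, `CLOCK_MONOTONIC_RAW` is not subject to NTP adjustments, and `CLOCK_BOOTTIME` also counts time spent suspended. The sketch below only measures one interval with each available clock; it does not reproduce TigerBeetle's clock logic.

```python
import time

CLOCK_NAMES = ["CLOCK_MONOTONIC_RAW", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME"]


def available_clocks():
    """Map clock name -> clock id for the clocks this platform exposes."""
    return {n: getattr(time, n) for n in CLOCK_NAMES if hasattr(time, n)}


def measure(fn, clocks):
    """Run fn once and return the elapsed seconds according to each clock."""
    start = {n: time.clock_gettime(c) for n, c in clocks.items()}
    fn()
    end = {n: time.clock_gettime(c) for n, c in clocks.items()}
    return {n: end[n] - start[n] for n in clocks}


if __name__ == "__main__":
    elapsed = measure(lambda: time.sleep(0.1), available_clocks())
    for name, secs in elapsed.items():
        print(f"{name}: {secs:.6f} s")
```

For a short sleep the readings normally agree closely; they diverge when NTP is slewing the clock or when the machine suspends during the measured interval.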

Building a modern Durable Execution Engine from First Principles
restate.dev/blog/buildin...
We built a precursor, and from all the lessons learned there, we arrived at a design with a self-contained complete stack, centered around a command log and event-processor, shipping as a single Rust binary.

21.04.2025 17:10

Decomposing Transactional Systems
transactional.blog/blog/2025-de...
Every transactional system does four things: it executes, orders, validates, and persists transactions.
All four of these things must be done before the system may acknowledge a transaction’s result to a client.

18.04.2025 11:00
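One way to see the four steps from the post is a toy optimistic-concurrency key-value store in which each step is an explicit line of code. The `Store` class, its version scheme, and the in-memory `log` standing in for durable storage are all assumptions for illustration, not a design taken from the post.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    data: dict = field(default_factory=dict)  # key -> (value, version)
    log: list = field(default_factory=list)   # stand-in for a durable commit log
    next_seq: int = 1

    def execute(self, txn_fn):
        """Execute: run the transaction, recording read versions and buffering writes."""
        reads, writes = {}, {}

        def read(k):
            if k in writes:
                return writes[k]
            value, version = self.data.get(k, (None, 0))
            reads[k] = version
            return value

        def write(k, v):
            writes[k] = v

        txn_fn(read, write)
        return reads, writes

    def commit(self, reads, writes):
        seq = self.next_seq  # Order: assign a position in the total order
        self.next_seq += 1
        for k, version in reads.items():  # Validate: every read must still be current
            if self.data.get(k, (None, 0))[1] != version:
                return False  # abort: nothing persisted, no success acknowledged
        self.log.append((seq, dict(writes)))  # Persist (a real system would fsync here)
        for k, v in writes.items():
            self.data[k] = (v, seq)
        return True  # only now may the client be acknowledged
```

If two increments of `x` both execute against the same version and then commit one after the other, the second aborts at validation because the first commit has already replaced the version it read. Only a transaction that passes all four steps returns `True`, which is the point at which a client could be told it committed.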

How crawlers impact the operations of the Wikimedia projects
diff.wikimedia.org/2025/04/01/h...
Since the beginning of 2024, the demand for the content created by the Wikimedia volunteer community – especially for the 144 million images, videos, and other files on Wikimedia Commons – has grown.

15.04.2025 11:02

Memcached: VerifyThis Long-term Challenge
verifythis.github.io/ltc/03memcac...
VerifyThis Long-Term Challenge aims at proving that deductive program verification can produce relevant results for real systems with acceptable effort on a large scale in a collaborative manner.

10.04.2025 11:02

Testing Distributed Systems
asatarin.github.io/testing-dist...

01.04.2025 18:35

How concurrency works: A visual guide
wyounas.github.io/concurrency/...
Concurrent programming is hard.

24.03.2025 22:02

@distribsystems is following 1 prominent account