Ilia Gusev

@persikbl.bsky.social

Writing Podo Stack πŸ‡ - tools that survived production, weekly https://podostack.com

5 Followers 64 Following 230 Posts Joined Jan 2026
2 days ago

Two rules to make pause_minority work:

1. Odd number of nodes (3, 5, 7). With an even count, a clean 50/50 split leaves neither half with a majority - both sides pause and the whole cluster stops.

2. Monitor rabbitmq_partitions metric. Partitions are network problems. Fix the network.
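A one-line rabbitmq.conf sketch for the setting these rules assume (new-style config format):

```ini
# rabbitmq.conf
cluster_partition_handling = pause_minority
```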

Full RabbitMQ production guide:
podostack.com/p/rabbitmq-...

2 days ago

Decision matrix:

"My messages can't be lost" -> pause_minority
"Uptime matters more than a few lost messages" -> autoheal
"I like chaos" -> ignore (please don't)

99% of teams should use pause_minority.

2 days ago

autoheal - the "move fast" choice.

Both sides keep running during the split (like ignore). When the partition heals, RabbitMQ picks a winner - roughly, the side with more connected clients - and the losing side restarts and resyncs from it.

Fast recovery. But messages from the losing side are gone forever.

2 days ago

pause_minority - the safe choice.

Nodes in the minority partition freeze. They stop accepting connections. Only the majority keeps running.

When the network heals, minority nodes sync and resume. No split-brain. No data loss.

2 days ago

cluster_partition_handling has 3 options:

ignore (default) - both sides keep running. Split-brain. Guaranteed data loss when they reconnect.

It's the default and it's the most dangerous. Let that sink in.

2 days ago

Network split in your RabbitMQ cluster. What happens next depends on one setting.

The default? The worst possible choice.

4 days ago

Top use cases we use policies for:

- DLX routing (dead letter handling)
- TTL enforcement (auto-expire stale messages)
- Queue length limits (protect against runaway producers)
- Quorum queue migration (switch type without code changes)

Full guide:
podostack.com/p/rabbitmq-...

4 days ago

One gotcha: client-side arguments beat policies.

If your app declares a queue with x-max-length=50, the policy's max-length is ignored for that queue.

Policies set defaults. Client args are overrides.
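A toy model of that precedence in plain Python (illustrative only - this is not the broker's actual code, and the argument names are examples):

```python
# Toy model: which value wins when both a policy and the declaring
# client set the same queue argument. Client-declared args override.
def effective_args(policy_args, client_args):
    merged = dict(policy_args)  # policy supplies the defaults
    merged.update(client_args)  # client-declared args win on conflict
    return merged

policy = {"max-length": 1000, "dead-letter-exchange": "dlx"}
client = {"max-length": 50}  # declared as x-max-length by the app
print(effective_args(policy, client))
# {'max-length': 50, 'dead-letter-exchange': 'dlx'}
```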

4 days ago

Priority system resolves conflicts:

Priority 0: global default (DLX for all)
Priority 10: group-specific (TTL for temp queues)
Priority 20: override (quorum for critical queues)

Only the single highest-priority matching policy applies. Policies don't merge per argument, so the winning policy must carry every argument you want.

4 days ago

The real power is pattern matching.

"^temp\." gets TTL.
"^critical\." gets quorum type.
".*" is the global default.

Name your queues with prefixes and policies practically write themselves.
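A sketch of how the broker picks a policy for a queue name (a Python stand-in - RabbitMQ does this server-side, and only the single highest-priority match applies; policy names and priorities here are made up):

```python
import re

# Hypothetical policies: (name, pattern, priority), patterns from the post.
policies = [
    ("temp-ttl",        r"^temp\.",     10),
    ("critical-quorum", r"^critical\.", 20),
    ("global-dlx",      r".*",           0),
]

def matching_policy(queue_name):
    """Return the highest-priority policy whose pattern matches the name."""
    matches = [(prio, name) for name, pat, prio in policies
               if re.search(pat, queue_name)]
    return max(matches)[1] if matches else None

print(matching_policy("temp.sessions"))    # temp-ttl
print(matching_policy("critical.orders"))  # critical-quorum
print(matching_policy("emails"))           # global-dlx
```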

4 days ago

A policy is a server-side regex rule:

rabbitmqctl set_policy DLX ".*" \
'{"dead-letter-exchange":"dlx"}' \
--apply-to queues

Every queue now routes dead letters to your DLX. Applied instantly. Zero downtime.

4 days ago

Stop hardcoding TTL and DLX in your application code.

RabbitMQ policies do it better. One CLI command. No code change. No restart.

4 days ago

I learned this the hard way - a $2M revenue report attributed to the wrong state because someone ran an UPDATE instead of an INSERT.

Full guide with SQL examples and common mistakes:
podostack.com/p/slowly-ch...

4 days ago

Decision guide:

Type 1 for: typo fixes, attributes nobody reports on
Type 2 for: geography, category, status - anything you'd analyze over time
Type 6 for: when compliance needs both current AND historical views

Type 2 covers 80% of cases.

4 days ago

The trick is surrogate keys.

customer_key 1001 = NYC version
customer_key 1002 = Austin version
customer_id stays "CUST-42" on both

Facts point to the key, not the ID. That's how you preserve what was true at the time.
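A minimal sqlite3 sketch of the same scenario (table and column names are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
    customer_id  TEXT,                 -- stable business ID
    city         TEXT,
    valid_from   TEXT,
    valid_to     TEXT
);
CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    customer_key INTEGER,              -- points at the version, not the ID
    amount       REAL
);
INSERT INTO dim_customer VALUES
    (1001, 'CUST-42', 'NYC',    '2022-01-01', '2025-01-01'),
    (1002, 'CUST-42', 'Austin', '2025-01-01', '9999-12-31');
INSERT INTO fact_sales VALUES
    (1, 1001, 100.0),   -- sold while the customer lived in NYC
    (2, 1002, 200.0);   -- sold after the move to Austin
""")

rows = db.execute("""
    SELECT f.sale_id, d.city
    FROM fact_sales f JOIN dim_customer d USING (customer_key)
    ORDER BY f.sale_id
""").fetchall()
print(rows)  # [(1, 'NYC'), (2, 'Austin')]
```

Past sales keep the city that was true when they happened, because they join on the surrogate key.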

4 days ago

SCD Type 2 in action:

Customer moves from NYC to Austin.

Old row: NYC, valid_from 2022, valid_to 2025
New row: Austin, valid_from 2025, valid_to 9999

Past sales still show NYC. New sales show Austin. History intact.

4 days ago

Slowly Changing Dimensions (SCD) fix this.

6 types, but you really need to know 2:

Type 1: overwrite (simple, history gone)
Type 2: add a new row (history preserved)

4 days ago

Your data warehouse tracks what happened.

But does it track what CHANGED?

If you UPDATE customer.city in place, you just erased history. Every past sale now shows the new address.

4 days ago

Enable it:
rabbitmqctl enable_feature_flag quorum_queue_non_voters

It's transparent to your application code. Publishers and consumers don't know or care.

Deep dive with architecture diagrams:
podostack.com/p/rabbitmq-...

4 days ago

Best use cases:

- Geo-distributed clusters (replicas in every AZ, voting in the fast zone)
- Compliance requiring 5+ copies
- Hardware migrations without write degradation

4 days ago

If a voter goes down, a non-voter gets auto-promoted.

Your quorum recovers without operator intervention. The other non-voters keep holding their copies.

Same idea as ZooKeeper observers or etcd learners.

4 days ago

Non-voter replicas break this trade-off.

They receive the full replication stream. They store complete copies. But they don't participate in consensus voting.

7 replicas, 3 voters. Quorum stays at 2. Write speed stays fast.

4 days ago

Raft's scaling problem:

3 voters, quorum of 2 - fast.
5 voters, quorum of 3 - slower.
7 voters, quorum of 4 - noticeably slower.

Every voter adds a network round trip to every write. More durability = more latency.
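The majority math behind those numbers, as a quick sketch (assumes only voters count toward quorum, as described above):

```python
def quorum(voters: int) -> int:
    """Smallest majority among the voting members only."""
    return voters // 2 + 1

for v in (3, 5, 7):
    print(f"{v} voters -> quorum of {quorum(v)}")

# Non-voters don't change the math: 7 replicas with 3 voters
# still commits after 2 acks, same as a plain 3-node quorum.
print(quorum(3))  # 2
```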

4 days ago

You can replicate a RabbitMQ queue to 7 nodes without slowing down writes.

The trick: not all replicas need to vote.

5 days ago

Full breakdown with Mermaid diagrams, K8s YAML examples, and a decision matrix for when to use quorum vs streams vs classic.

Read the full issue:
podostack.com/p/rabbitmq-...

5 days ago

Network split in your cluster? The default config (ignore) gives you split-brain.

pause_minority is the only safe choice. Odd number of nodes. Always.

5 days ago

Broker policies: stop hardcoding TTL and DLX in your app.

One CLI command applies dead-letter routing to every queue in your cluster. Pattern-matched. Priority-based. No code changes. No restart.

5 days ago

Non-voter replicas: the scaling trick nobody talks about.

7 copies of your data for durability, but only 3 vote for consensus. Write latency stays fast. Storage stays safe.

Same idea as ZooKeeper observers.

5 days ago

RabbitMQ now has Kafka-like streams.

Append-only log. Multiple consumers. Offset tracking. Message replay.

You might not need that Kafka cluster anymore.

5 days ago

Mirrored queues are deprecated. Quorum queues replaced them.

The difference? Raft consensus vs best-effort sync. One guarantees your messages survive node failures. The other hopes for the best.
