TIL if you have a domain in firefox's about:config network.dns.localDomains list, it won't pass A record lookups for that apex domain to your local resolver - it will resolve to loopback instead.
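A quick way to see the mismatch is to ask the OS resolver directly, since it ignores Firefox's pref. Rough sketch in Python, with a made-up domain (home.example stands in for the real one):

import socket

# home.example is a placeholder for the apex domain that's both in my
# local resolver's zone and in Firefox's network.dns.localDomains pref.
DOMAIN = "home.example"

# getaddrinfo() goes through the OS resolver chain, i.e. the local
# resolver - so this prints the real A record. Firefox short-circuits
# names listed in network.dns.localDomains to loopback instead of ever
# sending the query out.
for entry in socket.getaddrinfo(DOMAIN, 443, family=socket.AF_INET, type=socket.SOCK_STREAM):
    print(entry[4][0])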
19.09.2025 00:16 — 👍 3 🔁 0 💬 0 📌 0
@butterflysky.dev.bsky.social
Thanks, I'll have something up in the next few days. I bet it will help me organize my upcoming projects too.
16.09.2025 03:49 — 👍 0 🔁 0 💬 0 📌 0
Hey thanks! I started down the path of writing it today. Got nerd-sniped figuring out which static site generator and theme to use. Ended up grabbing zola with the serene theme.
I got as far as setting up the github repo and actions, DNS integration, but no content yet. That's a tomorrow adventure!
I have a lot more storytelling to do here, so many more things to cover, like encrypted root fs on headless systems using clevis/tang to decrypt via the network. ArgoCD and GitOps. Why Cilium (and BGP). Adventures in blowing up and recovering Ceph mons. Etc. Time to start that blog I guess.
15.09.2025 18:59 — 👍 6 🔁 0 💬 1 📌 0
So, that necessitates some digging into management tools and strategies. On docker swarm, I added Portainer, but in the Kubernetes cluster, I'm still trying to decide where I want this to land. I've played with Lens a bit, and it looks useful. We might do that.
15.09.2025 18:57 — 👍 1 🔁 0 💬 1 📌 0
Then we want more user-friendly management. I want you to have access to delve into any of the detail here that you would like to, but I don't want you to _have_ to. I care about developer experience here, and on our home network, I have a customer-base of one, you.
15.09.2025 18:55 — 👍 1 🔁 0 💬 1 📌 0
All of these network services have to be secured, right? I want TLS everywhere, and I want to control the CA. I want it to be easy to use as well, as easy as Let's Encrypt. So, we have step-ca with an ACME provisioner, another service that goes in our inventory.
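The payoff is that checking any internal endpoint against our own root becomes trivial. A Python sketch - the hostname and root CA path are made up, not real values from our setup:

import socket
import ssl

HOST = "wiki.home.example"                          # placeholder internal hostname
ROOT_CA = "/usr/local/share/ca/home-root-ca.pem"    # placeholder path to the step-ca root cert

# Trust only our internal root, so the handshake fails unless the service
# presents a cert issued by our CA (e.g. via the ACME provisioner).
ctx = ssl.create_default_context(cafile=ROOT_CA)

with socket.create_connection((HOST, 443), timeout=5) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()
        print("issuer: ", dict(pair[0] for pair in cert["issuer"]))
        print("expires:", cert["notAfter"])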
15.09.2025 18:54 — 👍 1 🔁 0 💬 1 📌 0
Thank you! I'm kind of a butterfly nut. I picked up the nickname butterfly years ago, and now it's on everything I touch.
15.09.2025 18:44 — 👍 0 🔁 0 💬 0 📌 0
Plus, once you've got more than one machine to keep in sync configuration-wise, you need consistent provisioning and config management. Hence netboot, minimal OS install to drop in SSH keys and config, and ansible playbooks after that.
Cluster bootstrapping ephemera? That's on the NAS, via NFS.
And we need centralized, searchable log collection, metrics, traces. I could even start pushing a lot of stuff into Honeycomb, but again, I want to know what I'm working with before I use a cloud service. We've been flying blind accruing all this stuff piecemeal.
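When we do wire it up, the nice part of OpenTelemetry is that switching backends is just an exporter endpoint. Rough sketch using the opentelemetry-sdk and OTLP exporter packages - the collector endpoint and service name are placeholders, not anything we actually run yet:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Ship spans to a hypothetical in-cluster collector; swapping the endpoint
# (or the collector's own exporter config) is how they'd end up in Honeycomb later.
provider = TracerProvider(resource=Resource.create({"service.name": "homelab-demo"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector.home.example:4317", insecure=True))
)
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("nightly-backup"):
    pass  # the actual work would go here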
15.09.2025 18:37 — 👍 2 🔁 0 💬 1 📌 0
So, what do we want our network to support?
Fast internet - 10Gbps everywhere
Wireless mesh - Omada
Media - Synology, Plex, Navidrome
VPN - Tailscale
Games - Minecraft, Factorio, etc.
Internal DNS - Unbound, bind
PXE - netboot.xyz
Dev - mysql, postgres, redis, object store
TLS - step-ca, traefik
The Kubernetes ecosystem is vast. I had analysis paralysis for a while, but I eventually chose k0s for our cluster. It's self-contained, doesn't rely on host-level dependencies, has everything integrated in a single binary. And it's minimal, it doesn't add a bunch of cruft out of the box.
15.09.2025 18:26 — 👍 1 🔁 0 💬 1 📌 0
One day I saw a deal on some NUCs I'd been eyeing, and I picked up 3 Intel and 3 AMD machines. Cluster fodder. Our docker swarm was cobbled together between an older NUC, a gaming PC I built in 2017, and the Synology NAS with its nonstandard Linux and package versions. I wanted something fresh.
15.09.2025 18:22 — 👍 1 🔁 0 💬 1 📌 0
I thought about running some of this stuff in the cloud, but I'm just ADHD enough to screw up and leave some resource running, accruing a bill that comes and bites me later. I didn't want to frontload the accounting aspects of learning Kubernetes, I wanted to play. Why have a homelab otherwise?
15.09.2025 18:19 — 👍 1 🔁 0 💬 1 📌 0
Problem - I knew next to nothing about Kubernetes, and I was pretty intimidated by it. I'd worked with Tupperware at Facebook, but my specialty had me focused on analyzing perf and reliability of DI compute as a holistic distributed system. So I didn't do much with regard to workload orchestration.
15.09.2025 18:14 — 👍 2 🔁 0 💬 1 📌 0
So I wanted a shared storage layer that could tolerate node outages. I did some research and saw Longhorn and Ceph were interesting options. But there wasn't a lot of information about making them work well on Docker swarm. I did find a ton of references to Kubernetes CRDs and Helm charts, though.
15.09.2025 18:10 — 👍 1 🔁 0 💬 1 📌 0
Those services? Now we're pinning them to local storage with node labels. Okay, no HA there.
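For the record, "pinning" here just means swarm placement constraints on node labels - roughly this via the docker Python SDK, where the image, volume, and label are examples rather than our actual stack:

import docker

client = docker.from_env()

# Pin a stateful service to whichever node carries the storage=local label,
# the SDK equivalent of: docker service create --constraint node.labels.storage==local ...
client.services.create(
    image="postgres:16",
    name="postgres",
    constraints=["node.labels.storage==local"],
    mounts=["pgdata:/var/lib/postgresql/data:rw"],
)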
We also had lock contention, containers hanging, and weird edge cases where synologynas itself would lock up and need power-cycling - we couldn't even debug it. And we had no observability.
As it turns out, tuning NFS performance for different workloads requires some care, and some workloads really aren't suited for it at all - for example, rendering new chunks on a Minecraft server while flying around generates a ton of random writes. Rubber-banding with an elytra sucks.
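You can reproduce the pain without Minecraft - hammer a file with small synced writes at random offsets, once against local disk and once against the NFS mount. The path below is a placeholder:

import os
import random
import time

PATH = "/mnt/nfs-test/random-writes.bin"   # point at local disk vs. the NFS export
SIZE = 256 * 1024 * 1024                   # 256 MiB scratch file

fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
os.ftruncate(fd, SIZE)

start = time.monotonic()
for _ in range(2000):
    # 4 KiB write at a random offset, forced to stable storage - every
    # fsync pays the NFS round-trip, which is what wrecks chunk generation.
    os.pwrite(fd, os.urandom(4096), random.randrange(0, SIZE - 4096))
    os.fsync(fd)
os.close(fd)

print(f"2000 synced 4 KiB writes took {time.monotonic() - start:.1f}s")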
15.09.2025 18:01 — 👍 2 🔁 0 💬 1 📌 0
So at that point, instead of docker compose files we needed stack definitions. Very similar, subtly different. Also, allowing persistent services to float between nodes requires consistent shared storage. Great, let's use the NAS, we'll set up NFS exports on a shared volume. What could go wrong?
15.09.2025 17:56 — 👍 2 🔁 0 💬 1 📌 0
Since we started with Docker containers early on, I initially set up a Docker swarm - now our complexity counter += 1
The swarm gave us the ability to move containers off the NAS to other machines and vice versa. Now we could do offline maintenance without losing all our services.
Some things we can run on the Synology NAS, and we do - it's hard to beat local storage performance for some demanding I/O workloads. But we can't fit everything I just mentioned into running containers on the NAS. Even if we could, if the NAS went down, do we really want everything else to go down?
15.09.2025 17:46 — 👍 2 🔁 0 💬 1 📌 0
We have various services we want to keep running - some to help organize our media, a wiki, op-connect so we can get easy API access to our 1Password vaults, database, s3-compatible object store, DNS, my Minecraft servers, Omada controller, Tailscale subnet router/exit node. You know, the usual.
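The op-connect bit really is just a REST call with a bearer token - roughly this, where the endpoint and env var name are placeholders for our actual deployment:

import os
import requests

CONNECT = "http://op-connect.home.example:8080"   # placeholder in-cluster endpoint
TOKEN = os.environ["OP_CONNECT_TOKEN"]            # Connect access token

# List the vaults visible to this token via the 1Password Connect REST API.
resp = requests.get(
    f"{CONNECT}/v1/vaults",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=5,
)
resp.raise_for_status()
for vault in resp.json():
    print(vault["id"], vault["name"])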
15.09.2025 17:45 — 👍 2 🔁 0 💬 1 📌 0
lol I should have led with why I did it. I'll lead with rationale for the cluster.
So... why do we have a kubernetes cluster at home?
Thanks rain! Right back at ya. <3
And I have had next to no online presence for like 6 years, I'm actually intimidated. I'll probably just focus on sharing my learning explorations, as I don't keep up with most discourse or memes.
I guess it's time for me to start that blog, then?
15.09.2025 09:21 — 👍 2 🔁 0 💬 1 📌 0
Well, after I got the cluster built, cilium configured, frr installed on opnsense, and peering established between the two, it wasn't too bad. XD
Cilium really did make it pretty straightforward though.
And, yes, there are static routes in place as fallback, just for you. <3
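If anyone wants to poke at the peering config, it's all CRDs, so you can pull it back out through the Kubernetes API. A sketch assuming the older v2alpha1 CiliumBGPPeeringPolicy CRD - field names may differ between Cilium releases:

from kubernetes import client, config

config.load_kube_config()

# List Cilium's BGP peering policies and the peers each one talks to.
# Group/version/plural assume the cilium.io/v2alpha1 CiliumBGPPeeringPolicy CRD.
api = client.CustomObjectsApi()
policies = api.list_cluster_custom_object(
    group="cilium.io",
    version="v2alpha1",
    plural="ciliumbgppeeringpolicies",
)
for item in policies.get("items", []):
    name = item["metadata"]["name"]
    neighbors = item["spec"]["virtualRouters"][0].get("neighbors", [])
    print(name, [n.get("peerAddress") for n in neighbors])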