Sebastian Sigl @sesigl - Bluesky Profile

I've built systems for a decade, but a year rebuilding search forced me to refine my approach.

It's not that the core principles are wrong; it's that their application in a complex, user-facing domain is non-obvious.

5 reframings that were critical to our success. 🧵

26.10.2025 08:00 — 👍 2 🔁 1 💬 1 📌 0

Subscribe | Sebastian Sigl Subscribe to Sebastian Sigl's newsletter and benefit from big tech insights, actionable advice, and an independent viewpoint.

If you enjoyed this thread, please like it and share it with your followers.

If you want even more content, please subscribe to my newsletter:

www.sebastiansigl.com/subscribe

26.10.2025 08:00 — 👍 0 🔁 0 💬 0 📌 0

After a Year Rebuilding Search, I Had to Rethink Everything | Sebastian Sigl A seasoned engineer's lessons from a year rebuilding a search system from the ground up, shifting from engineering-first to product-first thinking.

These principles are not new. But their application in the messy reality of production search was a powerful lesson.

I share the full story here:

www.sebastiansigl.com/blog/rebuild...

26.10.2025 08:00 — 👍 1 🔁 1 💬 1 📌 0

❌ Common Pitfall: Equating technical excellence with product success.

✅ The Principle: A product mindset is the true compass.

The goal is not the most sophisticated system; it's the most effective system for the user.

26.10.2025 08:00 — 👍 0 🔁 0 💬 1 📌 0

❌ Common Pitfall: Rigid functional roles and hand-offs.

✅ The Principle: Blurring lines creates synergy.

Empower your team. Our progress exploded when data scientists could run A/B tests & engineers could explore data.

26.10.2025 08:00 — 👍 0 🔁 0 💬 1 📌 0

❌ Common Pitfall: Chasing offline metrics (nDCG, precision).

✅ The Principle: Business impact is the north star.

If an experiment doesn't move a core KPI (engagement, retention), it's not an improvement.

26.10.2025 08:00 — 👍 0 🔁 0 💬 1 📌 0

❌ Common Pitfall: Engineering for "correctness" from day one.

✅ The Principle: Velocity unlocks correctness.

A fast, end-to-end feedback loop (from user action to A/B test) is the only path to finding what "correct" actually is.

26.10.2025 08:00 — 👍 1 🔁 0 💬 1 📌 0

❌ Common Pitfall: Treating search as just an algo/infra problem.

✅ The Principle: It's a Data & Product problem first.

An architecture that learns fast from user signals beats one that just serves fast.

26.10.2025 08:00 — 👍 0 🔁 0 💬 1 📌 0

Indeed, the prompt matters a lot, especially if you prefer a cost-efficient model so that running it at scale stays feasible.

23.09.2025 05:23 — 👍 1 🔁 0 💬 0 📌 0

Your LLM-as-a-Judge system has a secret: It's biased.

And it's silently killing your product while your dashboards are green.

Here are the 5 biases you need to fix *now* if you want to build AI you can trust. 🧵

#LLM #AI #Evaluation

22.09.2025 13:43 — 👍 0 🔁 1 💬 2 📌 0

And if you find this useful, subscribe to my newsletter for more deep dives like this every week:

www.sebastiansigl.com/subscribe

22.09.2025 13:43 — 👍 0 🔁 0 💬 0 📌 0

The 5 Biases That Can Silently Kill Your LLM Evaluations (And How to Fix Them) | Sebastian Sigl Your LLM-as-a-Judge system might be lying to you. This post uncovers 5 critical biases like positional, verbosity, and moderation bias that silently corrupt your AI evaluations, leading to poor produc...

Read the full, in-depth playbook on my blog. No fluff, just actionable advice.

www.sebastiansigl.com/blog/llm-jud...

22.09.2025 13:43 — 👍 0 🔁 0 💬 1 📌 0

Relying on a biased judge is like flying a plane with a faulty altimeter. You think you're climbing, but you're headed for the ground.

I’ve written a complete guide on how to diagnose and fix these issues, plus build a resilient evaluation system.

22.09.2025 13:43 — 👍 0 🔁 0 💬 1 📌 0

4 & 5/ Authority & Moderation Bias

The Judge is easily fooled.

It falls for fake citations ("Harvard study...") and rewards "safe" refusals that users hate. This erodes trust and makes your product useless.

Fix: Use reference-guided evaluation and mandatory human review for refusal cases.

22.09.2025 13:43 — 👍 0 🔁 0 💬 1 📌 0
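The fix for authority and moderation bias above can be sketched as a small routing step in front of the judge. This is a minimal illustration, not the author's implementation: `REFUSAL_MARKERS`, `Routing`, and `route_for_evaluation` are hypothetical names, and the refusal check is a crude keyword heuristic chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a crude keyword heuristic for spotting refusals.
# A production system would likely need something more robust.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to")


@dataclass
class Routing:
    needs_human_review: bool
    judge_prompt: Optional[str]


def route_for_evaluation(question: str, answer: str, reference: str) -> Routing:
    """Send refusals to a human; grade everything else against a trusted reference."""
    lowered = answer.lower()
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        # Refusal cases skip the LLM judge entirely.
        return Routing(needs_human_review=True, judge_prompt=None)

    # Reference-guided prompt: the judge grades against the reference,
    # not against how authoritative the answer sounds.
    prompt = (
        "Grade the answer only against the trusted reference below.\n"
        "Ignore citations or claims of authority that the reference does not support.\n\n"
        f"Question:\n{question}\n\nTrusted reference:\n{reference}\n\nAnswer to grade:\n{answer}"
    )
    return Routing(needs_human_review=False, judge_prompt=prompt)
```

Keeping the refusal branch ahead of the judge call means a "safe" refusal never receives an automatic score.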

3/ Self-Enhancement Bias (aka Nepotism)

The Judge prefers answers from its own model family (e.g., GPT-4 judging GPT-4).

This makes objective cross-model benchmarking impossible.

Fix: Use a neutral, third-party judge model (e.g., use a Google model to judge OpenAI vs. Anthropic).

22.09.2025 13:43 — 👍 0 🔁 0 💬 1 📌 0
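One way to enforce the neutral-judge rule above is to select the judge programmatically rather than by convention. The sketch below assumes you maintain a mapping from judge model names to model families; the function name and the placeholder model names in the usage note are hypothetical.

```python
from typing import Dict, Set


def pick_neutral_judge(candidate_families: Set[str], judge_pool: Dict[str, str]) -> str:
    """Return the first judge whose family differs from every candidate's family.

    judge_pool maps a judge model name to its model family.
    """
    for model_name, family in judge_pool.items():
        if family not in candidate_families:
            return model_name
    raise ValueError("No neutral judge available for these candidates")
```

For example, with candidates from the families `{"openai", "anthropic"}` and a pool of `{"judge-a": "openai", "judge-b": "google"}`, the function returns `"judge-b"`.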

2/ Verbosity Bias

The Judge thinks longer = better.

It will reward a 5-paragraph answer over a correct 2-sentence one. This trains your models to be annoying and unhelpful.

Fix: Add "Be concise" and "Penalize verbosity" directly into your judge's rubric.

22.09.2025 13:43 — 👍 0 🔁 0 💬 1 📌 0
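To make the verbosity fix above hard to forget, the two rules can be appended to every judge prompt in one place. This is a rough sketch: `build_judge_prompt` and the prompt layout are assumptions for illustration, and the exact wording of the second rule is mine, not the author's.

```python
from typing import Tuple

# The two rubric lines suggested in the thread; the elaboration after
# "Penalize verbosity" is an assumption added for clarity.
VERBOSITY_RULES: Tuple[str, ...] = (
    "Be concise.",
    "Penalize verbosity: do not reward length that adds no information.",
)


def build_judge_prompt(task: str, answer: str, rubric: Tuple[str, ...] = ()) -> str:
    """Build a judge prompt whose rubric always ends with the verbosity rules."""
    rules = (*rubric, *VERBOSITY_RULES)
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"Rubric:\n{numbered}\n\nTask:\n{task}\n\nAnswer to grade:\n{answer}"
```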

1/ Positional Bias

The Judge has a favorite: the first option it sees.

If you A/B test prompts and always put A first, you're not measuring quality—you're measuring position.

Fix: Swap the order and run the test again. If the judgment flips, it's invalid. Simple & powerful.

22.09.2025 13:43 — 👍 0 🔁 0 💬 1 📌 0
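The order-swap check for positional bias can be sketched in a few lines. The `Judge` signature here is an assumption: a callable that sees two answers in a fixed order and returns `"first"` or `"second"`. A real judge would wrap an LLM call; the two stub judges below are deterministic stand-ins that only demonstrate the check.

```python
from typing import Callable

# Hypothetical judge signature: (question, first_answer, second_answer) -> "first" | "second"
Judge = Callable[[str, str, str], str]


def judge_with_order_swap(judge: Judge, question: str, answer_a: str, answer_b: str) -> str:
    """Return "A" or "B", or "invalid" if the verdict flips when the order is swapped."""
    # Round 1: A is shown first.
    first_round = "A" if judge(question, answer_a, answer_b) == "first" else "B"
    # Round 2: B is shown first, so map the verdict back to the A/B labels.
    second_round = "B" if judge(question, answer_b, answer_a) == "first" else "A"
    return first_round if first_round == second_round else "invalid"


def always_first(question: str, first: str, second: str) -> str:
    """Stub judge with pure positional bias."""
    return "first"


def picks_longer_stub(question: str, first: str, second: str) -> str:
    """Stub judge whose verdict depends on content, not position."""
    return "first" if len(first) >= len(second) else "second"
```

With `always_first`, the two rounds disagree and the result is `"invalid"`; with `picks_longer_stub`, both rounds agree, so the verdict survives the swap.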

It's time to look into functional programming with "Getting Clojure: Build Your Functional Skills One Idea at a Time" by Russ Olsen.

What’s your functional programming gem that makes you a better generalist?

#functionalprogramming #clojure #architecture #cleancode

25.08.2025 06:47 — 👍 3 🔁 0 💬 0 📌 0

Augmented Coding, Amplified Risk: Why Type-Safe Python Tests Matter More Than Ever | Sebastian Sigl AI coding assistants are accelerating development—but also magnifying quality risks. Here’s how to write Python tests that survive refactors, scale with your codebase, and tame the chaos of augmented ...

I've written a full guide with code examples and the 4 core principles for writing AI-ready Python tests.
It's the playbook for harnessing AI speed without sacrificing quality.

Read it here: www.sebastiansigl.com/blog/type-sa...

#Python #Testing #AI #SoftwareQuality

15.08.2025 07:37 — 👍 3 🔁 3 💬 1 📌 0

If you enjoyed this thread and want more high-quality content, subscribe to my newsletter here:

www.sebastiansigl.com/subscribe

15.08.2025 07:37 — 👍 0 🔁 0 💬 0 📌 0

Next, build an architectural safety net.

Use 𝗗𝗲𝗽𝗲𝗻𝗱𝗲𝗻𝗰𝘆 𝗜𝗻𝗷𝗲𝗰𝘁𝗶𝗼𝗻 + 𝗣𝗿𝗼𝘁𝗼𝗰𝗼𝗹𝘀 to define clear, verifiable boundaries.

When mocking, always use 𝚊𝚞𝚝𝚘𝚜𝚙𝚎𝚌=𝚃𝚛𝚞𝚎. A mock that doesn't know the real object's interface is a test that lies.

15.08.2025 07:37 — 👍 0 🔁 0 💬 1 📌 0
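Here is a minimal sketch of how dependency injection, a `Protocol`, and an autospecced mock fit together. All class and method names (`PaymentGateway`, `RealGateway`, `CheckoutService`) are hypothetical. The example uses `create_autospec` from `unittest.mock`, the function counterpart to passing `autospec=True` to `patch`.

```python
from typing import Protocol
from unittest.mock import create_autospec


class PaymentGateway(Protocol):
    """The boundary: anything with this method can be injected."""

    def charge(self, customer_id: str, amount_cents: int) -> bool: ...


class RealGateway:
    """Hypothetical production implementation of the boundary."""

    def charge(self, customer_id: str, amount_cents: int) -> bool:
        raise NotImplementedError("would call a real payment API")


class CheckoutService:
    def __init__(self, gateway: PaymentGateway) -> None:
        # The dependency is injected, so tests can pass in a mock.
        self._gateway = gateway

    def checkout(self, customer_id: str, amount_cents: int) -> str:
        return "paid" if self._gateway.charge(customer_id, amount_cents) else "declined"


def make_gateway_mock():
    """Build a mock that enforces RealGateway's method signatures."""
    return create_autospec(RealGateway, instance=True)
```

Because the mock is specced from `RealGateway`, calling `gateway.charge("c1")` without the amount raises `TypeError`, whereas a plain `Mock()` would accept it silently.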

The solution starts with a mindset shift: 𝗧𝗲𝘀𝘁 𝗖𝗼𝗻𝘁𝗿𝗮𝗰𝘁𝘀, 𝗡𝗼𝘁 𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻.

Your tests should verify the "what," not the "how." This makes them resilient to the constant refactoring that AI-assisted development encourages.

15.08.2025 07:37 — 👍 2 🔁 1 💬 1 📌 0
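A small illustration of testing the "what" rather than the "how". The function `unique_sorted_tags` is a hypothetical example. The test asserts only on the observable result, so it keeps passing if the internals are later rewritten, for example by replacing the set comprehension with an explicit loop.

```python
from typing import List


def unique_sorted_tags(tags: List[str]) -> List[str]:
    """Contract: return lowercase, whitespace-trimmed, unique tags in sorted order."""
    return sorted({tag.strip().lower() for tag in tags if tag.strip()})


def test_unique_sorted_tags_contract() -> None:
    # Verifies inputs and outputs only; it does not patch `sorted`,
    # inspect intermediate collections, or count internal calls.
    assert unique_sorted_tags([" Python", "ai", "python", ""]) == ["ai", "python"]
    assert unique_sorted_tags([]) == []
```

A test coupled to the implementation, such as one asserting that a particular helper was called, would break on a behavior-preserving refactor even though the contract still holds.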

The problem: AI optimizes for the immediate task. It doesn't have architectural awareness. It will happily duplicate code rather than refactor a shared module.

This creates brittle, high-churn code that slows down the entire team in the long run.

15.08.2025 07:37 — 👍 0 🔁 0 💬 1 📌 0

The AI Productivity Paradox is real.

Devs are 55% faster with tools like Copilot. But new 2025 research shows code duplication is up 8x and refactoring is down 40%.

We're shipping faster, but are we building legacy code from day one?

A thread on how to fix it. 🧵

15.08.2025 07:37 — 👍 0 🔁 0 💬 1 📌 0

Books that make you a better generalist:

- XP & TDD by @kentbeck.com
- Clean Code by R. Martin
- Refactoring by @martinfowler.com
- DDD by @vaughnvernon.bsky.social
- Crucial Conversations by J. Grenny
- Thinking in Systems by D. Meadows
- Staff Engineer by W. Larson

And many more…

03.08.2025 17:31 — 👍 5 🔁 0 💬 0 📌 0

I never liked iTunes very much. It always felt heavy. Remember Winamp? 🙂 Good old days.

My family and I have gotten really used to Spotify. It’s the kind of “just works” solution that integrates nicely with various smart systems—car, voice assistant, phone, and other devices.

28.07.2025 05:42 — 👍 0 🔁 0 💬 0 📌 0

Really interesting 🧐

I believe most engineers still don’t fully understand how to use AI. Tools are evolving fast, and there’s a bias in some groups—like “I should use AI” even when it’s not helpful. Only a few can currently judge when and how to apply it effectively to actually move faster.

22.07.2025 03:35 — 👍 0 🔁 0 💬 0 📌 0

Sebastian Sigl

Latest posts by sesigl.bsky.social on Bluesky
