
Sebastian Sigl

@sesigl.bsky.social

12 Followers  |  24 Following  |  56 Posts  |  Joined: 22.04.2025  |  2.253

Latest posts by sesigl.bsky.social on Bluesky


I've built systems for a decade, but a year rebuilding search forced me to refine my approach.

It's not that the core principles are wrong; it's that applying them in a complex, user-facing domain is non-obvious.

5 reframings that were critical to our success. 🧵

26.10.2025 08:00 | 👍 2 🔁 1 💬 1 📌 0
Subscribe | Sebastian Sigl Subscribe to Sebastian Sigl's newsletter and benefit from big tech insights, actionable advice, and an independent viewpoint.

If you enjoyed this thread, please like it and share it with your followers.

If you want even more content, subscribe to my newsletter:

www.sebastiansigl.com/subscribe

26.10.2025 08:00 | 👍 0 🔁 0 💬 0 📌 0
After a Year Rebuilding Search, I Had to Rethink Everything | Sebastian Sigl A seasoned engineer's lessons from a year rebuilding a search system from the ground up, shifting from engineering-first to product-first thinking.

These principles are not new. But their application in the messy reality of production search was a powerful lesson.

I share the full story here:

www.sebastiansigl.com/blog/rebuild...

26.10.2025 08:00 | 👍 1 🔁 1 💬 1 📌 0

โŒ Common Pitfall: Equating technical excellence with product success.

โœ… The Principle: A product mindset is the true compass.

The goal is not the most sophisticated system; it's the most effective system for the user.

26.10.2025 08:00 | 👍 0 🔁 0 💬 1 📌 0

โŒ Common Pitfall: Rigid functional roles and hand-offs.

โœ… The Principle: Blurring lines creates synergy.

Empower your team. Our progress exploded when data scientists could run A/B tests & engineers could explore data.

26.10.2025 08:00 | 👍 0 🔁 0 💬 1 📌 0

โŒ Common Pitfall: Chasing offline metrics (nDCG, precision).

โœ… The Principle: Business impact is the north star.

If an experiment doesn't move a core KPI (engagement, retention), it's not an improvement.

26.10.2025 08:00 | 👍 0 🔁 0 💬 1 📌 0
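To check whether an experiment actually moved a rate-style KPI (for example, the share of sessions with at least one click), one common option is a two-proportion z-test. This is a minimal sketch, not the author's method: the function name and the numbers are hypothetical, and it assumes independent users and reasonably large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a: int, total_a: int,
                          successes_b: int, total_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B share one KPI rate."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the null hypothesis that the variants do not differ.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("KPI rate is 0% or 100% in both variants; test undefined")
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: sessions with a click, control (A) vs. new ranker (B).
z, p = two_proportion_z_test(4_800, 10_000, 5_000, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the lift is unlikely to be noise; whether a lift of that size matters for engagement or retention is still a product call.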

โŒ Common Pitfall: Engineering for "correctness" from day one.

โœ… The Principle: Velocity unlocks correctness.

A fast, end-to-end feedback loop (from user action to A/B test) is the only path to finding what "correct" actually is.

26.10.2025 08:00 | 👍 1 🔁 0 💬 1 📌 0
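A fast loop needs experiment assignment that is cheap and repeatable. One common approach (an assumption here, not necessarily what this team used) is to hash the user ID together with the experiment name; `assign_variant` and the experiment name below are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministically map a user to one variant of one experiment."""
    # Salting with the experiment name gives each experiment an independent split.
    key = f"{experiment}:{user_id}".encode("utf-8")
    # Interpret the first 8 bytes of the SHA-256 digest as an unsigned integer.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return variants[bucket % len(variants)]


# The same user always lands in the same variant, with no assignment table to store.
print(assign_variant("user-42", "ranker-v2"))
```

Because assignment is a pure function, any service in the pipeline can recompute it, which keeps the path from user action to A/B readout short.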

โŒ Common Pitfall: Treating search as just an algo/infra problem.

โœ… The Principle: It's a Data & Product problem first.

An architecture that learns fast from user signals beats one that just serves fast.

26.10.2025 08:00 | 👍 0 🔁 0 💬 1 📌 0
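A minimal sketch of a component that learns from user signals: a reranker that orders results by a smoothed click-through rate. The class name, prior values, and (query, document) keying are assumptions for illustration. Raw clicks are position-biased (top results get clicked more regardless of relevance), so this is a starting point, not a production ranker.

```python
from collections import defaultdict


class ClickFeedbackReranker:
    """Re-rank results by a smoothed click-through rate learned from logs."""

    def __init__(self, prior_ctr: float = 0.1, prior_weight: float = 20.0) -> None:
        self.prior_ctr = prior_ctr
        self.prior_weight = prior_weight
        self.impressions: defaultdict[tuple[str, str], int] = defaultdict(int)
        self.clicks: defaultdict[tuple[str, str], int] = defaultdict(int)

    def record(self, query: str, doc_id: str, clicked: bool) -> None:
        # One logged impression, optionally with a click.
        self.impressions[(query, doc_id)] += 1
        if clicked:
            self.clicks[(query, doc_id)] += 1

    def ctr(self, query: str, doc_id: str) -> float:
        # Smoothing toward a prior: unseen documents score exactly prior_ctr,
        # and a few lucky clicks cannot dominate the ranking.
        key = (query, doc_id)
        return ((self.clicks[key] + self.prior_ctr * self.prior_weight)
                / (self.impressions[key] + self.prior_weight))

    def rerank(self, query: str, doc_ids: list[str]) -> list[str]:
        # sorted() is stable, so ties keep the original retrieval order.
        return sorted(doc_ids, key=lambda d: self.ctr(query, d), reverse=True)


reranker = ClickFeedbackReranker()
for _ in range(10):
    reranker.record("shoes", "b", clicked=True)
    reranker.record("shoes", "a", clicked=False)
print(reranker.rerank("shoes", ["a", "b", "c"]))
```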

Indeed, the prompt matters a lot, especially if you prefer a cost-efficient model so it's feasible to run at scale.

23.09.2025 05:23 | 👍 1 🔁 0 💬 0 📌 0

Your LLM-as-a-Judge system has a secret: It's biased.

And it's silently killing your product while your dashboards are green.

Here are the 5 biases you need to fix *now* if you want to build AI you can trust. 🧵

#LLM #AI #Evaluation

22.09.2025 13:43 | 👍 0 🔁 1 💬 2 📌 0

And if you find this useful, subscribe to my newsletter for more deep dives like this every week:

www.sebastiansigl.com/subscribe

22.09.2025 13:43 | 👍 0 🔁 0 💬 0 📌 0
The 5 Biases That Can Silently Kill Your LLM Evaluations (And How to Fix Them) | Sebastian Sigl Your LLM-as-a-Judge system might be lying to you. This post uncovers 5 critical biases like positional, verbosity, and moderation bias that silently corrupt your AI evaluations, leading to poor produc...

Read the full, in-depth playbook on my blog. No fluff, just actionable advice.

www.sebastiansigl.com/blog/llm-jud...

22.09.2025 13:43 | 👍 0 🔁 0 💬 1 📌 0

Relying on a biased judge is like flying a plane with a faulty altimeter. You think you're climbing, but you're headed for the ground.

I've written a complete guide on how to diagnose and fix these issues, and how to build a resilient evaluation system.

22.09.2025 13:43 | 👍 0 🔁 0 💬 1 📌 0

4 & 5/ Authority & Moderation Bias

The Judge is easily fooled.

It falls for fake citations ("Harvard study...") and rewards "safe" refusals that users hate. This erodes trust and makes your product useless.

Fix: Use reference-guided evaluation and mandatory human review for refusal cases.

22.09.2025 13:43 | 👍 0 🔁 0 💬 1 📌 0
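A minimal sketch of both fixes together, assuming a hypothetical `build_judge_task` helper: the prompt grades against a trusted reference answer, and answers that look like refusals are flagged for a human. The prompt wording and the substring-based refusal check are illustrative stand-ins, not a vetted detector.

```python
from dataclasses import dataclass

# Hypothetical, deliberately crude refusal markers; a real system would tune or learn these.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to", "i am unable to")


@dataclass
class JudgeTask:
    prompt: str
    needs_human_review: bool


def build_judge_task(question: str, answer: str, reference: str) -> JudgeTask:
    """Build a reference-guided judge prompt and flag refusals for human review."""
    # Reference-guided evaluation: the judge grades against a trusted reference,
    # so a confident citation that the reference does not support earns no credit.
    prompt = (
        "Grade the ANSWER against the REFERENCE on a 1-5 scale. Credit only claims "
        "supported by the REFERENCE; ignore citations or appeals to authority "
        "that the REFERENCE does not contain.\n\n"
        f"QUESTION:\n{question}\n\nREFERENCE:\n{reference}\n\nANSWER:\n{answer}"
    )
    # Normalize curly apostrophes so "can\u2019t" and "can't" match the same marker.
    normalized = answer.lower().replace("\u2019", "'")
    is_refusal = any(marker in normalized for marker in REFUSAL_MARKERS)
    return JudgeTask(prompt=prompt, needs_human_review=is_refusal)
```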

3/ Self-Enhancement Bias (aka Nepotism)

The Judge prefers answers from its own model family (e.g., GPT-4 judging GPT-4).

This makes objective cross-model benchmarking impossible.

Fix: Use a neutral, third-party judge model (e.g., use a Google model to judge OpenAI vs. Anthropic).

22.09.2025 13:43 | 👍 0 🔁 0 💬 1 📌 0
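A small sketch of picking a neutral judge automatically. It assumes hypothetical "provider:model" identifiers and treats the provider as a proxy for the model family; neither the IDs nor the helper names refer to a real API.

```python
def provider_of(model_id: str) -> str:
    # Assumes hypothetical "provider:model" identifiers such as "google:judge".
    return model_id.split(":", 1)[0]


def pick_neutral_judge(candidates: list[str], judge_pool: list[str]) -> str:
    """Return the first judge whose provider built none of the candidate models."""
    candidate_providers = {provider_of(m) for m in candidates}
    for judge in judge_pool:
        if provider_of(judge) not in candidate_providers:
            return judge
    raise ValueError("every judge shares a model family with a candidate")


# Comparing an OpenAI model against an Anthropic one: the Google judge is picked.
judge = pick_neutral_judge(
    ["openai:model-a", "anthropic:model-b"],
    ["openai:judge", "anthropic:judge", "google:judge"],
)
print(judge)
```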

2/ Verbosity Bias

The Judge thinks longer = better.

It will reward a 5-paragraph answer over a correct 2-sentence one. This trains your models to be annoying and unhelpful.

Fix: Add "Be concise" and "Penalize verbosity" directly into your judge's rubric.

22.09.2025 13:43 | 👍 0 🔁 0 💬 1 📌 0

1/ Positional Bias

The Judge has a favorite: the first option it sees.

If you A/B test prompts and always put A first, you're not measuring quality; you're measuring position.

Fix: Swap the order and run the test again. If the judgment flips, it's invalid. Simple & powerful.

22.09.2025 13:43 | 👍 0 🔁 0 💬 1 📌 0
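The swap check as a minimal sketch. The `Judge` signature (question, first answer, second answer, returning which position won) is a hypothetical interface; any real judge call would need a small adapter.

```python
from typing import Callable, Literal, Optional

Verdict = Literal["first", "second"]
# Hypothetical judge interface: (question, first_answer, second_answer) -> winning position.
Judge = Callable[[str, str, str], Verdict]


def position_consistent_winner(judge: Judge, question: str,
                               answer_a: str, answer_b: str) -> Optional[str]:
    """Ask twice with the order swapped; return "A", "B", or None if inconsistent."""
    first_pass = judge(question, answer_a, answer_b)
    second_pass = judge(question, answer_b, answer_a)
    winner_1 = "A" if first_pass == "first" else "B"
    winner_2 = "B" if second_pass == "first" else "A"
    # If the winner changes when only the order changed, the verdict reflects
    # position rather than quality, so it is discarded.
    return winner_1 if winner_1 == winner_2 else None
```

A judge that always picks the first slot returns `None` here, which is exactly the invalid case the post describes.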

It's time to look into functional programming with

“Getting Clojure: Build Your Functional Skills One Idea at a Time”
by Russ Olsen.

What's your functional programming gem that makes you a better generalist?

#functionalprogramming #clojure #architecture #cleancode

25.08.2025 06:47 | 👍 3 🔁 0 💬 0 📌 0
Augmented Coding, Amplified Risk: Why Type-Safe Python Tests Matter More Than Ever | Sebastian Sigl AI coding assistants are accelerating development, but also magnifying quality risks. Here's how to write Python tests that survive refactors, scale with your codebase, and tame the chaos of augmented ...

I've written a full guide with code examples and the 4 core principles for writing AI-ready Python tests.
It's the playbook for harnessing AI speed without sacrificing quality.

Read it here: www.sebastiansigl.com/blog/type-sa...

#Python #Testing #AI #SoftwareQuality

15.08.2025 07:37 | 👍 3 🔁 3 💬 1 📌 0

If you enjoyed this thread and want more high-quality content, subscribe to my newsletter here:

www.sebastiansigl.com/subscribe

15.08.2025 07:37 | 👍 0 🔁 0 💬 0 📌 0

Next, build an architectural safety net.

Use Dependency Injection + Protocols to define clear, verifiable boundaries.

When mocking, always use autospec=True. A mock that doesn't know the real object's interface is a test that lies.

15.08.2025 07:37 | 👍 0 🔁 0 💬 1 📌 0
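A minimal sketch of that safety net, using a hypothetical payment dependency: the service depends on a `typing.Protocol`, production injects the concrete class, and the test injects an autospecced mock. `mock.patch(..., autospec=True)` and `unittest.mock.create_autospec` both build mocks from the real object's signatures; the sketch uses `create_autospec` because the dependency is injected rather than patched, and specs the concrete class so the mock tracks what production actually runs.

```python
from typing import Protocol
from unittest.mock import create_autospec


class PaymentGateway(Protocol):
    def charge(self, customer_id: str, amount_cents: int) -> str: ...


class RealPaymentGateway:
    """Concrete implementation; in production this would call a payment API."""

    def charge(self, customer_id: str, amount_cents: int) -> str:
        raise NotImplementedError("network call omitted in this sketch")


class CheckoutService:
    # The dependency is injected and typed against the Protocol, not a concrete class.
    def __init__(self, gateway: PaymentGateway) -> None:
        self._gateway = gateway

    def checkout(self, customer_id: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        return self._gateway.charge(customer_id, amount_cents)


def test_checkout_charges_customer() -> None:
    # autospec copies the real class's signatures: calling a method that does not
    # exist, or passing the wrong arguments, fails instead of silently passing.
    gateway = create_autospec(RealPaymentGateway, instance=True)
    gateway.charge.return_value = "txn-123"

    service = CheckoutService(gateway)

    assert service.checkout("cust-1", 500) == "txn-123"
    gateway.charge.assert_called_once_with("cust-1", 500)
```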

The solution starts with a mindset shift: Test Contracts, Not Implementation.

Your tests should verify the "what," not the "how." This makes them resilient to the constant refactoring that AI-assisted development encourages.

15.08.2025 07:37 | 👍 2 🔁 1 💬 1 📌 0
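A small illustration of testing the contract rather than the implementation, using a hypothetical `deduplicate` helper. The test pins down only observable behaviour, so a refactor that swaps the internals keeps it green.

```python
def deduplicate(items: list[str]) -> list[str]:
    """Return items without duplicates, keeping first-seen order."""
    # Implementation detail: dicts preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


def test_deduplicate_contract() -> None:
    # Only the "what" is asserted: order kept, duplicates gone, empty input handled.
    # Nothing here mentions dict.fromkeys, so replacing it with a set-plus-loop
    # version during a refactor does not break the test.
    assert deduplicate(["b", "a", "b", "c", "a"]) == ["b", "a", "c"]
    assert deduplicate([]) == []
```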

The problem: AI optimizes for the immediate task. It doesn't have architectural awareness. It will happily duplicate code rather than refactor a shared module.

This creates brittle, high-churn code that slows down the entire team in the long run.

15.08.2025 07:37 | 👍 0 🔁 0 💬 1 📌 0

The AI Productivity Paradox is real.

Devs are 55% faster with tools like Copilot. But new 2025 research shows code duplication is up 8x and refactoring is down 40%.

We're shipping faster, but are we building legacy code from day one?

A thread on how to fix it. 🧵

15.08.2025 07:37 | 👍 0 🔁 0 💬 1 📌 0

Books that make you a better generalist:

- XP & TDD by @kentbeck.com
- Clean Code by R. Martin
- Refactoring by @martinfowler.com
- DDD by @vaughnvernon.bsky.social
- Crucial Conversations by J. Grenny
- Thinking in Systems by D. Meadows
- Staff Engineer by W. Larson

And many more…

03.08.2025 17:31 | 👍 5 🔁 0 💬 0 📌 0

I never liked iTunes very much. It always felt heavy. Remember Winamp? 🙂 Good old days.

My family and I have gotten really used to Spotify. It's the kind of "just works" solution that integrates nicely with various smart systems: car, voice assistant, phone, and other devices.

28.07.2025 05:42 | 👍 0 🔁 0 💬 0 📌 0

Really interesting 🧐

I believe most engineers still don't fully understand how to use AI. Tools are evolving fast, and some groups have a bias along the lines of "I should use AI" even when it isn't helpful. Only a few can currently judge when and how to apply it effectively to actually move faster.

22.07.2025 03:35 | 👍 0 🔁 0 💬 0 📌 0
