I've built systems for a decade, but a year rebuilding search forced me to refine my approach.
It's not that core principles are wrong; it's that their application in a complex, user-facing domain is non-obvious.
5 reframings that were critical to our success. 🧵
26.10.2025 08:00
Subscribe | Sebastian Sigl
Subscribe to Sebastian Sigl's newsletter and benefit from big tech insights, actionable advice, and an independent viewpoint.
If you enjoyed this thread, please give it a like and share it with your followers.
If you want more content like this, please subscribe to my newsletter:
www.sebastiansigl.com/subscribe
26.10.2025 08:00
❌ Common Pitfall: Equating technical excellence with product success.
✅ The Principle: A product mindset is the true compass.
The goal is not the most sophisticated system; it's the most effective system for the user.
26.10.2025 08:00
❌ Common Pitfall: Rigid functional roles and hand-offs.
✅ The Principle: Blurring lines creates synergy.
Empower your team. Our progress exploded when data scientists could run A/B tests & engineers could explore data.
26.10.2025 08:00
❌ Common Pitfall: Chasing offline metrics (nDCG, precision).
✅ The Principle: Business impact is the north star.
If an experiment doesn't move a core KPI (engagement, retention), it's not an improvement.
26.10.2025 08:00
❌ Common Pitfall: Engineering for "correctness" from day one.
✅ The Principle: Velocity unlocks correctness.
A fast, end-to-end feedback loop (from user action to A/B test) is the only path to finding what "correct" actually is.
26.10.2025 08:00
❌ Common Pitfall: Treating search as just an algo/infra problem.
✅ The Principle: It's a Data & Product problem first.
An architecture that learns fast from user signals beats one that just serves fast.
26.10.2025 08:00
Indeed, the prompt matters a lot, especially if you prefer a cost-efficient model that makes it feasible to run at scale.
23.09.2025 05:23
Your LLM-as-a-Judge system has a secret: It's biased.
And it's silently killing your product while your dashboards are green.
Here are the 5 biases you need to fix *now* if you want to build AI you can trust. 🧵
#LLM #AI #Evaluation
22.09.2025 13:43
Relying on a biased judge is like flying a plane with a faulty altimeter. You think you're climbing, but you're headed for the ground.
I've written a complete guide on how to diagnose and fix these issues, plus build a resilient evaluation system.
22.09.2025 13:43
4 & 5/ Authority & Moderation Bias
The Judge is easily fooled.
It falls for fake citations ("Harvard study...") and rewards "safe" refusals that users hate. This erodes trust and makes your product useless.
Fix: Use reference-guided evaluation and mandatory human review for refusal cases.
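A minimal sketch of the "mandatory human review for refusal cases" part, assuming a simple keyword heuristic for spotting refusals. The marker list, the `Verdict` type, and the function names are all hypothetical; a real system would likely need a tuned list or a classifier.

```python
from dataclasses import dataclass

# Hypothetical refusal markers (an assumption for this sketch, not a vetted list).
REFUSAL_MARKERS = (
    "i can't help",
    "i cannot help",
    "i'm unable to",
    "i am unable to",
    "i won't be able to",
)


@dataclass
class Verdict:
    score: float
    needs_human_review: bool


def looks_like_refusal(answer: str) -> bool:
    """Return True if the answer contains one of the refusal markers."""
    # Normalize case and curly apostrophes so "I’m" matches "i'm".
    text = answer.lower().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS)


def finalize(answer: str, judge_score: float) -> Verdict:
    """Keep the judge's score, but flag refusals so a human looks at them."""
    return Verdict(score=judge_score, needs_human_review=looks_like_refusal(answer))
```

The judge's score is kept either way; the flag only decides whether a person has to confirm it before it counts.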
22.09.2025 13:43
3/ Self-Enhancement Bias (aka Nepotism)
The Judge prefers answers from its own model family (e.g., GPT-4 judging GPT-4).
This makes objective cross-model benchmarking impossible.
Fix: Use a neutral, third-party judge model (e.g., use a Google model to judge OpenAI vs. Anthropic).
22.09.2025 13:43
2/ Verbosity Bias
The Judge thinks longer = better.
It will reward a 5-paragraph answer over a correct 2-sentence one. This trains your models to be annoying and unhelpful.
Fix: Add "Be concise" and "Penalize verbosity" directly into your judge's rubric.
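A rough sketch of what putting those instructions into the rubric could look like. The rubric wording and the `build_judge_prompt` helper are assumptions for illustration, not a tested prompt.

```python
# Hypothetical rubric text; the exact wording is an assumption for this sketch.
JUDGE_RUBRIC = (
    "Score the answer from 1 to 5 for correctness and helpfulness.\n"
    "Be concise in your reasoning.\n"
    "Penalize verbosity: do not reward length that adds no information.\n"
    "A short, correct answer must outscore a long, padded one.\n"
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Prepend the rubric so every judged pair sees the same length rules."""
    return f"{JUDGE_RUBRIC}\nQuestion:\n{question}\n\nAnswer:\n{answer}\n\nScore:"
```

Keeping the rubric in one constant means every evaluation run is scored against the same length rules.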
22.09.2025 13:43
1/ Positional Bias
The Judge has a favorite: the first option it sees.
If you A/B test prompts and always put A first, you're not measuring quality, you're measuring position.
Fix: Swap the order and run the test again. If the judgment flips, it's invalid. Simple & powerful.
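A minimal sketch of the swap check, assuming a hypothetical `judge(first, second)` callable that returns "first" or "second". The two toy judges stand in for a real LLM call and exist only to show the behavior.

```python
from typing import Callable, Optional

# Hypothetical judge signature: takes two answers in display order and
# returns "first" or "second" for whichever one it prefers.
Judge = Callable[[str, str], str]


def position_consistent_winner(judge: Judge, answer_a: str, answer_b: str) -> Optional[str]:
    """Judge twice with the order swapped; return "A" or "B" only if both runs agree."""
    run1 = judge(answer_a, answer_b)  # A shown first
    run2 = judge(answer_b, answer_a)  # B shown first

    winner1 = "A" if run1 == "first" else "B"
    winner2 = "B" if run2 == "first" else "A"

    # If the verdict flips with the order, treat the comparison as invalid.
    return winner1 if winner1 == winner2 else None


def always_first(first: str, second: str) -> str:
    """Toy judge with pure positional bias."""
    return "first"


def prefers_shorter(first: str, second: str) -> str:
    """Toy judge that ignores position and picks the shorter answer."""
    return "first" if len(first) <= len(second) else "second"
```

With `always_first` the function returns `None` because the winner changes with the order; with `prefers_shorter` the same answer wins both runs.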
22.09.2025 13:43
It's time to look into functional programming with
"Getting Clojure: Build Your Functional Skills One Idea at a Time"
by Russ Olsen.
What's your functional programming gem that makes you a better generalist?
#functionalprogramming #clojure #architecture #cleancode
25.08.2025 06:47
Next, build an architectural safety net.
Use Dependency Injection + Protocols to define clear, verifiable boundaries.
When mocking, always use autospec=True. A mock that doesn't know the real object's interface is a test that lies.
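A small sketch of how these pieces can fit together in Python. The `PaymentGateway`, `StripeLikeGateway`, and `CheckoutService` names are hypothetical. It uses `unittest.mock.create_autospec`, which builds the same kind of spec-checked mock that `patch(..., autospec=True)` does, and fits better when the dependency is injected rather than patched.

```python
from typing import Protocol
from unittest.mock import create_autospec


class PaymentGateway(Protocol):
    """The boundary: anything with this method can be injected."""

    def charge(self, customer_id: str, amount_cents: int) -> bool: ...


class StripeLikeGateway:
    """Hypothetical concrete implementation, used here only as the mock's spec."""

    def charge(self, customer_id: str, amount_cents: int) -> bool:
        raise NotImplementedError("network call omitted in this sketch")


class CheckoutService:
    def __init__(self, gateway: PaymentGateway) -> None:
        self._gateway = gateway  # dependency is injected, not constructed here

    def checkout(self, customer_id: str, amount_cents: int) -> str:
        ok = self._gateway.charge(customer_id, amount_cents)
        return "paid" if ok else "declined"


def test_checkout_marks_order_paid() -> None:
    gateway = create_autospec(StripeLikeGateway, instance=True)
    gateway.charge.return_value = True
    assert CheckoutService(gateway).checkout("c1", 500) == "paid"
    gateway.charge.assert_called_once_with("c1", 500)


def autospec_rejects_wrong_call() -> bool:
    """Show that the spec-checked mock enforces the real method signature."""
    gateway = create_autospec(StripeLikeGateway, instance=True)
    try:
        gateway.charge("c1")  # missing amount_cents
    except TypeError:
        return True
    return False
```

A plain `Mock()` would accept `gateway.charge("c1")` without complaint; the spec-checked mock raises `TypeError`, so the test fails as soon as the call drifts from the real interface.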
15.08.2025 07:37
The solution starts with a mindset shift: Test Contracts, Not Implementation.
Your tests should verify the "what," not the "how." This makes them resilient to the constant refactoring that AI-assisted development encourages.
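A tiny illustration of testing the "what", using a hypothetical `top_k` helper. The test only pins down inputs and outputs, so the body can be rewritten (for example with a heap) without touching the test.

```python
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring keys, best first."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


def test_top_k_contract() -> None:
    scores = {"a": 0.2, "b": 0.9, "c": 0.5}
    # Contract: best-first order, at most k items, never more than available.
    assert top_k(scores, 2) == ["b", "c"]
    assert top_k(scores, 0) == []
    assert top_k(scores, 10) == ["b", "c", "a"]
    # Deliberately NOT asserted: that sorted() was called, or how many times.
    # That would be testing the "how" and would break on a harmless refactor.
```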
15.08.2025 07:37
The problem: AI optimizes for the immediate task. It doesn't have architectural awareness. It will happily duplicate code rather than refactor a shared module.
This creates brittle, high-churn code that slows down the entire team in the long run.
15.08.2025 07:37
The AI Productivity Paradox is real.
Devs are 55% faster with tools like Copilot. But new 2025 research shows code duplication is up 8x and refactoring is down 40%.
We're shipping faster, but are we building legacy code from day one?
A thread on how to fix it. 🧵
15.08.2025 07:37
Books that make you a better generalist:
- XP & TDD by @kentbeck.com
- Clean Code by R. Martin
- Refactoring by @martinfowler.com
- DDD by @vaughnvernon.bsky.social
- Crucial Conversations by J. Grenny
- Thinking in Systems by D. Meadows
- Staff Engineer by W. Larson
And many more...
03.08.2025 17:31
I never liked iTunes very much. It always felt heavy. Remember Winamp? Good old days.
My family and I have gotten really used to Spotify. It's the kind of "just works" solution that integrates nicely with various smart systems: car, voice assistant, phone, and other devices.
28.07.2025 05:42
Really interesting
I believe most engineers still don't fully understand how to use AI. Tools are evolving fast, and there's a bias in some groups, like "I should use AI" even when it's not helpful. Only a few can currently judge when and how to apply it effectively to actually move faster.
22.07.2025 03:35