Hamel Husain

@hamel.bsky.social

evals evals evals. https://evals.info

6,429 Followers  |  656 Following  |  126 Posts  |  Joined: 07.02.2023

Latest posts by hamel.bsky.social on Bluesky

Evals for AI Engineers Stop using guesswork to find out how your AI applications are performing. Evals for AI Engineers equips you with the proven tools and processes required to systematically test,... - Selection from Eva...

Relevant links

- Our course: evals.info

- Early release (just has the TOC & intro for now): learning.oreilly.com/library/view...

08.11.2025 17:05 - 👍 2    🔁 0    💬 0    📌 0

👀 Animals have been assigned.

Scheduled to print fall 2026!

We have iterated on this with over 3k students (and continue to do so). We give our students access to the full draft as part of our evals course (link in bio)

08.11.2025 17:03 - 👍 24    🔁 1    💬 1    📌 0

Hi

05.11.2025 00:05 - 👍 0    🔁 0    💬 1    📌 0

Love that @hamel.bsky.social is putting on a hackathon where the goal is for your agent to score the highest on evaluations, not just do something flashy.

click.convertkit-mail2.com/gkumlz753lc5...

31.10.2025 00:07 - 👍 3    🔁 3    💬 0    📌 0
Building resilient prompts using an evaluation flywheel | OpenAI Cookbook This cookbook provides a practical guide on how to use the OpenAI Platform to easily build resilience into your prompts. A resilient prom...

If you're looking to get started with evals, check out this cookbook from @hamel.bsky.social
cookbook.openai.com/examples/eva...

06.10.2025 20:34 - 👍 8    🔁 1    💬 0    📌 0

"Can I just get an LLM to do my error analysis?"

We get this question constantly. The answer is no, and trying is the fastest way to miss critical bugs.

Full podcast: youtu.be/BsWxPI9UM4c?...

03.10.2025 21:12 - 👍 6    🔁 0    💬 0    📌 0
Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar
Hamel Husain and Shreya Shankar teach the world's most popular course on AI evals and have trained over 2,000 PMs and engineers (including many teams at OpenAI and Anthropic). In this conversation,…

Full podcast: youtu.be/BsWxPI9UM4c?...

We are teaching our final course of the year on evals starting this Monday. You can enroll with this link to get 35% off: maven.com/parlance-lab...

03.10.2025 18:07 - 👍 1    🔁 0    💬 0    📌 0

I recently sat down with Lenny Rachitsky to discuss why AI evals are becoming the most sought-after skill for product builders.

As a bonus, we step through an end-to-end example of building an eval in a spreadsheet so everyone can understand. See reply for links.

03.10.2025 18:07 - 👍 6    🔁 0    💬 1    📌 0

I think it's more like 99% 🤣 the 1% worked super hard on data modeling

24.09.2025 23:00 - 👍 2    🔁 0    💬 0    📌 0
You Don't Need a Graph DB Many teams adopt graph databases believing they need specialized tools for relationship data, adding unnecessary complexity to their stack. This session reveals that for most use cases, the…

This one is going to be spicy. 80% of the time I've seen a graph DB in production, it's been an overcomplicated mess (especially in AI applications).

In this talk, Jo and I will discuss when GraphDBs are overkill and when they actually make sense.

Sign up here:

24.09.2025 17:09 - 👍 15    🔁 0    💬 1    📌 0

For technical domains especially, getting non-DSes involved in analyzing outputs is vital. It's hard to build anything good without it bc v1s almost always have major fail modes. Finding the appropriate system design, let alone optimizing, requires a tight coupling of output analysis and system design

26.08.2025 04:55 - 👍 4    🔁 2    💬 1    📌 0

Can you screenshot it and tell me what it is so I can troubleshoot it

23.08.2025 16:36 - 👍 0    🔁 0    💬 1    📌 0
AI Evals Email Course A free 17-part email series on the principles of application-centric LLM evals.

If you've wanted to learn applied AI evals but aren't sure if it's for you, @sh-reya.bsky.social and I put together something that might help.

This free email course compiles what we've learned from teaching 2k+ students. It's 17 emails plus 2 free e-books.

Here's the link: ai.hamel.dev/eval-course

23.08.2025 04:41 - 👍 8    🔁 1    💬 1    📌 0

I've been eval pilled by @hamel.bsky.social . Everyone is all let's build some MCP servers and ship a minimal AISRE as quickly as possible. And I'm writing a design doc about how to build evals with @runme.dev so we can iterate rapidly on the AI

08.08.2025 14:55 - 👍 8    🔁 2    💬 0    📌 0

Last chance to sign up for this free lesson with OpenAI on evals, including a sneak peek of their new eval products!

Link: maven.com/p/d2dc30/how...

08.06.2025 14:44 - 👍 2    🔁 0    💬 0    📌 0
AI Evals For Engineers: Book Review YouTube video by Hamel Husain

Link to full talk youtube.com/live/jJMYWfQ...

17.05.2025 00:18 - 👍 0    🔁 0    💬 0    📌 0
AI Evals For Engineers & PMs by Hamel Husain and Shreya Shankar on Maven Learn proven approaches for quickly improving AI applications. Build AI that works better than the competition, regardless of the use-case.

We'll be discussing this in our upcoming course on May 19th - AI Evals For Engineers & PMs (This link has a 35% discount code):

maven.com/parlance-lab...

17.05.2025 00:18 - 👍 0    🔁 0    💬 1    📌 0

Can non-data scientists write AI Evals? The answer is nuanced and not just "Yes". @eugeneyan.com and I discuss this in the context of the "analyze-measure-improve" cycle from our course.

Links to more resources in the reply

17.05.2025 00:18 - 👍 8    🔁 0    💬 1    📌 1

If you are writing evals without error analysis, our course AI Evals for Engineers & PMs is for you. It begins Monday next week. Full syllabus at this link: maven.com/parlance-lab...

12.05.2025 05:17 - 👍 4    🔁 0    💬 0    📌 0
LLM Evals: Common Mistakes
YouTube video by Hamel Husain

It is very easy to make mistakes when creating evals for your AI product. @sh-reya.bsky.social and I run through the most common errors in this talk.

35% discount code to our upcoming course in the video notes

youtu.be/GL0XhAj5LPE?...

08.05.2025 15:23 - 👍 5    🔁 1    💬 0    📌 0

I thought this was a meme 🤣 … but it's real

01.05.2025 15:12 - 👍 9    🔁 0    💬 1    📌 0

GitHub CoPilot is one of the first commercially successful LLM products (predating ChatGPT). What was the secret? A robust eval suite!

In this lightning lesson, John Berryman will reveal the eval techniques (and mistakes) from working on this product

maven.com/p/da8264/how...

29.04.2025 05:27 - 👍 25    🔁 4    💬 2    📌 0
Effective Evals for AI products

@hamel.bsky.social & @sh-reya.bsky.social are two of the world's best on evals. They've built evals for 35+ AI apps & helped teams ship confidently. Now they'll teach everything they know on building evals that work.

Enrollment closes in 4 days.

Secret 35% discount code: maven.com/parlance-lab...

30.04.2025 02:56 - 👍 4    🔁 2    💬 0    📌 0

I keep hearing about the emerging role of AI PM. How is this any different than a normal PM? Is it hype? We are gonna find out in this free lightning lesson. I will ask difficult questions. With @schof.bsky.social and Aman Khan

maven.com/p/544677/wha...

27.04.2025 18:17 - 👍 5    🔁 1    💬 1    📌 0
AI Evals For Engineers & PMs by Hamel Husain and Shreya Shankar on Maven Learn proven approaches for quickly improving AI applications. Build AI that works better than the competition, regardless of the use-case.

As genAI projects mature, proper evals are becoming table stakes for production deployment. But how do we evaluate probabilistic machines?

Looking forward to learning about the latest techniques and best practices from @hamel.bsky.social and Shreya Shankar next month!

26.04.2025 17:19 - 👍 1    🔁 1    💬 0    📌 0

Last chance to sign up for this. Recording sent to everyone who signs up. maven.com/p/29a33a/hyb...

16.04.2025 12:59 - 👍 4    🔁 0    💬 0    📌 0

@hamel.bsky.social & his wisdom on evals, error analysis, looking at your data is what we need. Here are his 10 Don'ts:
• Don't skip error analysis
• Don't skip looking at your data
• Don't gatekeep who can write prompts
• Don't let zero users be a roadblock
• Don't be blindsided by criteria drift

16.04.2025 01:05 - 👍 15    🔁 3    💬 1    📌 0

If you are building RAG applications, you don't want to miss this. Doug Turnbull is going to show you the tricks he's learned from a decade of optimizing retrieval in search systems, and how they transfer to RAG.

Link: maven.com/p/29a33a/hyb...

14.04.2025 16:44 - 👍 19    🔁 4    💬 0    📌 0

@hamel is following 20 prominent accounts