This is why, going forward, all AI features I help build will be natively instrumented with #OTEL. The telemetry data is the "fossil fuel" that feeds understanding and future improvement. AI cannot be treated as a black box. It has to be inspected and understood.
12.07.2025 17:58
Telemetry while testing and developing has been critical for me. It lets me hook into and inspect how systems like Vercel's AI SDK and LiteLLM work under the hood and figure out what prompts are being used for judgement.
12.07.2025 17:58
Take evals. You might pick an eval and trust that it works. But this would be a mistake. It's rare that these evals will work for you across the board. Previously it would have been crazy to enable telemetry during testing. But with evals, you are going to want to inspect how your tests "operate".
12.07.2025 17:57
Tracing and telemetry have traditionally been an operational requirement, not a development one. But I've found that with AI applications this fundamentally changes.
12.07.2025 17:55
@arize.bsky.social OSS Prompt Playground
@arize-phoenix.bsky.social gets Deepseek support! Now you can compare outputs of all the top tier reasoning models.
Which LLM provider would you like to see next? Let us know on GitHub!
github.com/Arize-ai/pho...
29.05.2025 15:09
@arize-phoenix.bsky.social continues to cook
Announcing OpenInference instrumentation for Agno, Mastra, Bedrock Agents, and AutoGen AgentChat!
At @arize.bsky.social we believe observability deserves to be built in the open
s/o @anthonypowell.me and many others
github.com/Arize-ai/ope...
23.05.2025 04:14
@arizeai/phoenix-client@1.3.0 -
@arize-phoenix.bsky.social javascript client gets experiments
s/o @anthonypowell.me !
- native tracing of ai tasks and evaluators,
- async concurrency queues
- support for any evaluator (e.g. bring your own evals) and more!
14.05.2025 22:47
OpenTelemetry instrumentation for Agno is published! Huge s/o to Dirk Brand.
A true testament that AI observability should be built in the open
@arize-phoenix.bsky.social
pypi.org/project/open...
12.05.2025 14:23
annotating an llm call
Annotation Configs in @arize-phoenix.bsky.social
Part of the "Look at the Data" initiative, create custom rubrics and forms to annotate your spans.
s/o to @anthonypowell.me here who built out all the rich UI features.
09.05.2025 18:28
@arize-phoenix.bsky.social is gonna turn 9 today.
Project Retention Policies
Customize the data retention of your projects by number of days or by trace count. No more cron jobs or manual deleting of traces needed!
A much requested ask from our on-prem users and phoenix-cloud users alike.
09.05.2025 17:39
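To make the "by number of days or by trace count" idea concrete, here is a rough sketch of how such a retention rule could be applied to a list of traces. This is not Phoenix's implementation or API; `Trace`, `apply_retention`, `max_days`, and `max_count` are hypothetical names chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple, Optional


class Trace(NamedTuple):
    """Hypothetical stand-in for a stored trace (not a Phoenix type)."""
    trace_id: str
    start_time: datetime


def apply_retention(
    traces: List[Trace],
    now: datetime,
    max_days: Optional[int] = None,
    max_count: Optional[int] = None,
) -> List[Trace]:
    """Return the traces that survive the policy, newest first.

    max_days keeps traces that started within the last N days;
    max_count keeps at most the N newest traces. Either may be None.
    """
    kept = sorted(traces, key=lambda t: t.start_time, reverse=True)
    if max_days is not None:
        cutoff = now - timedelta(days=max_days)
        kept = [t for t in kept if t.start_time >= cutoff]
    if max_count is not None:
        kept = kept[:max_count]
    return kept


now = datetime(2025, 5, 9, tzinfo=timezone.utc)
traces = [
    Trace("a", now - timedelta(days=1)),
    Trace("b", now - timedelta(days=10)),
    Trace("c", now - timedelta(days=40)),
]
recent = apply_retention(traces, now, max_days=30)
newest = apply_retention(traces, now, max_count=1)
```

With a 30-day policy the 40-day-old trace is dropped; with a count policy of 1 only the newest trace is kept.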
Learn to prompt better
07.05.2025 19:26
A speaker announcement card showing that Ben McHone is going to be presenting at Arize: Observe 2025 on June 25th, 2025.
I'll be speaking at Arize:Observe at SHACK15 on June 25! Looking forward to exploring what's next for AI agents & assistants. More details on my session to come. @arize.bsky.social
arize.com/observe-2025
14.04.2025 15:01
I still own plenty of pencils but no erasers. What does that say about me?
26.04.2025 15:56
Just dropped a tutorial on using the OpenAI Agents SDK + @arize-phoenix.bsky.social to go from building to evaluating agents.
- Trace agent decisions at every step
- Offline and Online Evals using LLM as a Judge
If you're building agents, measuring them is essential.
Full vid and cookbook below
18.04.2025 18:51
Text reads: Building AI? Demo your app. Arize:Observe community demos. Submit by 4.30.25. Apply.
Demo your app at this year's Observe! Fill out a short application by 4.30 to be considered for our Demo Den. Great opportunity to showcase your work to the AI community in SF.
Apply here: docs.google.com/forms/d/e/1F...
28.03.2025 21:11
"The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise." - Edsger W. Dijkstra. Just read this and I am going to be using it a LOT.
21.03.2025 18:11
In case you missed it, Arize AI Phoenix crossed the 5k GitHub star mark last week!
Phoenix has changed a TON since its first iteration.
I'm constantly in awe of the execution speed and quality of this team. Here's to the next 5k and beyond!
20.03.2025 16:07
Love the community we're building!
19.03.2025 17:49
LLM Evals Office Hours with Arize · Luma
Join us for an open coworking session focused on LLM and Agent Evaluations! Whether you're actively working on evaluation strategies or just exploring the…
For all my NYC friends!
We're hosting an in-person office hours tomorrow all around LLM and Agent Evals.
Join for the free snacks/drinks, stay for the heated discussions about the validity of Pokemon-based model evaluations
18.03.2025 18:20
How much more data does an LLM app really need?
In my latest tutorial, I explore how few-shot prompting boosts accuracy without massive datasets or retraining, using @arize-phoenix.bsky.social prompts and experiments to break it down.
This kicks off my prompting series... more to come!
18.03.2025 23:50
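For readers who have not seen few-shot prompting before, a minimal sketch of the idea: a handful of labeled examples are placed in the prompt ahead of the new input, so the model can imitate the pattern without any retraining. The function name, the `Input:`/`Output:` layout, and the sample data are assumptions made for illustration; this is not the tutorial's code and it does not use the Phoenix client.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a plain-text few-shot prompt.

    Each (input, output) pair is rendered as a worked example, and the
    new query is appended with an empty output slot for the model to fill.
    """
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("Great product!", "positive"), ("Awful support.", "negative")],
    "It was fine.",
)
print(prompt)
```

The resulting string can be sent to any LLM as-is; varying the number of examples is one simple experiment to run against a labeled dataset.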
OpenAI's agent framework openai-agents provides a rich set of composable primitives that enable you to build agents.
openinference-instrumentation-openai-agents is an OpenTelemetry instrumentor that is compatible with any OTel backend like @arize-phoenix.bsky.social. Fully OSS and free to use!
16.03.2025 14:54
How can you programmatically improve your prompts?
Forget manual prompt engineering - there are better (read: "more automatic") ways to improve your prompts.
This video and notebook break down these techniques.
Featuring:
- DSPy
- @arize-phoenix.bsky.social
03.03.2025 17:01
Prompt Management from First Principles
In Phoenix 8.0, we built a prompt management system to ensure reproducibility and empower developers with better testing and control.
Learn how we built a holistic prompt management system that preserves developer freedom.
With Phoenix 8.0, we built a prompt management system that prioritizes: LLM reproducibility, prompt versioning & tracking, & developer flexibility, with no vendor lock-in
arize.com/blog/prompt-...
07.03.2025 17:45
AI is all about vibes lately
01.03.2025 16:37
YouTube video by Arize AI
Phoenix 8.0 - Prompts
With Phoenix 8.0 you can version & iterate on prompts seamlessly, both in the UI & in code! What's new:
- TypeScript Client: Sync prompts with your JavaScript runtime
- Python Client: Sync templates & apply them directly to AI SDKs
- Native prompt normalization & much more!
youtu.be/qbeohWaRlsM
28.02.2025 21:07
Building agents, but not sure how to measure their performance?
Our newest blog post on @hf.co has you covered!
This post shows you how to use @arize-phoenix.bsky.social to trace and evaluate your smolagents.
Credit to @srichavali.bsky.social and @aymeric-roucher.bsky.social
28.02.2025 17:19
Your AI Agent isn't an Engineer
Table of Contents Why This Conversation Matters How AI Marketing Shaped This Perception The Problem...
AI agents are just tools. They're not engineers. They're not people. While I get the value of describing AI agents as human, I think it's a tad lazy and it creates a self-sabotaging narrative for our industry.
dev.to/blackgirlbyt...
17.02.2025 13:32
Learning Cursor and Vim at the same time was a bad idea. They philosophically clash. One is for vibe coding driven by AI instinct, the other is vibe coding driven by human instinct. Switching to Zed for a while to focus on building my own intuitions.
17.02.2025 01:07
VP Developer Relations at @llamaindex.bsky.social. Previously: Data at Netlify, co-founded npm, awe.sm, started lgbtq.technology. Married to @jovo.design. He/him.
Writer & Software Engineering Consultant. UCRiverside - Palm Desert MFA candidate.
Reed BA '08 | OSU MA '12
Boulder County Historic Preservation Advisory Board
Reed Alumni Board
Boulder, CO mostly.
cemckenna.com
There's no such thing as a bad pun.
Senior Software-er
Writing open source code
https://anthonypowell.me
CISO @ Arize AI | Advisor
Full-time TypeScript educator. Used to be a voice coach. He/him. Author of Total TypeScript. Hire me to teach your team TypeScript!
Software engineer @ Google. Code analysis + LLM
Field Engineering @ Anysphere
Open-Source AI Observability and Evaluation
app.phoenix.arize.com
Improving the world with quality software · Husband, 5x Father, Latter-day Saint, Web Dev, Educator, Microsoft MVP
https://EpicWeb.dev
https://EpicReact.dev
https://TestingJavaScript.com
evals evals evals. https://evals.info
Founder at hypergolic.ai
Blog laszlo.substack.com
Writes about Code Quality For Data Science cq4ds.com
working on React Compiler. formerly known as [@]potetotes on the cursed place
AI observability & evaluation platform - i.e. We make AI work.
i want everyone to aspire to integrity, intensity, and intentionality. see also @latent.space