@syeddula.bsky.social
Arize AI!
Missed the news from Arize Observe 2025? Phoenix Cloud just got Spaces & Access Management!
✨ Create tailored Spaces
Manage user permissions
👥 Easy team collaboration
More than a feature, it's Phoenix adapting to you.
Spin up a new Phoenix project & test it out!
@arize-phoenix.bsky.social
Docs: docs.arize.com/phoenix/trac...
Notebook: colab.research.google.com/github/Arize...
New in OpenInference: Python auto-instrumentation for the Google GenAI SDK!
Add GenAI tracing to your @arize-phoenix.bsky.social applications in just a few lines. Works great with Span Replay so you can debug, tweak, and explore agent behavior in prompt playground.
Check Notebook + docs below!
Learn to prompt better
07.05.2025 19:26
Check out the full video: youtu.be/iOGu7-HYm6s?...
18.04.2025 18:51
Just dropped a tutorial on using the OpenAI Agents SDK + @arize-phoenix.bsky.social to go from building to evaluating agents.
✔️ Trace agent decisions at every step
✔️ Offline and Online Evals using LLM as a Judge
If you're building agents, measuring them is essential.
Full vid and cookbook below
We've added GPT-4.1 models to the @arize-phoenix.bsky.social Prompt Playground.
My go-to way to test out these new models: grab a failed trace from a previous run, pull it into playground, switch the model and see if 4.1 can succeed where 4o failed.
Early signs are promising!
good point - the focus of the tutorial was on general prompt optimization techniques. textgrad is awesome for gradient-based optimization, but this approach aimed to keep things more widely applicable. definitely worth exploring in a future video for more fine tuning
08.04.2025 07:50
Notebook: github.com/Arize-ai/pho...
07.04.2025 17:15
Full video: youtu.be/pvef59pEmvo
07.04.2025 17:15
LLM as a Judge allows models to evaluate outputs in a single prompt, but good judging needs a good prompt
In my new tutorial, learn techniques for optimizing your prompt so your judge does better on accuracy, cost, fairness, and robustness
better prompts ➡️ better evals
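To make the idea concrete, here's a minimal sketch of a single-prompt judge: a template that asks for one of two labels, plus a strict parser for the reply. The template wording, the label set, and the function names are all assumptions for illustration, not taken from the tutorial or from Phoenix.

```python
from typing import Optional

# Hypothetical judge template: constrains the model to a one-word verdict.
JUDGE_TEMPLATE = """You are grading an answer to a question.

Question: {question}
Answer: {answer}

Respond with exactly one word: "correct" or "incorrect"."""

ALLOWED_LABELS = ("correct", "incorrect")


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the judge template with the item being evaluated."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_judge_label(raw_output: str) -> Optional[str]:
    """Normalize the judge's reply; return None if it is not an allowed label."""
    cleaned = raw_output.strip().lower().strip(".\"'")
    # Exact match on purpose: "incorrect" contains "correct" as a substring.
    return cleaned if cleaned in ALLOWED_LABELS else None
```

Returning None for anything off-script makes unparseable verdicts easy to count, which is one rough signal that the judge prompt still needs work.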
Notebook: github.com/Arize-ai/pho...
24.03.2025 23:27
Notebook: github.com/Arize-ai/pho...
24.03.2025 23:25
Think + Act, all within your prompt
In this tutorial, I apply ReAct principles to prompt LLMs to Reason + Act like humans. By specifying these steps, the LLM generates reasoning and interacts with tools for greater accuracy.
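Here's a rough sketch of what a Reason + Act loop can look like in code. Everything in it is an assumption for illustration: the `Action: tool[input]` and `Final Answer:` line formats, the `react_loop` and `scripted_llm` helpers, and the toy `add` tool. A scripted stand-in replaces the real model so the example runs on its own.

```python
import re
from typing import Callable, Dict, List, Optional

# Assumed output conventions for the model (not from the tutorial).
ACTION_PATTERN = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_PATTERN = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model reasoning steps with tool calls until a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        final = FINAL_PATTERN.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_PATTERN.search(step)
        if action is None:
            break  # no action and no answer: give up
        name, arg = action.group(1), action.group(2)
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool: {name}"
        # Feed the tool result back so the next step can reason over it.
        transcript += f"Observation: {observation}\n"
    return None


def scripted_llm(responses: List[str]) -> Callable[[str], str]:
    """Stand-in for a real model: replays canned steps in order."""
    remaining = iter(responses)
    return lambda _transcript: next(remaining)


tools = {"add": lambda s: str(sum(int(x) for x in s.split(",")))}
llm = scripted_llm([
    "Thought: I should add the numbers.\nAction: add[2,3]",
    "Thought: The sum is 5.\nFinal Answer: 5",
])
answer = react_loop("What is 2 + 3?", llm, tools)
```

Swapping `scripted_llm` for a real model call is the only change needed to try this against an actual LLM, as long as the prompt asks for the same line formats.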
Full Video Tutorial: youtu.be/PB7hrp0mz54?...
Hey! No particular reason for sticking with 3.5 in these demos, just what I've been rolling with. phoenix prompts and these notebooks let you swap out models easily if you are interested in testing that out
I'll switch it up in some upcoming notebooks. thanks for the feedback!
Video: www.youtube.com/watch?si=yHW...
Notebook: github.com/Arize-ai/pho...
#LLM #prompts #observability
How much LLM reasoning can you drive through your prompt itself?
I've been using Chain of Thought (CoT) prompting to help LLMs replicate logical step-by-step thinking.
For the next segment in my prompting series, I use @arize-phoenix.bsky.social to test the performance of various CoT methods
5000 Stars and Counting...
We're celebrating Phoenix reaching 5000 stars on GitHub! This milestone underscores the growing demand for robust, open-source tools that tackle the complexities of AI and LLM development
Check it out: github.com/Arize-ai/pho...
www.youtube.com/watch?v=bW5Z...
Notebook: github.com/Arize-ai/pho...
Video: www.youtube.com/watch?v=ggXc...
How much more data does an LLM app really need?
In my latest tutorial, I explore how few-shot prompting boosts accuracy without massive datasets or retraining, using @arize-phoenix.bsky.social prompts and experiments to break it down.
This kicks off my prompting series... more to come!
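For anyone new to the idea, here's a minimal sketch of few-shot prompting: a handful of labeled examples go into the prompt ahead of the new input. The `build_few_shot_prompt` helper and the `Input:` / `Output:` layout are assumptions for illustration, not the format used in the tutorial.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble an instruction, labeled examples, and the new input into one prompt."""
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")  # blank line between examples
    # Leave the final output open for the model to complete.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


few_shot_prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Loved it, would buy again.", "positive"),
     ("Broke after two days.", "negative")],
    "Works exactly as described.",
)
```

Passing an empty `examples` list gives a zero-shot baseline to compare against.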
🧠 Phoenix now supports Anthropic Sonnet 3.7 & Thinking Budgets!
This makes Prompt Playground ideal for side-by-side reasoning tests: o3 vs. Anthropic vs. R1.
Plus, GPT-4.5 support keeps it up to date with the latest from OpenAI & Anthropic - test them all out in the playground! ➡️
Some updates for Projects! Gain more flexibility and control with:
Persistent column selection for consistent views
Filter data directly from tables with metadata and quick metadata filters
⏳ Set custom time ranges for traces & spans
🌳 Option to filter spans by root spans
Check out the demo
Prompt optimization is essential, and automating it with frameworks like DSPy gives you scalable and data-driven improvements.
There's also a tutorial linked in here where you can use Phoenix to compare the performance of different techniques.
arize.com/blog/prompt-...