
Omar Khattab

@lateinteraction.bsky.social

Incoming asst professor at MIT EECS, Fall 2025. Research scientist at Databricks. CS PhD @StanfordNLP.bsky.social. Author of ColBERT.ai & DSPy.ai.

1,230 Followers  |  239 Following  |  53 Posts  |  Joined: 20.11.2024

Latest posts by lateinteraction.bsky.social on Bluesky

Drew’s post is well worth reading as DSPy seems to be a missing link in thinking about LLM usage. Very readable and interesting. www.dbreunig.com/2025/06/10/l...

Thank you @simonwillison.net

06.10.2025 11:42 · 👍 2    🔁 1    💬 0    📌 0
Preview
Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines (YouTube video by Databricks)

If you've been trying to figure out DSPy - the automatic prompt optimization system - this talk by @dbreunig.bsky.social is the clearest explanation I've seen yet, with a very useful real-world case study www.youtube.com/watch?v=I9Zt...

My notes here: simonwillison.net/2025/Oct/4/d...

04.10.2025 23:05 · 👍 100    🔁 13    💬 7    📌 3

#pydatabos Interesting! How the Arbor library works under the hood, hand in hand with DSPy

15.10.2025 23:42 · 👍 3    🔁 1    💬 0    📌 0

premature optimization is the sqrt of all evil

29.10.2025 16:26 · 👍 3    🔁 0    💬 0    📌 0

#pydatabos one-line motivation for using DSPy!

15.10.2025 23:56 · 👍 3    🔁 1    💬 0    📌 1

Stop what you are doing and try out GEPA now!

"GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" presents such elegant ideas by a collection of amazing researchers!

Here is a TL;DR of how it works:

21.10.2025 15:03 · 👍 3    🔁 2    💬 1    📌 0
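Roughly, GEPA runs the program, has an LM reflect in natural language on the resulting traces, proposes a mutated prompt, and keeps candidates that improve on at least one task. Below is a toy sketch of that loop; `run_program` and `reflect_lm` are hypothetical stand-ins, and the paper's actual Pareto-based candidate selection is more careful than this:

```
import random

def gepa_sketch(seed_prompt, tasks, run_program, reflect_lm, budget=100):
    """Toy sketch of reflective prompt evolution (not the paper's exact algorithm).

    run_program(prompt, task) -> (score, trace)   # trace = textual rollout
    reflect_lm(prompt, feedback) -> new_prompt    # an LM rewrites the prompt
    """
    candidates = [seed_prompt]
    scores = {seed_prompt: [run_program(seed_prompt, t)[0] for t in tasks]}

    for _ in range(budget):
        parent = random.choice(candidates)   # pick a candidate to mutate
        task = random.choice(tasks)
        score, trace = run_program(parent, task)
        # Reflection: the LM reads the rollout and proposes a revised prompt.
        child = reflect_lm(parent, f"Task: {task}\nTrace: {trace}\nScore: {score}")
        child_scores = [run_program(child, t)[0] for t in tasks]
        # Keep the child only if it beats its parent on at least one task
        # (a crude stand-in for the paper's Pareto frontier).
        if any(c > p for c, p in zip(child_scores, scores[parent])):
            candidates.append(child)
            scores[child] = child_scores

    # Return the candidate with the best average score.
    return max(candidates, key=lambda p: sum(scores[p]) / len(scores[p]))
```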

Btw, there’s no trouble with storage at all either.

ColBERT vectors are often 10 bytes each. Ten bytes. That’s like 4 numbers.

It’s not “many vectors work better than one vector”. It’s “set similarity works better than dot product”.

Even with the same storage cost.

28.09.2025 02:20 · 👍 2    🔁 0    💬 1    📌 0
A diagram illustrating a dual-encoder retrieval model using MaxSim scoring.
• On the left (green box): labeled “Query Encoder, f_Q”. It takes a Query as input and produces multiple vector embeddings (rectangles).
• On the right (blue box): labeled “Document Encoder, f_D”. It takes a Document as input and produces multiple vector embeddings (rectangles). This block is marked with “Offline Indexing” along the side, showing that documents are pre-encoded.
• Between the two encoders: dotted and solid arrows connect query embeddings to document embeddings, representing similarity comparisons.
• Each comparison goes through a “MaxSim” operation (highlighted boxes), which selects the maximum similarity for each query token across document tokens.
• At the top: outputs of MaxSim flow into a summation node (Σ) to produce a single score for ranking.

This shows the ColBERT (Contextualized Late Interaction over BERT) retrieval framework: query and document are encoded separately, interactions are computed via maximum similarity per query token, and results are aggregated into a score.


colbert-muvera-micro, a 4M-parameter(!!) late interaction model

late interaction models do embedding vector index queries and reranking at the same time, leading to far higher accuracy

huggingface.co/NeuML/colber...

19.09.2025 11:15 · 👍 14    🔁 1    💬 2    📌 0
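To make the MaxSim mechanics above concrete, here is a minimal numpy sketch of late interaction scoring (toy shapes and random data, not ColBERT's actual code):

```
import numpy as np

def late_interaction_score(Q, D):
    """MaxSim scoring. Q is (num_query_tokens, dim), D is (num_doc_tokens, dim),
    both L2-normalized so dot products are cosine similarities."""
    sim = Q @ D.T                  # all query-token x doc-token similarities
    return sim.max(axis=1).sum()   # best doc token per query token, summed

# Toy usage: 4 query tokens, 9 document tokens, 128-dim embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 128)); Q /= np.linalg.norm(Q, axis=1, keepdims=True)
D = rng.normal(size=(9, 128)); D /= np.linalg.norm(D, axis=1, keepdims=True)
print(late_interaction_score(Q, D))
```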

Let the Model Write the Prompt | Drew Breunig #dspy #promptengineering #llms #generativeai

19.06.2025 15:54 · 👍 2    🔁 1    💬 0    📌 0
Preview
Let the Model Write the Prompt: Notes from a talk I delivered at the 2025 Data + AI Summit, detailing the problem with prompts in your code and how DSPy can make everything better.

Here's the write up of my Data+AI Summit talk on the perils of prompts in code and how to mitigate them with DSPy. www.dbreunig.com/2025/06/10/l...

15.06.2025 16:57 · 👍 4    🔁 1    💬 1    📌 0

Have you heard the news? #MLflow now supports tracking for DSPy optimization workflows, just like it does for #PyTorch training!

Keep reading to see what this means for your #LLM projects… 👇

#opensource #dspy #oss

30.05.2025 15:08 · 👍 7    🔁 3    💬 1    📌 0
Preview
MLflow Community Meetup | April 23 · Luma  Join us for the next MLflow Community Meetup on Wednesday, April 23 at 4PM PT! We’re bringing two exciting presentations to the community: 🔹 MLflow + DSPy…

📣 TODAY at 4PM PT - MLflow Community Meetup!

🔗 Register today 👉 lu.ma/mlflow423

Join the global MLflow community for two exciting tech deep dives:
🔹 MLflow + #DSPy Integration
🔹 Cleanlab + #MLflow

🎥 Streaming live on YouTube, LinkedIn, and X
💬 Live Q&A with the presenters

#opensource #oss

23.04.2025 19:21 · 👍 6    🔁 1    💬 1    📌 0

MLflow now supports tracking for #DSPy (Community) optimization, just like it does for @pytorch.org training! 🙌

#MLflow is the first to bring full visibility into DSPy’s prompt optimization process. More observability, less guesswork.

Get started today! ➡️ medium.com/@AI-on-Datab...

#opensource

21.04.2025 19:20 · 👍 5    🔁 3    💬 1    📌 0
Preview
MLflow Monthly Meetup · Luma  Join us for the next MLflow Community Meetup on Wednesday, April 23 at 4PM PT! We’re bringing two exciting presentations to the community: 🔹 MLflow + DSPy…

Join us for the next MLflow Community Meetup on Wednesday, April 23 at 4PM PT! 🗓️

🔹 Explore the new MLflow + #DSPy integration
🔹 Learn how Cleanlab adds trust to AI workflows with MLflow

💬 Live Q&A + demos
📺 Streamed on YouTube, LinkedIn, and X
👉 RSVP: lu.ma/mlflow423

#opensource #mlflow #oss

15.04.2025 19:51 · 👍 4    🔁 2    💬 0    📌 0
Preview
History - DSPy: The framework for programming, rather than prompting, language models.

Nice work! For history:

dspy.ai/api/primitiv...

06.04.2025 21:54 · 👍 1    🔁 0    💬 1    📌 0

This was built by a long-time DSPy community member!

04.03.2025 00:34 · 👍 4    🔁 0    💬 2    📌 0

Yes, there's an evals crisis, but evaluating *models* is not even the right question most of the time

LangProBe from Shangyin Tan, @lakshyaaagrawal.bsky.social, Arnav Singhvi, Liheng Lai, @michaelryan207.bsky.social et al begins to ask what complete *AI systems* we should build & under what settings

03.03.2025 19:42 · 👍 10    🔁 2    💬 0    📌 0

🧵 Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs!

We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.

03.03.2025 18:58 · 👍 6    🔁 3    💬 1    📌 2

It doesn't help that we in ML often design abstractions that leak all kinds of implementation details. Folks often define ML itself in terms of techniques, not problems!

But it's premature abstraction that leads to the bitterness of wasted effort, not "modularity doesn't work for AI". 2/2

26.02.2025 21:59 · 👍 4    🔁 1    💬 0    📌 0

Composition & abstraction are the foundations of CS, but are clearly absent in modern ML.

It's not that they're not crucial for intelligent software. But it takes building many half-working systems to abstract successfully, and it takes good abstractions to have primitives worth composing.

🧵 1/2

26.02.2025 21:59 · 👍 8    🔁 1    💬 1    📌 0

4) By default, IR methods that use "multiple vectors" (e.g., cross-encoders) are unscalable. It seems like a necessary tradeoff, but the fascinating thing in late interaction is that it's easy to implement in asymptotically sub-linear ways, thanks to pruning.

Hope this was useful!

26.02.2025 19:14 · 👍 1    🔁 0    💬 1    📌 0
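A sketch of that pruning idea: each query token fetches a few nearest token vectors from an approximate nearest-neighbor index, and exact MaxSim runs only on the nominated documents. Here `ann_search` and `doc_embeddings` are hypothetical helpers operating on numpy arrays, not any specific library's API:

```
def retrieve_late_interaction(Q, ann_search, doc_embeddings, k=32):
    """Two-stage late interaction retrieval, sketched.

    ann_search(q, k)  -> iterable of (score, doc_id) for the k corpus *token*
                         vectors nearest to query token q (hypothetical helper).
    doc_embeddings[d] -> (num_doc_tokens, dim) matrix for exact rescoring.
    """
    # Stage 1 (pruning): every query token nominates a handful of documents,
    # so most of the corpus is never scored at all.
    candidates = {doc_id for q in Q for _, doc_id in ann_search(q, k)}

    # Stage 2: exact MaxSim only over the small candidate set.
    def maxsim(d):
        return (Q @ doc_embeddings[d].T).max(axis=1).sum()

    return max(candidates, key=maxsim)
```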

3) "Multi-vector" makes it sound like these approaches win because they store "more stuff".

But that's not true: if you look at how aggressively ColBERTv2 representations are compressed, it's often ~20 bytes per vector (like 5 floats), which can be smaller than popular uncompressed single vectors!

26.02.2025 19:14 · 👍 2    🔁 0    💬 1    📌 0
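The arithmetic behind that figure, assuming the ColBERTv2-style setup of 128-dim vectors stored as a 4-byte centroid id plus a 1- or 2-bit quantized residual per dimension (treat the exact constants as an assumption):

```
# Bytes per token vector under ColBERTv2-style residual compression
# (assumed setup): 4-byte centroid id + 128 dims at b bits each.
for b in (1, 2):
    print(f"{b}-bit residuals: {4 + 128 * b // 8} bytes/vector")  # -> 20, 36
```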

For dot products, every time you "fix" one query-document pair, you likely break so many other pairs by moving the query and/or document representations.

For ColBERT, you typically *fix* more than you break because you're moving *tokens* in a much smaller (and far more composable!) space.

26.02.2025 19:14 · 👍 1    🔁 0    💬 1    📌 0

The problem isn't the vector representation, it's the **learnability of the scoring function**.

A dot product is just very hard to learn. An intuition I learned from Menon et al. (2021) is that:

26.02.2025 19:14 · 👍 1    🔁 1    💬 1    📌 0

2) More importantly, there's nothing to say you can't store a TON of information in a single vector. And it's easy to use multiple vectors and gain *zero* improvement over a single vector: e.g., if you replace MaxSim with AvgSim in ColBERT, without any other changes, it doesn't work!

26.02.2025 19:14 · 👍 1    🔁 0    💬 1    📌 0
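In code terms, that ablation is a one-token change from max to mean over document tokens, with the same numpy arrays as in the scoring sketch earlier:

```
def maxsim_score(Q, D):
    return (Q @ D.T).max(axis=1).sum()    # late interaction: works

def avgsim_score(Q, D):
    return (Q @ D.T).mean(axis=1).sum()   # same vectors, averaged:
                                          # per the post, the gains vanish
```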

1) If you take ColBERT and force it to use only a constant number of vectors (e.g., 16), it'll barely outperform one vector in the general case.

It's not that you need token-level alignment per se (you don't, either!), but you want fine-grained representations, not just *multiple* representations.

26.02.2025 19:14 · 👍 1    🔁 0    💬 1    📌 0

Some quick thoughts on why we gave the ColBERT paradigm the name "late interaction" instead of "multi-vector", a term that emerged later and has proven more intuitive.

**The mechanism is actually not about having multiple vectors at all.** You can see this in four different ways.

🧵 1/7

26.02.2025 19:14 · 👍 8    🔁 0    💬 1    📌 0

Btw the full general form to export all message templates is:

```
# Map each predictor's name to its formatted message template, rendering every
# input field as a literal {field_name} placeholder.
{name: my_adapter.format(p.signature, demos=p.demos,
                         inputs={k: f'{{{k}}}' for k in p.signature.input_fields})
 for name, p in your_program.named_predictors()}
```

20.02.2025 03:42 · 👍 1    🔁 0    💬 0    📌 0

The default Adapter is dspy.ChatAdapter().

But you can do all the customization you mentioned with a custom Adapter:

```
import dspy

class MyAdapter(dspy.Adapter):
    def format(self, signature, demos, inputs):
        # Build the message(s) sent to the LM.
        return {"role": "user", "content": ...}

    def parse(self, signature, completion):
        # Map the raw completion text back to the signature's output fields.
        return {...}
```

20.02.2025 03:37 · 👍 1    🔁 0    💬 1    📌 0
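Then, assuming the usual global configuration hook, activating the custom Adapter looks something like this (the model name is illustrative):

```
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"), adapter=MyAdapter())
```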
