
Andrew Drozdov

@mrdrozdov.com.bsky.social

Research Scientist @ Mosaic x Databricks. Adaptive Methods for Retrieval, Generation, NLP, AI, LLMs https://mrdrozdov.github.io/

5,178 Followers  |  607 Following  |  179 Posts  |  Joined: 23.04.2023

Latest posts by mrdrozdov.com on Bluesky


The transformer was invented at Google. RLHF was not invented in industry labs, but it came to prominence at OpenAI and DeepMind. I took 5 of the most influential papers (black dots) and visualized their references. Blue dots are papers that acknowledge federal funding (DARPA, NSF).

12.04.2025 02:35 — 👍 107    🔁 23    💬 2    📌 0
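Not the actual pipeline behind the figure, but a minimal sketch of how such a citation graph could be drawn, assuming networkx and matplotlib; the paper IDs and funding flags below are invented stand-ins for real reference metadata.

```python
# Hypothetical sketch of the visualization: build a reference graph for a
# few seed papers and color nodes by federal-funding acknowledgment.
# All paper IDs and funding flags below are invented placeholders.
import networkx as nx
import matplotlib.pyplot as plt

seed_papers = {"attention-is-all-you-need"}            # the black dots
references = {
    "attention-is-all-you-need": ["seq2seq", "neural-mt", "layer-norm"],
}
federally_funded = {"seq2seq", "neural-mt"}            # acknowledge DARPA/NSF

G = nx.DiGraph()
for paper, refs in references.items():
    for ref in refs:
        G.add_edge(paper, ref)

node_colors = [
    "black" if n in seed_papers
    else "blue" if n in federally_funded
    else "lightgray"
    for n in G.nodes
]
nx.draw(G, node_color=node_colors, with_labels=True, font_size=8)
plt.show()
```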
LongEval 2025 Conference Template

LongEval is turning three this year!

This is a Call for Participation for our CLEF 2025 Lab - test how your IR system holds up over the long term.

Check the details on our page:
clef-longeval.github.io

11.04.2025 11:24 — 👍 8    🔁 3    💬 0    📌 0

The PhD is pretraining. Interview prep is alignment. Take this to heart. :)

13.04.2025 08:16 — 👍 2    🔁 0    💬 0    📌 0
Leaderboard showing the performance of language models on a claim verification task over book-length input. o1-preview is the best model with 67.36% accuracy, followed by Gemini 2.5 Pro with 64.17% accuracy.

We have updated #nocha, a leaderboard for reasoning over long-context narratives 📖, with some new models including #Gemini 2.5 Pro, which shows massive improvements over the previous version! Congrats to the #Gemini team 🪄 🧙 Check 🔗 novelchallenge.github.io for details :)

02.04.2025 04:30 — 👍 11    🔁 4    💬 0    📌 0
ARR Dashboard

I think ARR used to do this? Seems like it's missing in the recent cycle(s).

stats.aclrollingreview.org/iterations/2...

28.03.2025 20:26 — 👍 3    🔁 0    💬 0    📌 0

A corollary here is that a relevant context might not improve the probability of the right answer.

22.03.2025 11:51 — 👍 0    🔁 0    💬 0    📌 0

Perhaps the most misunderstood aspect of retrieval: For a context to be relevant, it is not enough for it to improve the probability of the right answer.

22.03.2025 11:51 — 👍 1    🔁 0    💬 1    📌 0
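A toy numerical illustration of the point (all probabilities invented): a context can raise the probability of the right answer by uniformly biasing the model toward one answer string, without supplying any evidence, so uplift alone does not establish relevance.

```python
# Toy illustration (all numbers invented): uplift in P(correct answer)
# is not sufficient for relevance. Suppose a retrieved context shifts a
# yes/no QA model toward "yes" on *every* question, regardless of content.
def p_yes(question: str, with_context: bool) -> float:
    base = 0.40                                    # P("yes") from the question alone
    return base + (0.15 if with_context else 0.0)  # uniform "yes" bias from context

# On a question whose correct answer is "yes", the context raises P(correct):
assert p_yes("Is water wet?", True) > p_yes("Is water wet?", False)

# But it raises P("yes") identically on questions whose correct answer is
# "no" -- the uplift reflects a surface bias, not evidence, so the context
# improved the probability of the right answer without being relevant.
assert p_yes("Is fire cold?", True) > p_yes("Is fire cold?", False)
```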

MLflow is on BlueSky! Follow @mlflow.org to keep up to date on new releases, blogs and tutorials, events, and more.

14.03.2025 23:02 — 👍 4    🔁 1    💬 0    📌 0

ris.utwente.nl/ws/portalfil...

12.03.2025 03:19 — 👍 0    🔁 0    💬 0    📌 0

---Born To Add, Sesame Street
---(sung to the tune of Bruce Springsteen's Born to Run)

12.03.2025 03:19 — 👍 0    🔁 0    💬 1    📌 0

One, and two, and three police persons spring out of the shadows
Down the corner comes one more
And we scream into that city night: "three plus one makes four!"
Well, they seem to think we're disturbing the peace
But we won't let them make us sad
'Cause kids like you and me baby, we were born to add

12.03.2025 03:19 — 👍 0    🔁 0    💬 1    📌 0

"How Claude Code is using a 50-Year-Old trick to revolutionize programming"

11.03.2025 03:21 — 👍 2    🔁 0    💬 0    📌 0

Somehow my most controversial take of 2025 is that agents relying on grep are a form of RAG.

11.03.2025 03:20 — 👍 2    🔁 0    💬 0    📌 1
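The structural parallel is easy to sketch. This is a hypothetical agent loop, not Claude Code's implementation; grep_retrieve and the call_llm stub are placeholders:

```python
# Hypothetical sketch (not Claude Code's implementation): an agent that
# greps the repo for context before generating is doing retrieve-then-read.
import subprocess

def grep_retrieve(pattern: str, repo_dir: str, max_lines: int = 20) -> str:
    """Lexical 'retriever': plain grep over the codebase."""
    result = subprocess.run(
        ["grep", "-rn", pattern, repo_dir],
        capture_output=True, text=True,
    )
    return "\n".join(result.stdout.splitlines()[:max_lines])

def call_llm(prompt: str) -> str:
    return "<model answer>"  # stand-in for any chat-completion API

def answer(question: str, pattern: str, repo_dir: str) -> str:
    context = grep_retrieve(pattern, repo_dir)  # retrieval step
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)                     # generation step
```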
Promptagator: Few-shot Dense Retrieval From 8 Examples

Embedding finetuning is not a new idea, but it's still overlooked IMO.

The Promptagator work is one of the more impactful papers showing that finetuning with synthetic data is effective.

arxiv.org/abs/2209.11755

26.02.2025 00:54 — 👍 2    🔁 0    💬 0    📌 0
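Roughly, the Promptagator recipe is: few-shot prompt an LLM to write a query for each of your documents, filter, and train the retriever on the resulting (query, document) pairs. A minimal sketch of the generation half, where the few-shot prompt and the call_llm stub are invented placeholders:

```python
# Sketch of Promptagator-style synthetic data generation: few-shot prompt
# an LLM to write a query per document, yielding (query, document) pairs
# for retriever finetuning. The prompt and call_llm stub are placeholders.
FEW_SHOT = """\
Document: The mitochondria is the powerhouse of the cell.
Query: what part of the cell produces energy

Document: {doc}
Query:"""

def call_llm(prompt: str) -> str:
    return "<generated query>"  # stand-in for any completion API

def make_training_pairs(documents: list[str]) -> list[tuple[str, str]]:
    pairs = []
    for doc in documents:
        query = call_llm(FEW_SHOT.format(doc=doc)).strip()
        pairs.append((query, doc))  # positive pair for contrastive training
    # (The paper also filters pairs with round-trip consistency, omitted here.)
    return pairs
```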
Data Brew by Databricks on LinkedIn: Join us on the latest Data Brew episode for a deep dive on Retrieval, rerankers, and RAG tips and tricks with our very own Andrew Drozdov, Research Scientist…

Search is the key to building trustworthy AI and will only become more important as we build more ambitious applications. Even so, there's not nearly enough energy spent improving the quality of search systems.

Follow the link for the full episode:
www.linkedin.com/posts/data-b...

26.02.2025 00:53 — 👍 2    🔁 0    💬 0    📌 0

It was a real pleasure talking about effective IR approaches with Brooke and Denny on the Data Brew podcast.

Among other things, I'm excited about embedding finetuning and reranking as modular ways to improve RAG pipelines. Everyone should use these more!

26.02.2025 00:53 — 👍 7    🔁 0    💬 1    📌 0
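As one concrete version of the reranking idea (my sketch, not the setup discussed on the podcast): rescore the top-k hits from any first-stage retriever with a cross-encoder, e.g. via sentence-transformers.

```python
# Minimal reranking sketch: rescore first-stage hits with a public
# cross-encoder checkpoint. The candidate list is a hardcoded stand-in
# for the output of any retriever (BM25, embeddings, ...).
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do I finetune an embedding model?"
candidates = [
    "Embedding models can be finetuned with contrastive objectives.",
    "The weather in Boston is mild in the fall.",
]
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # the on-topic passage should rank first
```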
Improving Retrieval and RAG with Embedding Model Finetuning
Fine-tune embedding models on Databricks to enhance retrieval and RAG accuracy with synthetic data—no manual labeling required.

We're probably a little too obsessed with zero-shot retrieval. If you have documents (you do), then you can generate synthetic data and finetune your embedding model. The blog post, led by @jacobianneuro.bsky.social, shows how well this works in practice.

www.databricks.com/blog/improvi...

26.02.2025 00:48 — 👍 8    🔁 5    💬 1    📌 0
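A minimal sketch of that finetuning loop, assuming the classic sentence-transformers fit API; the (query, document) positives below are placeholders for synthetically generated pairs, and in-batch negatives come for free from MultipleNegativesRankingLoss.

```python
# Minimal embedding-finetuning sketch with in-batch negatives; the
# (query, document) positives are stand-ins for synthetic pairs.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

pairs = [
    ("what part of the cell produces energy",
     "The mitochondria is the powerhouse of the cell."),
    # ... more (synthetic query, document) positives
]
train_examples = [InputExample(texts=[q, d]) for q, d in pairs]
loader = DataLoader(train_examples, shuffle=True, batch_size=16)

model = SentenceTransformer("all-MiniLM-L6-v2")
# Other documents in the batch serve as negatives for each query.
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```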

I do want to see aggregate stats about the model's generations, and total reasoning tokens is perhaps the least informative one.

01.02.2025 14:52 — 👍 2    🔁 0    💬 0    📌 0

"All you need to build a strong reasoning model is the right data mix."

The pipeline that creates the data mix:

26.01.2025 23:30 — 👍 13    🔁 1    💬 1    📌 0

After frequent road runs during a Finland visit, I tend to feel the same.

23.01.2025 04:07 — 👍 3    🔁 0    💬 0    📌 0

Using 100+ tokens to answer 2 + 3 =

22.01.2025 17:42 — 👍 19    🔁 0    💬 1    📌 0

It's pretty obvious we're in a local minimum for pretraining. Would expect more breakthroughs in the 5-10 year range. Granted, it's still incredibly hard and expensive to do good research in this space, despite the number of labs working on it.

27.12.2024 03:50 — 👍 10    🔁 0    💬 1    📌 0

Word of the day (of course) is 'scurryfunging', from US dialect: the frantic attempt to tidy the house just before guests arrive.

23.12.2024 12:54 — 👍 3298    🔁 576    💬 107    📌 75

... didn't know this would be one of the hottest takes i've had ...

for more on my thoughts, see drive.google.com/file/d/1sk_t...

21.12.2024 19:17 — 👍 51    🔁 7    💬 3    📌 0
i sensed anxiety and frustration at NeurIPS'24 – Kyunghyun Cho

feeling a bit under the weather this week … thus an increased level of activity on social media and blog: kyunghyuncho.me/i-sensed-anx...

21.12.2024 19:47 — 👍 181    🔁 36    💬 19    📌 13

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Introduces ModernBERT, a bidirectional encoder advancing BERT-like models with 8K context length.

πŸ“ arxiv.org/abs/2412.13663
πŸ‘¨πŸ½β€πŸ’» github.com/AnswerDotAI/...

20.12.2024 05:04 — 👍 18    🔁 3    💬 0    📌 0
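A minimal usage sketch, assuming a transformers release recent enough to include ModernBERT support; the fill-mask example below is generic, not from the paper.

```python
# Fill-mask with ModernBERT; requires a transformers version that
# includes ModernBERT support. The example sentence is generic.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the [MASK] position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax().item()))
```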

State Space Models are Strong Text Rerankers

Shows Mamba-based models achieve comparable reranking performance to transformers while being more memory efficient, with Mamba-2 outperforming Mamba-1.

πŸ“ arxiv.org/abs/2412.14354

20.12.2024 05:13 — 👍 4    🔁 1    💬 0    📌 0

I'm being facetious, but the truth behind the joke is that OCR correction opens up the possibility (and futility) of language much like drafting poetry. For every interpreted pattern for optimizing OCR correction, exceptions arise. So, too, with patterns in poetry.

19.12.2024 02:24 — 👍 2    🔁 1    💬 1    📌 0

Wait can you say more

19.12.2024 01:47 — 👍 0    🔁 0    💬 1    📌 0

Reasoning is fascinating but confusing.

Is reasoning a task? Or is reasoning a method for generating answers, for any task?

17.12.2024 02:00 — 👍 4    🔁 0    💬 4    📌 0
