
Leo Boytsov

@srchvrs.bsky.social

Machine learning scientist and engineer (PhD, CMU) working on (un)natural language processing, speaking πtorch & C++. Opinions sampled from MY OWN 100T param LM.

703 Followers  |  168 Following  |  103 Posts  |  Joined: 18.11.2024

Latest posts by srchvrs.bsky.social on Bluesky

A Large-Scale Study of Reranker Relevance Feedback at Inference | Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

Results? If I read the tables correctly, there is only a very modest boost in both recall & NDCG, within 2%. Given that the procedure requires a second retrieval, it does not seem to be worth the effort.
🟦
dl.acm.org/doi/abs/10.1...

18.07.2025 18:01 — 👍 0    🔁 0    💬 0    📌 0

PRF was not forgotten in the neural IR era, but how does it really perform? Revanth Gangi Reddy & colleagues ran a rather thorough experiment and published it at SIGIR.
↩️

18.07.2025 18:01 — 👍 0    🔁 0    💬 1    📌 0
Statistical source expansion for question answering (CIKM 2011) by Nico Schlaefer et al.

It was doc2query before doc2query and, in fact, it improved the performance (by a few percent) of the IBM Watson QA system that beat human champions in Jeopardy!
↩️
research.ibm.com/publications...

18.07.2025 18:01 — 👍 0    🔁 0    💬 1    📌 0
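
(For illustration, a minimal sketch of what doc2query-style document expansion looks like with modern tooling. It assumes the Hugging Face transformers library and the publicly released castorini/doc2query-t5-base-msmarco checkpoint; Schlaefer et al.'s system was statistical, not a neural seq2seq model.)

```python
# A minimal sketch of doc2query-style document expansion. Assumption: it uses
# the Hugging Face `transformers` library and the `castorini` T5 checkpoint,
# NOT the statistical method of Schlaefer et al.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "castorini/doc2query-t5-base-msmarco"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

doc = ("The SMART system introduced the vector space model and "
       "relevance feedback for information retrieval.")

inputs = tokenizer(doc, return_tensors="pt", truncation=True)
# Sample a few queries the document could plausibly answer; appending them
# to the document text before indexing lets a plain BM25 index match
# vocabulary the original document never used.
outputs = model.generate(**inputs, max_length=32, do_sample=True,
                         top_k=10, num_return_sequences=3)
for ids in outputs:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```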

I think the problem is the completely unsupervised, blind approach of adding terms to the query. If we had some supervision signal to filter out potentially bad terms, this would work out better. In fact, a supervised approach was previously used to add terms to documents!
↩️

18.07.2025 18:01 — 👍 0    🔁 0    💬 1    📌 0

This issue spawned a sub-topic in the IR community devoted to fixing it and to identifying, in advance, the queries whose performance would degrade substantially. Dozens of approaches were proposed, but I do not think the effort was successful. Why⁉️
↩️

18.07.2025 18:01 — 👍 0    🔁 0    💬 1    📌 0

PRF tends to improve things on average, but it has the rather nasty property of tanking outcomes for some queries quite dramatically: when things go wrong (i.e., unlucky, unrelated terms are added to the query), they can go very wrong. ↩️

18.07.2025 18:01 — 👍 0    🔁 0    💬 1    📌 0
Leo Boytsov on X: "🧵40 years ago the SMART IR system was released. It introduced a few key concepts including the vector space interpretation of the retrieval process and the relevance feedback algorithm. I also think it was probably the first open source search engine. ↩️" / X

PRF is an old technique introduced 40 years ago in the SMART system (arguably the first open-source IR system). ↩️
x.com/srchvrs/stat...

18.07.2025 18:01 — 👍 0    🔁 0    💬 1    📌 0

🧵Pseudo-relevance feedback (PRF), also known as blind feedback, is a technique that first retrieves/re-ranks the top-k documents and adds some of their words to the initial query. A second retrieval/ranking stage then uses the updated query. ↩️

18.07.2025 18:01 — 👍 1    🔁 1    💬 1    📌 0
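
(For readers unfamiliar with PRF, here is a self-contained toy sketch of the two-pass procedure. Real systems use more careful term weighting, e.g., RM3, but the blind expand-and-re-retrieve loop is the same; the TF-IDF scorer and all names below are purely illustrative.)

```python
import math
from collections import Counter

def tfidf_rank(query_terms, docs):
    """Rank docs against query terms with a toy TF-IDF scorer (illustrative only)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    def score(doc):
        tf = Counter(doc)
        return sum(tf[t] * math.log(n / df[t]) for t in query_terms if t in df)
    return sorted(range(n), key=lambda i: score(docs[i]), reverse=True)

def prf(query, docs, k=3, n_expansion=2):
    """Blind feedback: retrieve top-k, add their frequent terms, re-retrieve."""
    q = query.split()
    first_pass = tfidf_rank(q, docs)
    # Blindly assume the top-k documents are relevant and harvest their most
    # frequent new terms; if these documents are off-topic, the added terms
    # cause the "query drift" discussed above in this thread.
    pool = Counter(t for i in first_pass[:k] for t in docs[i] if t not in q)
    expanded = q + [t for t, _ in pool.most_common(n_expansion)]
    return tfidf_rank(expanded, docs)  # second retrieval with the updated query

docs = [s.split() for s in [
    "neural ranking models for information retrieval",
    "pseudo relevance feedback expands the query with feedback terms",
    "query expansion can cause query drift when feedback docs are off topic",
    "cooking recipes with fresh basil",
]]
print(prf("query expansion feedback", docs))
```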

If you submitted a messy paper, it's pointless to address every little comment and promise to fix it in the final version. 🟦

02.07.2025 03:04 — 👍 0    🔁 0    💬 0    📌 0

Instead, think hard about the questions you can ask. What is the main misunderstanding? What will you have to do so that a reviewer accepts your work next time? Which concise questions can you ask to avoid misunderstanding in the future? ↩️

02.07.2025 03:04 — 👍 0    🔁 0    💬 1    📌 0

🧵 Dear (scientific) authors: I have been in the same boat too. However, if you receive a ton of detailed complaints regarding paper quality, do NOT try to address them all during the rebuttal phase. It's just a waste of everybody's time. ↩️

02.07.2025 03:04 — 👍 0    🔁 0    💬 1    📌 0
Microsoft Is Having an Incredibly Embarrassing Problem With Its AI Despite investing tens of billions of dollars into OpenAI, Microsoft is still, apparently, competing with its business partner.

@microsoft.com faces an interesting issue that might affect others selling wrappers around ChatGPT and Claude models: users prefer to use ChatGPT directly rather than engage with Microsoft's Copilot.
futurism.com/microsoft-co...

27.06.2025 14:34 — 👍 0    🔁 0    💬 0    📌 0
I have bittersweet news to share. | Lysandre Debut — Yesterday we merged a PR deprecating TensorFlow and Flax support in transformers. Going forward, we're focusing all our efforts on PyTorch to remove a lot of th...

This is a rather blockbuster piece of news: the @hf.co library is dropping support for both JAX and TensorFlow.
www.linkedin.com/posts/lysand...

12.06.2025 19:04 — 👍 1    🔁 0    💬 0    📌 0

Humans are creating AGI and you claim that their intelligence is overrated?

22.05.2025 04:26 — 👍 2    🔁 0    💬 0    📌 0

Laptop keyboards are close to being unusable. Tremendous productivity hit.

27.04.2025 03:16 — 👍 1    🔁 0    💬 0    📌 0

Found a hidden gem on IR evaluation methodology from Microsoft "What Matters in a Measure? A Perspective from Large-Scale Search Evaluation."
dl.acm.org/doi/pdf/10.1...

27.04.2025 03:15 — 👍 4    🔁 0    💬 0    📌 0

Parental advice: if you master algebra you will know how to deal with your x-es.
@ccanonne.bsky.social feel free to borrow!

21.04.2025 23:41 — 👍 0    🔁 0    💬 0    📌 0

Some people say: a prompt is worth a thousand words! Excuse me, but have you seen these prompts? They are way longer!

16.04.2025 01:42 — 👍 1    🔁 0    💬 0    📌 0

However, unlike many others who see a threat in the form of a "Terminator-like" super-intelligence, @lawrennd.bsky.social worries about the unpredictability of automated decision making by an entity that is superior in some ways, inferior in others, but, importantly, disconnected from the needs of humans. ⏹️

07.04.2025 02:24 — 👍 1    🔁 0    💬 0    📌 0

🧵A fascinating perspective on the nature of intelligence and the history of automation (and, ahem, the development of AI). It is also a cautionary tale about trusting AI too much. ↩️

07.04.2025 02:24 — 👍 4    🔁 1    💬 1    📌 0

Thus, it was quite insightful to read a recent blog post by @netflix detailing their experience training foundation RecSys LLMs. It's an informative read, packed with detailed, behind-the-scenes information. 🟦

23.03.2025 20:59 — 👍 0    🔁 0    💬 0    📌 0

Pre-training can be non-trivial. If you represent a set of users or items using fixed IDs, your model will not generalize well to a domain with a different set of users or items (although there are some workarounds: arxiv.org/abs/2405.03562).
↩️

23.03.2025 20:59 — 👍 0    🔁 0    💬 1    📌 0
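
(A tiny PyTorch illustration of the fixed-ID problem, not taken from the Netflix post or the cited paper: an ID-embedding table has no row for unseen IDs, whereas a content-feature encoder handles new items.)

```python
# A toy illustration (an assumption, not the authors' code): a model keyed on
# fixed IDs has no representation for IDs outside its training range.
import torch
import torch.nn as nn

n_items = 1000
id_emb = nn.Embedding(n_items, 32)       # one trainable row per item ID
print(id_emb(torch.tensor([42])).shape)  # fine: item 42 was seen in training
# id_emb(torch.tensor([5000]))           # IndexError: no row for an unseen item

# One workaround: derive the item vector from content features (title text,
# metadata, etc.) instead of the raw ID, so any new item that comes with
# features gets a representation even if it never appeared in training.
content_encoder = nn.Sequential(nn.Linear(300, 64), nn.ReLU(), nn.Linear(64, 32))
new_item_features = torch.randn(1, 300)  # e.g., an embedding of the item description
vec = content_encoder(new_item_features) # works for items never seen before
print(vec.shape)
```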

🧵Although pre-trained Transformer models took NLP by storm, they were less successful for recommender systems (arxiv.org/abs/2306.11114). RecSys is hard:
1. The number of users is high.
2. The number of items is high.
3. The cold-start problem is hard.
↩️

23.03.2025 20:59 — 👍 2    🔁 0    💬 1    📌 0
Preference Leakage: A Contamination Problem in LLM-as-a-judge Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development. While their combination significantly enhan...

" ... that preference leakage is a pervasive issue that is harder to detect compared to previously identified biases in LLM-as-a-judge scenarios. All of these findings imply that preference leakage is a widespread and challenging problem in the area of LLM-as-a-judge. "
arxiv.org/abs/2502.01534

04.02.2025 14:52 — 👍 2    🔁 0    💬 0    📌 0

🧵"Through extensive experiments, we empirically confirm the bias of judges towards their related student models caused by preference leakage across multiple LLM baselines and benchmarks. Further analysis suggests ..." ↩️

04.02.2025 14:52 — 👍 2    🔁 0    💬 1    📌 0

So you mean specifically GRPO or more like a combination of GRPO and binary rewards?

31.01.2025 02:56 — 👍 2    🔁 0    💬 0    📌 0
Debunking myths of vector search and LLMs with Leo Boytsov, Senior Research Scientist, AWS
YouTube video by Vector Podcast

We also talked about k-NN search, approaches to learning sparse & dense representations for text retrieval, as well as about integration of vector search into traditional DB systems. 🟦
www.youtube.com/watch?v=gzWE...

17.01.2025 16:48 — 👍 7    🔁 0    💬 0    📌 0

🧵It was my great pleasure to appear on the Vector Podcast with @dmitrykan.bsky.social. We covered the history of our NMSLIB library and how it helped shape the vector search industry! ↩️

17.01.2025 16:48 — 👍 7    🔁 1    💬 1    📌 0

They gathered some of the best people. They surely have a fighting chance to reduce the cost of serving and reach profitability within a couple of years.

06.01.2025 08:51 — 👍 5    🔁 0    💬 0    📌 0

OK, please show us a minimal viable example of doing so. Do not forget that not everyone knows JAX and not everyone is willing to learn it. It is not as widely used as NumPy, PyTorch, or even TensorFlow.

06.01.2025 06:36 — 👍 2    🔁 0    💬 0    📌 0
