PyTerrier Advent 25/25: To wrap up our advent series, we'd like to thank the contributors shown below, and the many others who support the PyTerrier ecosystem! #WorldChangersTogether
25.12.2025 07:56 · @irglasgow.bsky.social
Glasgow Information Retrieval Group at the University of Glasgow
PyTerrier Advent 24/25: Removing low-quality docs can boost search quality and cut indexing costs. Our SIGIR'24 paper QT5 trains a T5 model to filter passages at indexing time: easy to integrate, and it works with dense, PISA, or SPLADE indexes too.
24.12.2025 09:23
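The indexing-time filtering idea can be sketched without the actual T5 model. Below, `quality_filter` and its `score_fn` are illustrative stand-ins (the paper's QT5 model would supply the real quality scores):

```python
import pandas as pd

def quality_filter(docs: pd.DataFrame, score_fn, p: float = 0.3) -> pd.DataFrame:
    """Drop the lowest-scoring fraction p of passages before indexing.

    score_fn is a stand-in for the learned quality model (QT5 uses a
    fine-tuned T5); here it can be any callable text -> float."""
    scored = docs.assign(quality=docs['text'].map(score_fn))
    cutoff = scored['quality'].quantile(p)
    return scored[scored['quality'] >= cutoff].drop(columns='quality')

# toy corpus in PyTerrier's docno/text convention
corpus = pd.DataFrame([
    {'docno': 'd1', 'text': 'a detailed passage about information retrieval evaluation'},
    {'docno': 'd2', 'text': 'click here'},
    {'docno': 'd3', 'text': 'neural rankers rescore candidate documents with transformers'},
])

# stand-in quality signal: longer passages score higher
kept = quality_filter(corpus, score_fn=lambda t: len(t.split()), p=0.34)
print(list(kept['docno']))  # ['d1', 'd3'] — the short, low-quality 'd2' is dropped
```

Because the filter is just a docs-to-docs transformation, it can sit in front of any indexer, which is why it works with dense, PISA, or SPLADE indexes alike.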
PyTerrier Advent 23/25: You've done retrieval, but the results seem too homogeneous. Use a diversification reranker. Shown below is the implicit MMR diversification approach, instantiated on BM25 or dense retrieval, but even an explicit approach like xQuAD (cf. Rodrygo Santos) is easy to write.
23.12.2025 11:54
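A minimal sketch of the greedy MMR reranking step over a PyTerrier-style results frame; the `doc_vecs` lookup and the loop are illustrative assumptions, not the exact plugin API:

```python
import numpy as np
import pandas as pd

def mmr_rerank(res: pd.DataFrame, doc_vecs: dict, lam: float = 0.5, k: int = 10) -> pd.DataFrame:
    """Greedy Maximal Marginal Relevance for one query's results.

    res: results frame (docno, score); doc_vecs: docno -> unit vector.
    At each step, pick the doc maximising
    lam * relevance - (1 - lam) * max-similarity-to-already-selected."""
    cand = dict(zip(res['docno'], res['score']))
    selected = []
    while cand and len(selected) < k:
        def mmr(d):
            sim = max((float(np.dot(doc_vecs[d], doc_vecs[s])) for s in selected), default=0.0)
            return lam * cand[d] - (1 - lam) * sim
        best = max(cand, key=mmr)
        selected.append(best)
        del cand[best]
    return pd.DataFrame({'docno': selected,
                         'score': [len(selected) - i for i in range(len(selected))],
                         'rank': range(len(selected))})

# two near-duplicate top docs ('a', 'b') and one different doc ('c')
vecs = {'a': np.array([1.0, 0.0]), 'b': np.array([1.0, 0.0]), 'c': np.array([0.0, 1.0])}
res = pd.DataFrame({'docno': ['a', 'b', 'c'], 'score': [1.0, 0.99, 0.5]})
print(list(mmr_rerank(res, vecs, lam=0.5)['docno']))  # ['a', 'c', 'b']
```

Note how 'c' is promoted above the near-duplicate 'b' despite a lower retrieval score: that is the diversification at work.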
PyTerrier Advent 22/25: A more complex pipeline: knowledge-graph-enhanced RAG from our EMNLP 2024 paper TRACE. We build a KG over retrieved docs, then use a transformer to reason over triples for better QA. This pipeline instantiation uses a cache (see the 20th advent) on LLM-based KG extraction.
22.12.2025 12:25
PyTerrier Advent 21/25: Bounded recall blues got you down? You can use Adaptive Retrieval techniques, like GAR, LADR, and LAFF, to efficiently surface missing relevant documents.
21.12.2025 11:25
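The shared idea behind these techniques (taking GAR's corpus graph as the example) can be sketched in a simplified form. The neighbour graph, `score_fn`, and scheduling below are toy assumptions — real GAR alternates between the initial ranking and a score-ordered neighbour frontier:

```python
import pandas as pd

def adaptive_retrieve(initial, neighbours, score_fn, budget=6):
    """Toy sketch of graph-based adaptive retrieval: score first-pass
    results, then also score their unseen corpus-graph neighbours, so
    relevant docs missed by the bounded-recall first pass can surface."""
    scored, frontier = {}, list(initial)
    while frontier and len(scored) < budget:
        d = frontier.pop(0)
        if d in scored:
            continue
        scored[d] = score_fn(d)
        # enqueue graph neighbours of scored docs for later scoring
        frontier.extend(n for n in neighbours.get(d, []) if n not in scored)
    res = pd.DataFrame({'docno': list(scored), 'score': list(scored.values())})
    return res.sort_values('score', ascending=False).reset_index(drop=True)

# 'd9' is missed by first-pass retrieval but is a graph neighbour of 'd1'
graph = {'d1': ['d9'], 'd2': ['d3']}
rel = {'d1': 3.0, 'd2': 1.0, 'd3': 0.5, 'd9': 2.5}
out = adaptive_retrieve(['d1', 'd2'], graph, rel.get)
print(list(out['docno']))  # ['d1', 'd9', 'd2', 'd3'] — d9 surfaced via the graph
```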
PyTerrier Advent 20/25: You can think of every PyTerrier transformer as a function, mapping from one dataframe type to another. This makes them easily cacheable, courtesy of pyterrier_caching. We have cache objects for retrievers, rerankers, and even indexing-time transformations (e.g. Doc2Query).
20.12.2025 10:54
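The "transformer = function" view makes caching straightforward. Here is a toy in-memory version of the idea; the class name is hypothetical, and the real pyterrier_caching objects persist results to disk rather than a dict:

```python
import pandas as pd

class CachedRetriever:
    """Toy per-query memo in the spirit of pyterrier_caching: results for
    a qid are computed once and replayed on every later call."""
    def __init__(self, inner):
        self.inner = inner      # any transformer-like callable: DataFrame -> DataFrame
        self.store = {}         # qid -> cached results
        self.misses = 0

    def transform(self, topics: pd.DataFrame) -> pd.DataFrame:
        out = []
        for qid, group in topics.groupby('qid', sort=False):
            if qid not in self.store:
                self.misses += 1
                self.store[qid] = self.inner(group)
            out.append(self.store[qid])
        return pd.concat(out, ignore_index=True)

# stand-in "retriever": returns one result row per query
def dummy_retriever(topics):
    return topics.assign(docno='d1', score=1.0)

cached = CachedRetriever(dummy_retriever)
topics = pd.DataFrame({'qid': ['q1', 'q2'], 'query': ['apples', 'oranges']})
cached.transform(topics)
cached.transform(topics)    # second run is served entirely from the cache
print(cached.misses)        # 2 — each query was computed only once
```

Because a cached transformer has the same dataframe-in, dataframe-out signature as the original, it drops into any pipeline position unchanged.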
PyTerrier Advent 19/25: PyTerrier-RAG brings agentic RAG to your workflows with support for SOTA methods like Search-R1 and R1-Searcher, to combine retrievers and reasoning. You could even swap BM25 out for a dense or LSR retriever.
19.12.2025 10:24
PyTerrier Advent 18/25: In RAG, the reader runs the LLM, but your pipeline shouldn't depend on the LLM stack.
PyTerrier-RAG separates Reader from Backend, letting you swap vLLM ↔ HF with one line while keeping the same pipeline (and even share a Backend with other stages).
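The separation can be illustrated with a toy Reader/Backend pair. All names below are hypothetical stand-ins, not the actual PyTerrier-RAG API, but they show why swapping backends is a one-line change:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Backend:
    """Owns text generation only: prompts in, completions out."""
    generate: Callable[[List[str]], List[str]]

def echo_generate(prompts):
    # stand-in for a vLLM or HF generation call
    return [f"answer to: {p}" for p in prompts]

class Reader:
    """Owns prompting logic; knows nothing about which LLM stack runs it."""
    def __init__(self, backend: Backend):
        self.backend = backend

    def answer(self, query: str, contexts: List[str]) -> str:
        prompt = f"Context: {' '.join(contexts)}\nQuestion: {query}"
        return self.backend.generate([prompt])[0]

reader = Reader(Backend(generate=echo_generate))   # swap the Backend here
ans = reader.answer("who wrote Terrier?", ["Terrier is an IR platform."])
print(ans)
```

The same Backend object can be handed to other pipeline stages (e.g. query rewriting), so one model serves the whole pipeline.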
PyTerrier Advent 17/25: FlexIndex simplifies dense retrieval by supporting FAISS, Voyager, FlatNav & more. It auto-builds data structures, reuses vector stores to cut storage costs, and offers one familiar API for many retrievers.
17.12.2025 14:30
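Under the hood, the simplest ("flat") configuration amounts to exhaustive inner-product scoring over the stored vectors. A numpy sketch of that baseline (FlexIndex's own API and its FAISS/Voyager structures replace this in practice):

```python
import numpy as np
import pandas as pd

def exhaustive_dense_search(q_vec, doc_vecs, docnos, k=3):
    """What a flat dense index does under the hood: score every stored
    vector against the query by inner product, then take the top k."""
    scores = doc_vecs @ q_vec
    top = np.argsort(-scores)[:k]
    return pd.DataFrame({'docno': [docnos[i] for i in top],
                         'score': scores[top],
                         'rank': range(len(top))})

docnos = ['d1', 'd2', 'd3']
doc_vecs = np.array([[0.1, 0.9],
                     [0.9, 0.1],
                     [0.7, 0.7]])
out = exhaustive_dense_search(np.array([1.0, 0.0]), doc_vecs, docnos, k=2)
print(list(out['docno']))  # ['d2', 'd3']
```

Approximate structures (HNSW etc.) trade a little of this exact top-k for much faster lookups; sharing the vector store across them is what cuts the storage cost.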
PyTerrier Advent 16/25: Speaking of Learned Sparse Retrieval, PyTerrier has bindings to two backend search engines that provide blazing-fast retrieval over sparse vectors: PISA and BMP.
You can see that we really work to keep the look-and-feel uniform between implementations.
PyTerrier Advent 15/25: A very well-known learned sparse method is SPLADE. Our pyt_splade plugin makes it easy to use SPLADE by formulating Terrier indexing & retrieval pipelines that are composed with a SPLADE encoder, adding extra columns (e.g. query_toks).
Try it: github.com/cmacdonald/p...
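What "adding extra columns" means concretely: a sparse encoder attaches per-token weights to each query in a `query_toks` column, which the downstream retriever then matches on. Below is a toy encoder using word counts as the weights (an illustrative assumption; SPLADE produces learned subword weights):

```python
from collections import Counter
import pandas as pd

def toy_sparse_encoder(topics: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for a learned sparse query encoder: attach a dict of
    token -> weight per query in a query_toks column."""
    return topics.assign(
        query_toks=topics['query'].map(lambda q: dict(Counter(q.lower().split())))
    )

topics = pd.DataFrame({'qid': ['q1'], 'query': ['chemical reactions chemical']})
encoded = toy_sparse_encoder(topics)
print(encoded['query_toks'][0])  # {'chemical': 2, 'reactions': 1}
```

Because the encoder is itself a dataframe-to-dataframe step, composing it before an indexer or retriever is ordinary pipeline composition.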
PyTerrier Advent 14/25: So we've seen sparse and dense retrieval in PyTerrier. Some folk recommend hybrid retrieval, e.g. reciprocal rank fusion (RRF) of sparse and dense results. We have an easy pipeline component that combines two sets of results by RRF (w/ a pretty schematic by Sean MacAvaney).
14.12.2025 18:31
PyTerrier Advent 10/25: Dense retrieval often improves with pseudo-relevance feedback (Rocchio-style).
In PyTerrier_DR it's easy: encode the query, retrieve docs, a transformer to mix doc vectors w/ the query vector, and then re-retrieve.
pyterrier.readthedocs.io/en/latest/ex...
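The "mix doc vectors with the query vector" step of that 10/25 pipeline reduces to a small vector update in the classic Rocchio form; the alpha/beta weights below are illustrative:

```python
import numpy as np

def rocchio_prf(q_vec, feedback_vecs, alpha=1.0, beta=0.5):
    """Dense pseudo-relevance feedback: mix the query vector with the
    centroid of the top retrieved doc vectors; the result is then used
    to re-retrieve."""
    return alpha * q_vec + beta * feedback_vecs.mean(axis=0)

q = np.array([1.0, 0.0])
top_docs = np.array([[0.8, 0.6],
                     [0.6, 0.8]])       # vectors of first-pass top docs
q_new = rocchio_prf(q, top_docs)
print(q_new)  # [1.35 0.35]
```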
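The RRF fusion from Advent 14/25 is also easy to write by hand over PyTerrier-style result frames. A sketch with the customary k = 60 constant (ranks assumed to start at 0):

```python
import pandas as pd

def rrf(res_a: pd.DataFrame, res_b: pd.DataFrame, k: int = 60) -> pd.DataFrame:
    """Reciprocal rank fusion of two result frames (qid, docno, rank):
    score(d) = sum over systems of 1 / (k + rank_d + 1)."""
    both = pd.concat([res_a, res_b], ignore_index=True)
    both['rrf'] = 1.0 / (k + both['rank'] + 1)
    fused = both.groupby(['qid', 'docno'], as_index=False)['rrf'].sum()
    fused = fused.sort_values(['qid', 'rrf'],
                              ascending=[True, False]).reset_index(drop=True)
    fused['rank'] = fused.groupby('qid').cumcount()
    return fused.rename(columns={'rrf': 'score'})

# sparse and dense runs for the same query
sparse = pd.DataFrame({'qid': ['q1'] * 2, 'docno': ['a', 'b'], 'rank': [0, 1]})
dense  = pd.DataFrame({'qid': ['q1'] * 2, 'docno': ['b', 'c'], 'rank': [0, 1]})
fused = rrf(sparse, dense)
print(list(fused['docno']))  # ['b', 'a', 'c'] — 'b' wins, ranked highly by both
```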
PyTerrier Advent 13/25: Doc2query expands docs with generated queries, but can hallucinate. Our ECIR'23 paper Doc2query-- (aka "minus minus") filters generated queries using a cross-encoder before indexing.
PyTerrier pipeline: generate → score → filter → index.
https://arxiv.org/pdf/2301.03266
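The score-then-filter stage can be sketched in plain pandas. Here `rel_fn` is a word-overlap stand-in for the cross-encoder, and the keep fraction is illustrative:

```python
import pandas as pd

def filter_expansions(expansions: pd.DataFrame, rel_fn, keep: float = 0.7) -> pd.DataFrame:
    """Doc2query--'s idea: score each (doc, generated query) pair with a
    relevance model and keep only the top `keep` fraction per document,
    discarding likely hallucinations before they pollute the index."""
    scored = expansions.assign(rel=[rel_fn(t, q) for t, q in
                                    zip(expansions['text'], expansions['gen_query'])])
    thresh = scored.groupby('docno')['rel'].transform(lambda s: s.quantile(1 - keep))
    return scored[scored['rel'] >= thresh].drop(columns='rel')

gen = pd.DataFrame({
    'docno': ['d1'] * 3,
    'text':  ['pyterrier supports declarative retrieval pipelines'] * 3,
    'gen_query': ['what is pyterrier', 'retrieval pipelines', 'best pizza in rome'],
})
# stand-in relevance: word overlap between doc text and generated query
overlap = lambda t, q: len(set(t.split()) & set(q.split()))
kept_queries = list(filter_expansions(gen, overlap, keep=0.7)['gen_query'])
print(kept_queries)  # the hallucinated 'best pizza in rome' is filtered out
```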
PyTerrier Advent 12/25: Beyond dense retrieval, learned sparse methods like Doc2Query expand docs with predicted queries before indexing. Our pyterrier_doc2query plugin makes this easy for any corpus; perfectly intuitive, as PyTerrier's pipelines can be applied at indexing time too!
12.12.2025 10:18
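The expansion itself is just a text rewrite applied before the indexer. Below, `gen_fn` is a stand-in for the actual doc2query model of pyterrier_doc2query:

```python
import pandas as pd

def doc2query_expand(docs: pd.DataFrame, gen_fn) -> pd.DataFrame:
    """Doc2Query-style indexing-time expansion: append predicted queries
    to each document's text before it reaches the indexer."""
    return docs.assign(text=[f"{t} {' '.join(gen_fn(t))}" for t in docs['text']])

docs = pd.DataFrame({'docno': ['d1'], 'text': ['terrier indexes large corpora']})
fake_gen = lambda t: ['how to index a corpus']   # stand-in predicted queries
expanded = doc2query_expand(docs, fake_gen)
print(expanded['text'][0])  # 'terrier indexes large corpora how to index a corpus'
```

Because this is a docs-in, docs-out transformation, it composes with any indexer, which is exactly the "pipelines at indexing time" point above.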
PyTerrier Advent 11/25: Want to use an external search service with PyTerrier? No problemo! It has integrations with APIs for Semantic Scholar, ChatNoir (thanks to Jan Heinrich Merker!), Pinecone, and others!
11.12.2025 09:53