Sara Rosenthal seirasto - Bluesky Statics

RAGAPHENE: A RAG Annotation Platform with Human Enhancements and Edits
Retrieval Augmented Generation (RAG) is an important aspect of conversing with Large Language Models (LLMs) when factually correct information is important. LLMs may provide answers that appear correc...

📣📣Presenting our platform used to build MTRAG!!

RAGAPHENE: A RAG Annotation Platform with Human ENhancements and Edits

Arxiv: arxiv.org/abs/2508.19272
MTRAG GitHub: github.com/IBM/mt-rag-b...
Join our MTRAGEval Task: ibm.github.io/mt-rag-bench...

28.08.2025 12:47 — 👍 0 🔁 0 💬 0 📌 0

MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
Retrieval-augmented generation (RAG) has recently become a very popular task for Large Language Models (LLMs). Evaluating them on multi-turn RAG conversations, where the system is asked to generate a ...

🚀Excited to announce our MTRAGEval task at SemEval 2026!

Arxiv: arxiv.org/abs/2501.03468
Github: github.com/IBM/mt-rag-b... (please 🌟!)
MTRAGEval: ibm.github.io/mt-rag-bench...

04.08.2025 06:33 — 👍 0 🔁 0 💬 0 📌 0

InspectorRAGet: An Introspection Platform for RAG Evaluation
Large Language Models (LLM) have become a popular approach for implementing Retrieval Augmented Generation (RAG) systems, and a significant amount of effort has been spent on building good models and ...

Working on RAG? Come check out our InspectorRAGet DEMO, presented by Siva Sankalp Patel on May 2 (Friday), 11-12:30, at Demo Session 8 in Hall 3! Looking forward to attending ACL in a few months! #NAACL2025 @naaclmeeting.bsky.social

paper: arxiv.org/abs/2404.17347
github: github.com/IBM/Inspecto...

01.05.2025 01:24 — 👍 2 🔁 0 💬 0 📌 0

Excited about this collab! Come check out FeeL and help advance multilingual generation in your language! huggingface.co/spaces/feel-...

26.03.2025 13:59 — 👍 2 🔁 1 💬 0 📌 0

How well can your RAG agent carry out a conversation? IBM’s new benchmark evaluates LLMs on interactive question-answering tasks using

🌟Want to know more about our MTRAG benchmark? Check out the IBM blog highlighting our work! research.ibm.com/blog/convers...

04.02.2025 20:41 — 👍 3 🔁 0 💬 0 📌 0

Retrievers (Elser shown here) struggle with later turns and non-standalone questions:

08.01.2025 20:09 — 👍 0 🔁 0 💬 0 📌 0

SOTA LLMs struggle with later turns and unanswerable questions:

08.01.2025 20:09 — 👍 0 🔁 0 💬 1 📌 0

Sample Conversation:

08.01.2025 20:09 — 👍 0 🔁 0 💬 1 📌 0

MTRAG is a challenging benchmark for SOTA LLMs and a great way to evaluate across multiple domains for Retrieval and Generation! MTRAG contains 110 conversations averaging 7.7 turns each across four domains for a total of 842 tasks. We also explore synthetic data and LLM-as-a-judge.

08.01.2025 20:09 — 👍 0 🔁 0 💬 1 📌 0

GitHub - IBM/mt-rag-benchmark: Multi-Turn RAG Benchmark

🌟 New Benchmark! 🌟

Do you work on RAG? Are you interested in Multi-Turn conversations? Very excited to share the new MTRAG benchmark we have released!

Data: github.com/ibm/mt-rag-b...
Paper: arxiv.org/abs/2501.03468

08.01.2025 20:08 — 👍 6 🔁 4 💬 1 📌 0

Anyone else feel like Google Scholar is missing citations lately? I have a recent paper that has 8 citations on Semantic Scholar and only 3 on Google Scholar… and I have two papers that are cited in one paper but only one has the citation 🤔

27.11.2024 01:47 — 👍 3 🔁 0 💬 0 📌 0

Please just message me on slack

25.11.2024 13:01 — 👍 1 🔁 0 💬 0 📌 0

Please add me. Thanks!

24.11.2024 14:33 — 👍 1 🔁 0 💬 0 📌 0

I made a starter pack of people in New York (City) working on ML/AI. Please distribute and feel free to self-nominate!

go.bsky.app/BoEtagz

19.11.2024 01:38 — 👍 87 🔁 19 💬 42 📌 8

GitHub - IBM/InspectorRAGet: The repository contains generative AI analytics platform application code.

If you work on RAG, check out InspectorRAGet, an awesome RAG tool for evaluation. Available on HuggingFace! We provide the interface, you provide the experiments and metrics. Want to know more? Just reach out!
github.com/IBM/Inspecto...
huggingface.co/spaces/kpfad...
arxiv.org/abs/2404.17347

22.11.2024 02:22 — 👍 5 🔁 0 💬 0 📌 0

Starter pack for IBM Research! Follow awesome IBM researchers! IBMers, let me know and I will add you! go.bsky.app/2SXcRmA

19.11.2024 13:13 — 👍 21 🔁 6 💬 3 📌 1

GitHub - primeqa/clapnq

Working on RAG? Check out our ClapNQ benchmark (accepted to TACL) to test the full RAG pipeline!

github.com/primeqa/clapnq
arxiv.org/abs/2404.02103

19.11.2024 02:49 — 👍 12 🔁 2 💬 1 📌 0

Please add me!

19.11.2024 02:44 — 👍 1 🔁 0 💬 0 📌 0

This is great! Please add me as well!

19.11.2024 02:42 — 👍 1 🔁 0 💬 0 📌 0

Posts by Sara Rosenthal (@seirasto.bsky.social)