Sara Rosenthal

@seirasto.bsky.social

NLP Research Scientist at IBM Research

1,279 Followers  |  339 Following  |  18 Posts  |  Joined: 19.11.2024

Posts by Sara Rosenthal (@seirasto.bsky.social)

RAGAPHENE: A RAG Annotation Platform with Human Enhancements and Edits
Retrieval Augmented Generation (RAG) is an important aspect of conversing with Large Language Models (LLMs) when factually correct information is important. LLMs may provide answers that appear correc...

📣📣Presenting our platform used to build MTRAG!!

RAGAPHENE: A RAG Annotation Platform with Human ENhancements and Edits

Arxiv: arxiv.org/abs/2508.19272
MTRAG GitHub: github.com/IBM/mt-rag-b...
Join our MTRAGEval Task: ibm.github.io/mt-rag-bench...

28.08.2025 12:47 | 👍 0    🔁 0    💬 0    📌 0
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
Retrieval-augmented generation (RAG) has recently become a very popular task for Large Language Models (LLMs). Evaluating them on multi-turn RAG conversations, where the system is asked to generate a ...

🚀Excited to announce our MTRAGEval task at SemEval 2026!

Arxiv: arxiv.org/abs/2501.03468
Github: github.com/IBM/mt-rag-b... (please 🌟!)
MTRAGEval: ibm.github.io/mt-rag-bench...

04.08.2025 06:33 | 👍 0    🔁 0    💬 0    📌 0
InspectorRAGet: An Introspection Platform for RAG Evaluation
Large Language Models (LLM) have become a popular approach for implementing Retrieval Augmented Generation (RAG) systems, and a significant amount of effort has been spent on building good models and ...

Working on RAG? Come check out our InspectorRAGet DEMO presented by Siva Sankalp Patel May 2 (Friday), 11-12:30 at Demo Session 8 in Hall 3! Looking forward to attending ACL in a few months! #NAACL2025 @naaclmeeting.bsky.social

paper: arxiv.org/abs/2404.17347
github: github.com/IBM/Inspecto...

01.05.2025 01:24 | 👍 2    🔁 0    💬 0    📌 0

Excited about this collab! Come check out FeeL and help advance multilingual generation in your language! huggingface.co/spaces/feel-...

26.03.2025 13:59 | 👍 2    🔁 1    💬 0    📌 0
How well can your RAG agent carry out a conversation?
IBM's new benchmark evaluates LLMs on interactive question-answering tasks using

🌟Want to know more about our MTRAG benchmark? Check out the IBM blog highlighting our work! research.ibm.com/blog/convers...

04.02.2025 20:41 | 👍 3    🔁 0    💬 0    📌 0
Post image

Retrievers (Elser shown here) struggle with later turns and non-standalone questions:

08.01.2025 20:09 | 👍 0    🔁 0    💬 0    📌 0
Post image

SOTA LLMs struggle with later turns and unanswerable questions:

08.01.2025 20:09 | 👍 0    🔁 0    💬 1    📌 0
Post image

Sample Conversation:

08.01.2025 20:09 | 👍 0    🔁 0    💬 1    📌 0

MTRAG is a challenging benchmark for SOTA LLMs and a great way to evaluate across multiple domains for Retrieval and Generation! MTRAG contains 110 conversations averaging 7.7 turns each across four domains for a total of 842 tasks. We also explore synthetic data and LLM-as-a-judge.

08.01.2025 20:09 | 👍 0    🔁 0    💬 1    📌 0
GitHub - IBM/mt-rag-benchmark: Multi-Turn RAG Benchmark
Multi-Turn RAG Benchmark. Contribute to IBM/mt-rag-benchmark development by creating an account on GitHub.

🌟 New Benchmark! 🌟

Do you work on RAG? Are you interested in Multi-Turn conversations? Very excited to share the new MTRAG benchmark we have released!

Data: github.com/ibm/mt-rag-b...
Paper: arxiv.org/abs/2501.03468

08.01.2025 20:08 | 👍 6    🔁 4    💬 1    📌 0

Anyone else feel like Google Scholar is missing citations lately? I have a recent paper that has 8 citations on Semantic Scholar and only 3 on Google Scholar... and I have two papers that are cited in one paper but only one has the citation 🤔

27.11.2024 01:47 | 👍 3    🔁 0    💬 0    📌 0

Please just message me on slack

25.11.2024 13:01 | 👍 1    🔁 0    💬 0    📌 0

Please add me. Thanks!

24.11.2024 14:33 | 👍 1    🔁 0    💬 0    📌 0

I made a starter pack of people in New York (City) working on ML/AI. Please distribute and feel free to self-nominate!

go.bsky.app/BoEtagz

19.11.2024 01:38 | 👍 87    🔁 19    💬 42    📌 8
GitHub - IBM/InspectorRAGet: The repository contains generative AI analytics platform application code.
The repository contains generative AI analytics platform application code. - IBM/InspectorRAGet

If you work on RAG, check out InspectorRAGet - an awesome RAG tool for evaluation. Available on HuggingFace! We provide the interface, you provide the experiments and metrics. Want to know more? Just reach out!
github.com/IBM/Inspecto...
huggingface.co/spaces/kpfad...
arxiv.org/abs/2404.17347

22.11.2024 02:22 | 👍 5    🔁 0    💬 0    📌 0

Starter pack for IBM Research! Follow awesome IBM researchers! IBMers, let me know and I will add you! go.bsky.app/2SXcRmA

19.11.2024 13:13 | 👍 21    🔁 6    💬 3    📌 1
GitHub - primeqa/clapnq
Contribute to primeqa/clapnq development by creating an account on GitHub.

Working on RAG? Check out our ClapNQ benchmark (accepted to TACL) to test the full RAG pipeline!

github.com/primeqa/clapnq
arxiv.org/abs/2404.02103

19.11.2024 02:49 | 👍 12    🔁 2    💬 1    📌 0

Please add me!

19.11.2024 02:44 | 👍 1    🔁 0    💬 0    📌 0

This is great! Please add me as well!

19.11.2024 02:42 | 👍 1    🔁 0    💬 0    📌 0