
@mrparryparry.bsky.social

Artist, Autist and Scientist Terminally Ranking Research @irglasgow.bsky.social https://parry-parry.github.io/

60 Followers  |  86 Following  |  6 Posts  |  Joined: 18.11.2024

Posts by @mrparryparry.bsky.social

Variations in Relevance Judgments and the Shelf Life of Test Collections
The fundamental property of Cranfield-style evaluations, that system rankings are stable even when assessors disagree on individual relevance decisions, was validated on traditional test collections. ...

This work was devised at the ECIR collab-a-thon last year, and we hope to continue discussions at this year's collab-a-thon in Lucca! Read more here: arxiv.org/abs/2502.20937 #ECIR2025 #SIGIR2025

03.03.2025 10:17 · 👍 5 🔁 0 💬 0 📌 0

We consider a human to represent a bound on performance for a subjective task such as determining relevance, since each topic defines only a single intent. We find that systems are either indistinguishable from humans or exceed them as oracle rankers.

03.03.2025 10:17 · 👍 2 🔁 0 💬 1 📌 0

We then look downstream: what effect does re-annotation have on modern systems? We find that comparisons between modern systems are increasingly unstable on DL’19, meaning that the pairwise ordering of systems remains unstable even when their measured nDCG values are far apart.

03.03.2025 10:17 · 👍 1 🔁 0 💬 1 📌 0

We look into causes of disagreement, finding that subtle differences in query intent, even when relevance is well defined, can lead to greater disagreement in 4-grade relevance. However, we find that it is challenging to agree on what is relevant even under a fixed narrative.

03.03.2025 10:17 · 👍 1 🔁 0 💬 1 📌 0

Re-annotation is commonly performed to validate how variations in relevance judgements affect our ability to discriminate between retrieval systems. We test these stability hypotheses in a modern setting: topics have no narratives, relevance is graded on 4 levels, and the pool is built from neural systems.

03.03.2025 10:17 · 👍 1 🔁 0 💬 1 📌 0

🚨 New Pre-Print! 🚨 Reviewer 2 has once again asked for DL’19; what can you say in rebuttal? To help, we have re-annotated DL’19. Work done with @maik_froebe.bsky.social, @hscells.bsky.social, @fschlatt1.bsky.social, Guglielmo Faggioli, Saber Zerhoudi, @macavaney.bsky.social, Eugene Yang 🧵

03.03.2025 10:17 · 👍 6 🔁 3 💬 1 📌 0