OpeNLGauge comes in two variants: a prompt-based ensemble and a smaller fine-tuned model, both built exclusively on open-weight LLMs (including training data!).
Thanks @tuetschek.bsky.social and @mlango.bsky.social!
23.08.2025 16:39
We introduce an explainable metric for evaluating a wide range of natural language generation tasks, without any need for reference texts. Given an evaluation criterion, the metric provides fine-grained assessments of the output by highlighting and explaining problematic spans in the text.
23.08.2025 16:37
Our paper "OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs" has been accepted to #INLG2025 conference!
You can read the preprint here: arxiv.org/abs/2503.11858
23.08.2025 16:36
#ACL2025NLP in Vienna 🇦🇹 starts today with 23 🤯 @ufal-cuni.bsky.social folks presenting their work both at the main conference and workshops. Check out our main conference papers today and on Wednesday 👇
28.07.2025 07:27
Today, @tuetschek.bsky.social shared his team's work on evaluating LLM text generation with both human annotation frameworks and LLM-based metrics. Their approach tackles the problem of benchmark data leakage and shows how to obtain unseen data for unbiased LLM testing.
30.04.2025 12:02
Large Language Models as Span Annotators
How do LLMs compare to human crowdworkers in annotating text spans? 🧑🤖
And how can span annotation help us with evaluating texts?
Find out in our new paper: llm-span-annotators.github.io
arXiv: arxiv.org/abs/2504.08697
15.04.2025 11:10
We are a researcher community developing scientifically grounded research outputs and robust deployment infrastructure for broader impact evaluations.
https://evalevalai.com/
AI and Games Researcher at NYU. Head of AI at Nof1.
PhD student @uwnlp.bsky.social @uwcse.bsky.social | visiting researcher @MetaAI | previously @jhuclsp.bsky.social
https://stellalisy.com
AI PhDing at Mila/McGill (prev FAIR intern). Happily residing in Montreal 🥯❄️
Academic: language grounding, vision+language, interp, rigorous & creative evals, cogsci
Other: many sports, urban explorations, puzzles/quizzes
bennokrojer.com
PhD student @ Fraunhofer HHI. Interpretability, incremental NLP, and NLU. https://pkhdipraja.github.io/
it's a website (and a podcast, and a newsletter) about humans and technology, made by four journalists you might already know. like and subscribe: 404media.co
MaiNLP research lab at CIS, LMU Munich directed by Barbara Plank @barbaraplank.bsky.social
Natural Language Processing | Artificial Intelligence | Computational Linguistics | Human-centric NLP
The Milan Natural Language Processing Group #NLProc #AI
milanlproc.github.io
EurIPS is a community-organized, NeurIPS-endorsed conference in Copenhagen where you can present papers accepted at @neuripsconf.bsky.social
eurips.cc
PhD Student at @gronlp.bsky.social 🐮, core dev @inseq.org. Interpretability ∩ HCI ∩ #NLProc.
gsarti.com
Asst Prof at Johns Hopkins Cognitive Science • Director of the Group for Language and Intelligence (GLINT) ✨• Interested in all things language, cognition, and AI
jennhu.github.io
Groundbreaking foundational research in Big Data Management, Machine Learning, and their intersection. #AI #Research
www.bifold.berlin
📰News: www.bifold.berlin/news-events/news
🔑Data Privacy: www.bifold.berlin/data-privacy
Explainable AI research from the machine learning group of Prof. Klaus-Robert Müller at @tuberlin.bsky.social & @bifold.berlin
Second-year PhD student at XplaiNLP group @TU Berlin: interpretability & explainability
Website: https://qiaw99.github.io
http://linktr.ee/stevenbird
Working with First Nations people who are keeping their ancestral languages strong. Living and working on Larrakia, Bininj, and Miriwoong country. He/they.
Postdoc @milanlp.bsky.social working on LLM safety and societal impacts. Previously PhD @oii.ox.ac.uk and CTO / co-founder of Rewire (acquired '23)
https://paulrottger.com/
Asst Prof at Cornell Info Sci and Cornell Tech. Responsible AI
https://angelina-wang.github.io/