Ivan Kartáč

@ivankartac.bsky.social

PhD student @ Charles University. Researching evaluation and explainability of reasoning in language models.

65 Followers  |  223 Following  |  3 Posts  |  Joined: 30.03.2025

Latest posts by ivankartac.bsky.social on Bluesky

OpeNLGauge comes in two variants: a prompt-based ensemble and a smaller fine-tuned model, both built exclusively on open-weight LLMs (including training data!).

Thanks @tuetschek.bsky.social and @mlango.bsky.social!

23.08.2025 16:39 — 👍 1    🔁 0    💬 0    📌 0

We introduce an explainable metric for evaluating a wide range of natural language generation tasks, without any need for reference texts. Given an evaluation criterion, the metric provides fine-grained assessments of the output by highlighting and explaining problematic spans in the text.

23.08.2025 16:37 — 👍 0    🔁 0    💬 1    📌 0

Our paper "OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs" has been accepted to the #INLG2025 conference!

You can read the preprint here: arxiv.org/abs/2503.11858

23.08.2025 16:36 — 👍 4    🔁 2    💬 1    📌 0

#ACL2025NLP in Vienna 🇦🇹 starts today with 23 🤯 @ufal-cuni.bsky.social folks presenting their work both at the main conference and workshops. Check out our main conference papers today and on Wednesday 👇

28.07.2025 07:27 — 👍 22    🔁 8    💬 1    📌 1
Preview: Ondřej Dušek — "Evaluating LLM outputs with humans and LLMs" (MLPrague, 30 April 2025). Slides: https://bit.ly/mlprague25-od

Slides and links to papers at bit.ly/mlprague25-od 🤓

02.05.2025 19:25 — 👍 2    🔁 2    💬 0    📌 0

Today, @tuetschek.bsky.social presented his team's work on evaluating LLM text generation with both human annotation frameworks and LLM-based metrics. Their approach tackles the benchmark data leakage problem and shows how to obtain unseen data for unbiased LLM testing.

30.04.2025 12:02 — 👍 8    🔁 3    💬 1    📌 0
Preview: Large Language Models as Span Annotators (paper website)

How do LLMs compare to human crowdworkers in annotating text spans? 🧑🤖

And how can span annotation help us with evaluating texts?

Find out in our new paper: llm-span-annotators.github.io

Arxiv: arxiv.org/abs/2504.08697

15.04.2025 11:10 — 👍 20    🔁 7    💬 1    📌 2