
Mila Gorecki

@milago.bsky.social

PhD student in Machine Learning @ MPI-IS Tübingen, Tübingen AI Center, IMPRS-IS

783 Followers  |  228 Following  |  4 Posts  |  Joined: 19.11.2024

Latest posts by milago.bsky.social on Bluesky

Meet me at the Benchmarking workshop (sites.google.com/view/benchma...) at EurIPS on Saturday: we'll present two works on errors in LLM-as-a-judge and their impact on benchmarking and test-time scaling.

05.12.2025 08:57 — 👍 7    🔁 3    💬 1    📌 0

At #NeurIPS in San Diego this week? Interested in XAI, causality, or performative prediction? Come visit our poster!

💬 Performative Validity of Recourse Explanations
📆 Wednesday, 4.30 pm, Poster Session 2
w/ Hidde Fokkema, Timo Freiesleben, Celestine Mendler-Dünner, Ulrike von Luxburg

02.12.2025 18:17 — 👍 11    🔁 3    💬 0    📌 0

Attending #Neurips2025? Get your personalized Scholar Inbox conference program now to easily navigate the poster sessions and find what you are looking for:
www.scholar-inbox.com/conference/n...

02.12.2025 06:37 — 👍 34    🔁 12    💬 0    📌 0

I'll be @neuripsconf.bsky.social presenting Strategic Hypothesis Testing (spotlight!)

tldr: Many high-stakes decisions (e.g., drug approval) rely on p-values, but people submitting evidence respond strategically even w/o p-hacking. Can we characterize this behavior & how policy shapes it?

1/n

01.12.2025 20:31 — 👍 17    🔁 4    💬 1    📌 0

The empirical landscape sits between the two extremes. 

- Model similarity is high, yet disagreements let individuals find recourse by switching models. 

- Systemic exclusion is rare, yet more likely than under strong multiplicity. 

- Even within a single model, prompt variations induce multiplicity.

02.12.2025 15:57 — 👍 3    🔁 0    💬 0    📌 0
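The three findings above can be made concrete on a binary decision matrix (one row per model, one column per individual). Below is a minimal sketch under stated assumptions: the function name ecosystem_metrics and the exact metric definitions are hypothetical illustrations, not the paper's formulation. "Systemic exclusion" is taken to mean every model rejects an individual, "recourse by switching" means at least one model rejects and at least one accepts, and "strong multiplicity" is approximated by models rejecting independently of each other.

```python
from itertools import combinations
from math import prod


def ecosystem_metrics(preds: list[list[int]]) -> dict[str, float]:
    """Summarize a model ecosystem from a binary decision matrix.

    preds[m][i] is model m's decision for individual i (1 = positive outcome,
    0 = negative). Requires at least two models. Metric definitions are
    illustrative assumptions, not the paper's exact ones.
    """
    n_models = len(preds)
    n_people = len(preds[0])

    # Model similarity: mean fraction of individuals on which a pair agrees.
    pair_agreement = [
        sum(a == b for a, b in zip(preds[i], preds[j])) / n_people
        for i, j in combinations(range(n_models), 2)
    ]
    similarity = sum(pair_agreement) / len(pair_agreement)

    # One tuple per individual: that person's decisions from every model.
    columns = list(zip(*preds))

    # Systemic exclusion: every model gives a negative outcome.
    excluded = sum(all(d == 0 for d in col) for col in columns) / n_people

    # Recourse by switching: rejected by some model, accepted by another.
    recourse = sum(0 in col and 1 in col for col in columns) / n_people

    # Hypothetical "strong multiplicity" baseline: if models rejected people
    # independently, the chance of rejection by all is the product of rates.
    rejection_rates = [1 - sum(row) / n_people for row in preds]
    independent_baseline = prod(rejection_rates)

    return {
        "similarity": similarity,
        "systemic_exclusion": excluded,
        "recourse_by_switching": recourse,
        "independent_baseline": independent_baseline,
    }
```

On the toy matrix [[1, 0, 0, 1], [1, 0, 1, 1], [1, 0, 0, 0]], mean pairwise agreement is about 0.67, half the individuals can find recourse by switching, and systemic exclusion is 0.25, well above the 0.09375 an independence baseline would predict. This mirrors, on made-up data, the qualitative pattern described in the post.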

We evaluate 50 LLMs (various sizes & providers) across 6 tasks to assess how well each narrative fits the current LLM landscape, assuming that decision makers will increasingly rely on these models for consequential predictions.

02.12.2025 15:57 — 👍 1    🔁 0    💬 1    📌 0

There are two narratives about model ecosystems that grew out of the algorithmic fairness debate:

1. Monoculture: models converge toward homogeneity.

2. Multiplicity: many models solve tasks similarly well but disagree on individual predictions, creating outcome variation.

02.12.2025 15:57 — 👍 0    🔁 0    💬 1    📌 0

Excited to be at #Neurips2025 this week to present our paper "Monoculture or Multiplicity: Which is it?", joint work with Moritz Hardt.

📄 Paper #1000: openreview.net/pdf?id=DO5Lt...
📍 Wed, Dec 3, 2025 • 4:30 PM – 7:30 PM

Feel free to come by and reach out!

A short 🧵.

02.12.2025 15:55 — 👍 16    🔁 4    💬 1    📌 0
