Yatong Chen

@yatongchen.bsky.social

Research group leader @ Max Planck Institute working on theory & social aspects of CS. Previously @UCSC @GoogleDeepMind @Stanford @PKU1898 https://yatongchen.github.io/

55 Followers 125 Following 7 Posts Joined Sep 2025
4 days ago

I’ll be giving a talk at @eth-ai-center.bsky.social tomorrow, March 10, at 11:30am on LLM benchmarking incentives.
Spoiler: today’s benchmarking incentives can produce unreliable model rankings, but we can fix that!

3 months ago

Meet me at the Benchmarking workshop (sites.google.com/view/benchma...) at EurIPS on Saturday! We’ll present two works on errors in LLM-as-Judge and their impact on benchmarking and test-time scaling.

3 months ago

Excited to be at #Neurips2025 this week to present our paper "Monoculture or Multiplicity: Which is it?", joint work with Moritz Hardt.

📄 Paper #1000: openreview.net/pdf?id=DO5Lt...
📍 Wed, Dec 3, 2025 • 4:30 PM – 7:30 PM

Feel free to come by and reach out!

A short 🧵.

3 months ago

Joint work w/ Safwan Hossain and Yiling Chen.

Paper link: arxiv.org/pdf/2508.03289

Come and drop by our poster session for more details!

4/4

3 months ago

Key takeaway: A statistical decision rule doesn't just make decisions; it determines who shows up to be evaluated.

3/n

3 months ago

We build a game-theoretic model of this interaction and show that there exists a computable critical p-value threshold α that cleanly structures false positive & false negative errors. Empirically, it aligns with FDA data!

2/n

3 months ago

I'll be @neuripsconf.bsky.social presenting Strategic Hypothesis Testing (spotlight!)

tldr: Many high-stakes decisions (e.g., drug approval) rely on p-values, but people submitting evidence respond strategically even w/o p-hacking. Can we characterize this behavior & how policy shapes it?

1/n

5 months ago

We have an amazing lineup of keynote speakers: @iaugenstein.bsky.social, José Hernández-Orallo, @gaelvaroquaux.bsky.social, Laura Weidinger, and Emine Yilmaz.

Submit your work and join us in Copenhagen 🇩🇰!

5 months ago

We (w/ Moritz Hardt, Olawale Salaudeen and @joavanschoren.bsky.social) are organizing the Workshop on the Science of Benchmarking & Evaluating AI @euripsconf.bsky.social 2025 in Copenhagen!

📢 Call for Posters: rb.gy/kyid4f
📅 Deadline: Oct 10, 2025 (AoE)
🔗 More info: rebrand.ly/bg931sf
