
Aaron Roth

@aaroth.bsky.social

Professor at Penn, Amazon Scholar at AWS. Interested in machine learning, uncertainty quantification, game theory, privacy, fairness, and most of the intersections therein

4,259 Followers  |  388 Following  |  391 Posts  |  Joined: 20.10.2023

Latest posts by aaroth.bsky.social on Bluesky


Did you just miss punching your ticket to Rio or Salt Lake City? Wanna go to a conference where people will engage with you and your paper on foundations of responsible computing, and you won't get lost in the crowd?

Submit to #FORC2026, in Boston in June! Deadline in 2 weeks.

02.02.2026 18:16 — 👍 3    🔁 3    💬 0    📌 1

The other paper accepted to @iclr-conf.bsky.social 2026 🇧🇷. Our work on replicable RL sheds light on how to make decisions consistently in RL.

@ericeaton.bsky.social @mkearnsphilly.bsky.social @aaroth.bsky.social @sikatasengupta.bsky.social @optimistsinc.bsky.social

26.01.2026 16:08 — 👍 12    🔁 5    💬 0    📌 0

I try to avoid posting about politics here, but I feel compelled to say some things that should be obvious: 🧡

25.01.2026 17:43 — 👍 17    🔁 2    💬 1    📌 0
History of the U.S. National Science Foundation (NSF)

The NSF has played a key role in American science, and risks being collateral damage in the war against science.
#econsky #academicsky #NSF #science
marketdesigner.blogspot.com/2026/01/hist...

12.01.2026 13:55 — 👍 20    🔁 10    💬 0    📌 1

The paper is here: arxiv.org/abs/2601.05245. It's joint work with @ncollina.bsky.social, Jiuyao Lu, and George Noarov. Natalie and George are on the job market --- check them out. www.seas.upenn.edu/~ncollina/ noarov.com

09.01.2026 13:21 — 👍 3    🔁 1    💬 1    📌 0
Gemini provides automated feedback for theoretical computer scientists at STOC 2026

Not sure of the details, but I believe it's related to the experiment STOC ran giving feedback with a version of Gemini Deep Think, which got generally positive reviews for critiquing math: research.google/blog/gemini-...

09.01.2026 15:01 — 👍 0    🔁 0    💬 0    📌 0

What's wrong with providing access to a fancy LLM to give feedback to authors about their own papers?

09.01.2026 14:35 — 👍 0    🔁 0    💬 1    📌 0

But we ended up showing that this is impossible in full generality. The results in the paper also lay out a slightly more nuanced landscape, and there remain some interesting open questions about the power of reductions from multicalibration to marginal calibration. Take a look!

09.01.2026 13:21 — 👍 0    🔁 0    💬 0    📌 0

This was a fun project in part because I didn't know what the right answer was. I started out believing that there should be a rate-preserving reduction from multicalibration to marginal calibration, lifting the (unknown) minimax calibration rates to multicalibration.

09.01.2026 13:21 — 👍 0    🔁 0    💬 1    📌 0

Informally, it's because you can define instances and groups/subsequences that punish the learner for any deviation from honest forecasting. The honest strategy works broadly; anything that deviates from it necessarily "overfits" to the weak marginal calibration metric.

09.01.2026 13:21 — 👍 0    🔁 0    💬 1    📌 0

It could have been that the minimax rates for the two problems were identical, up to a ~ logarithmic term in the number of subsequences, which is what the upper bounds pay. What we show is that they are fundamentally different --- you can't beat the "honest" T^{2/3} rate.

09.01.2026 13:21 — 👍 0    🔁 0    💬 1    📌 0

What about for multicalibration? The same kinds of techniques that get T^{2/3} rates for calibration also work for multicalibration --- Blackwell Approachability, multiobjective optimization, etc. Morally this is because the "honest" strategy also gets multicalibration.

09.01.2026 13:21 — 👍 0    🔁 0    💬 1    📌 0
Breaking the $T^{2/3}$ Barrier for Sequential Calibration
A set of probabilistic forecasts is calibrated if each prediction of the forecaster closely approximates the empirical distribution of outcomes on the subset of timesteps where that prediction was mad...

It is much less clear that there are strategies that let you do this profitably against a worst-case adversary --- but that's exactly what Dagan et al. showed recently to establish O(T^{2/3 - eps}) rates for marginal calibration arxiv.org/abs/2406.13668 --- that was super surprising.

09.01.2026 13:21 — 👍 0    🔁 0    💬 1    📌 0

Thinking about truthful forecasting is what gets you T^{2/3} rates. But maybe you could do better --- by cleverly strategizing to cancel the random noise with intentional bias that you inject. It's easy to see that you can do this on particular sequences.

09.01.2026 13:21 — 👍 0    🔁 0    💬 1    📌 0
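[Editor's note: a toy sketch of the point above --- my illustration, not the paper's. On a known sequence, a forecaster who deviates from honest probabilities can zero out the marginal calibration error, where an honest forecaster pays sampling noise. The probabilities 0.9/0.1 and the 3-day sequence are made up for the example.]

```python
def calib_err(preds, outcomes):
    # Weighted average gap between each predicted value and the empirical
    # outcome rate on the timesteps where that value was predicted.
    buckets = {}
    for p, y in zip(preds, outcomes):
        buckets.setdefault(p, []).append(y)
    n = len(preds)
    return sum(len(ys) / n * abs(sum(ys) / len(ys) - p)
               for p, ys in buckets.items())

outcomes = [1, 1, 0]                  # it rains on days 1 and 2 only
honest   = [0.9, 0.9, 0.1]           # hypothetical true probabilities
constant = [2 / 3, 2 / 3, 2 / 3]     # tuned to the realized sequence

print(calib_err(honest, outcomes))    # nonzero: sampling noise in each bucket
print(calib_err(constant, outcomes))  # exactly 0 on this sequence
```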

Suppose you knew the chance of rain. One strategy is to just forecast it truthfully. The bias of your predictions would be 0, but there would be noise: sometimes when you predict a 70% chance of rain it doesn't rain. The noise is higher the less frequently you make each prediction.

09.01.2026 13:21 — 👍 0    🔁 0    💬 1    📌 0
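[Editor's note: a minimal simulation of the honest strategy above, my sketch rather than anything from the paper. The true rain probability 0.7 and horizon are made up; the point is that the honest forecast has zero bias, and the leftover calibration gap is pure sampling noise shrinking like 1/sqrt of the bucket size.]

```python
import random

random.seed(0)
T = 100_000
p = 0.7                                 # true chance of rain, forecast honestly
rained = [random.random() < p for _ in range(T)]

# All T forecasts land in the single bucket "0.7"; the calibration gap of
# that bucket is |empirical rain rate - 0.7|, which is noise, not bias.
gap = abs(sum(rained) / T - p)
print(gap)  # small: on the order of sqrt(p*(1-p)/T)
```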

Calibration asks that forecasts behave like probabilities marginally over a sequence. Amongst all the days I predict a 70% chance of rain, it should rain 70% of the time, etc. Multicalibration asks for the same guarantee simultaneously on many pre-defined subsequences.

09.01.2026 13:21 — 👍 0    🔁 0    💬 1    📌 0
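[Editor's note: to make the two definitions above concrete, here is a minimal sketch --- mine, not from the paper --- of marginal and multicalibration error for binary outcomes. It assumes groups are given as sets of timestep indices. The example data shows a sequence that is perfectly marginally calibrated but badly multicalibrated.]

```python
def calibration_error(preds, outcomes):
    # Bucket timesteps by predicted value; a bucket's error is the gap
    # between its empirical outcome rate and the predicted value.
    buckets = {}
    for p, y in zip(preds, outcomes):
        buckets.setdefault(p, []).append(y)
    n = len(preds)
    return sum(len(ys) / n * abs(sum(ys) / len(ys) - p)
               for p, ys in buckets.items())

def multicalibration_error(preds, outcomes, groups):
    # Worst marginal calibration error over the pre-defined subsequences.
    return max(
        calibration_error([preds[t] for t in g], [outcomes[t] for t in g])
        for g in groups)
```

Usage: predicting 0.5 on the sequence 1,1,0,0 has zero marginal calibration error, but error 0.5 on each of the subsequences {0,1} and {2,3}.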

Excited about a new paper! Multicalibration turns out to be strictly harder than marginal calibration. We prove tight Omega(T^{2/3}) lower bounds for online multicalibration, separating it from online marginal calibration for which better rates were recently discovered.

09.01.2026 13:21 — 👍 21    🔁 3    💬 1    📌 0

AI assisted papers are very good at form. They are written in the voice of an experienced researcher, and so evade our old heuristics. We need to learn a new set of red flags. These include citation errors, vague gesturing to standard results, and other things that we will learn from experience.

30.12.2025 15:01 — 👍 10    🔁 0    💬 0    📌 0

Yes. We already have a set of ingrained red flags for human written papers that signal a lack of care: not citing the relevant literature, not formatting or typesetting math correctly, etc. These don't mean the paper is wrong but they strongly correlate with lack of care. But...

30.12.2025 15:01 — 👍 15    🔁 1    💬 1    📌 0

Getting absurd over at the ACM…

29.12.2025 00:05 — 👍 12    🔁 1    💬 2    📌 0
Gemini provides automated feedback for theoretical computer scientists at STOC 2026

STOC ran an experiment in which authors were able to use a Gemini model to check papers for mathematical errors before submission. It got positive feedback: research.google/blog/gemini-... - it is quite good at catching mathematical errors. Obv not a replacement for peer review but a useful tool.

29.12.2025 00:20 — 👍 0    🔁 0    💬 0    📌 0

So, many things will change --- I'm convinced that AI will be transformative for mathematical research. I think the changes will go beyond the day-to-day, and will extend to how we train our students and how we disseminate our work. The future is exciting and uncertain.

21.12.2025 19:01 — 👍 4    🔁 0    💬 0    📌 0

And we are already seeing that reducing the time and effort needed to produce "a paper" (not a -good- paper) is going to destabilize our existing institutions for peer review. We need to figure out how to manage researcher attention at scale and not be drowned in research slop.

21.12.2025 19:01 — 👍 3    🔁 0    💬 1    📌 1

A world in which clever discoveries happen in data centers, and the role of the professional researcher is careful verification and due diligence is a world in which the job of researcher is much less fun. Many fewer people with choices would want this job, given the other costs.

21.12.2025 19:01 — 👍 2    🔁 0    💬 1    📌 0

I also worry about removing joy. The academic bargain is that extremely talented researchers forgo high industry pay and location freedom in exchange for a really -fun- job --- solving research puzzles for a living. The joy of the work is an important part of the bargain.

21.12.2025 19:01 — 👍 3    🔁 0    💬 1    📌 0

(Current) AI tools are much less useful without the right intuition about what can and cannot work. When they are able to lure you away from your expertise, they can easily tempt you to believe that fatally flawed constructions can be made to work. Formal verification would help.

21.12.2025 19:01 — 👍 5    🔁 0    💬 1    📌 0

But AI tools are now very good at the "standard" calculations and proof techniques I struggled through. We will need to be more intentional about teaching these skills to young researchers, just as we teach arithmetic in school without the use of calculators.

21.12.2025 19:01 — 👍 1    🔁 0    💬 1    📌 0

We need to figure out how to educate the next generation of researchers. I feel I have strong research intuition that is complementary to the strengths of AI tools. But how did I get it? I think I developed it by struggling through calculations and failed proofs in grad school.

21.12.2025 19:01 — 👍 5    🔁 0    💬 1    📌 0

Nevertheless, the last 8 months have changed how I do research much more fundamentally than any other time during my career. This makes it very difficult to project forward. What will research at the end of 2026 look like? 2027? All I can confidently predict is "different".

21.12.2025 19:01 — 👍 1    🔁 0    💬 1    📌 0
