
Manfred Diaz

@manfreddiaz.bsky.social

Ph.D. Candidate at Mila and the University of Montreal, interested in AI/ML connections with economics, game theory, and social choice theory. https://manfreddiaz.github.io

2,373 Followers  |  757 Following  |  42 Posts  |  Joined: 29.10.2024

Posts by Manfred Diaz (@manfreddiaz.bsky.social)

belated happy birthday, Marc!

26.01.2026 04:46 - 👍 1    🔁 0    💬 0    📌 0

Hello all! 👋

I’m delighted to share a 🚨 new preprint 🚨:

“Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms”.

A paper thread! 🤩📄🧵 1/N

15.01.2026 12:49 - 👍 47    🔁 11    💬 2    📌 2

Merry Christmas! ☃️🌲

25.12.2025 06:48 - 👍 3    🔁 0    💬 1    📌 0

Maybe the general intelligence has always been behind the algorithm or the prompt? No publicly available eval seems to be safe from researchers overfitting.

29.11.2025 05:08 - 👍 0    🔁 0    💬 0    📌 0

It hasn't disappointed thus far!

04.10.2025 05:38 - 👍 0    🔁 0    💬 0    📌 0

@sharky6000.bsky.social this may be of interest!

08.08.2025 09:07 - 👍 4    🔁 0    💬 1    📌 0

I was following this one during the COVID pandemic, but it has been inactive for quite some time. The original talks' recordings are amazing, though!

16.06.2025 14:20 - 👍 1    🔁 0    💬 1    📌 0

Yeah, it's been a period for all of us simultaneously! I have also been pretty busy with thesis/job search. Hopefully, it will be back running in the Fall term!

05.06.2025 15:35 - 👍 1    🔁 0    💬 0    📌 0

@aamasconf.bsky.social 2025 was very special for us! We had the opportunity to present a tutorial on general evaluation of AI agents, and we got a best paper award! Congrats, @sharky6000.bsky.social and the team! 🎉

23.05.2025 14:23 - 👍 13    🔁 1    💬 0    📌 0
A Tutorial on General Evaluation of AI Agents Artificial Intelligence (AI) and machine learning (ML), in particular, have emerged as scientific disciplines concerned with understanding and building single and multi-agent systems with the ability ...

In the afternoon we will be giving a tutorial on general evaluation of AI agents.

sites.google.com/view/aamas20... 10/N

18.05.2025 17:33 - 👍 4    🔁 1    💬 1    📌 0
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt Artificial Intelligence (AI) systems are increasingly placed in positions where their decisions have real consequences, e.g., moderating online spaces, conducting research, and advising on policy. Ens...

Announcing our latest arxiv paper:

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
arxiv.org/abs/2505.05197

We argue for a view of AI safety centered on preventing disagreement from spiraling into conflict.

09.05.2025 11:39 - 👍 24    🔁 6    💬 1    📌 1

Congrats, Seth!

01.05.2025 22:07 - 👍 1    🔁 0    💬 1    📌 0
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt - LessWrong. We can just drop the axiom of rational convergence.

First LessWrong post! Inspired by Richard Rorty, we argue for a different view of AI alignment, where the goal is "more like sewing together a very large, elaborate, polychrome quilt", than it is "like getting a clearer vision of something true and deep"
www.lesswrong.com/posts/S8KYwt...

22.04.2025 15:14 - 👍 6    🔁 1    💬 3    📌 0

The quality of London's museums is just amazing! Enjoy!

16.04.2025 01:50 - 👍 3    🔁 0    💬 0    📌 0
A Theory of Appropriateness with Applications to Generative Artificial Intelligence
YouTube video by MITCBMM

In case folks are interested, here's a video of a talk I gave at MIT a couple weeks ago: youtu.be/FmN6fRyfcsY?...

01.04.2025 20:50 - 👍 8    🔁 3    💬 0    📌 0

Our new evaluation method, Soft Condorcet Optimization, is now available open-source! 👍

Both the sigmoid (smooth Kendall-tau) and Fenchel-Young (perturbed optimizers) versions.

Also, an optimized C++ implementation that is ~40X faster than the Python one. 🤩⚡

github.com/google-deepm...

28.03.2025 09:45 - 👍 16    🔁 3    💬 0    📌 1
SCaLA-25 A workshop connecting research topics in social choice and learning algorithms.

Working at the intersection of social choice and learning algorithms?

Check out the 2nd Workshop on Social Choice and Learning Algorithms (SCaLA) at @ijcai.bsky.social this summer.

Submission deadline: May 9th.

I attended last year at AAMAS and loved it! 👍

sites.google.com/corp/view/sc...

26.03.2025 20:18 - 👍 19    🔁 6    💬 0    📌 2

If the AAMAS website is a good reference for this, it may not be, but uncertain atm.

06.03.2025 05:34 - 👍 1    🔁 0    💬 1    📌 0

Come to understand ML evaluation from first principles! We have put together a great AAMAS tutorial covering statistics, probabilistic models, game theory, and social choice theory.

Bonus: a unifying perspective of the problem leveraging decision-theoretic principles!

Join us on May 19th!

04.03.2025 23:39 - 👍 6    🔁 1    💬 1    📌 0

Re #2: The key finding there is that the stationary points of SCO contain the margin matrix but, as I said in the note, there is still more work to do!

04.03.2025 19:31 - 👍 1    🔁 0    💬 1    📌 0

Thanks! I have been meaning to update the manuscript to stand alone without the main paper, but instead I may have changed the content to a different format 😉. Coming soon!

04.03.2025 19:30 - 👍 1    🔁 0    💬 2    📌 0

Ah, I see the confusion... I never used the "identically distributed assumption," only the independence assumption (from 8 to 9).

25.02.2025 19:58 - 👍 1    🔁 0    💬 0    📌 0

I'm not sure if I understood your question correctly, but yes? As the post you shared says, "Voila! We have shown that minimizing the KL divergence amounts to finding the maximum likelihood estimate of θ." Maybe I am missing your point 😬

25.02.2025 19:48 - 👍 0    🔁 0    💬 2    📌 0
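
The claim quoted in the reply above can be sketched in a few lines. The notation is an assumption for illustration and is not taken from the linked post: $p_{\text{data}}$ is the data distribution, $p_\theta$ the model, and $x_1, \dots, x_N$ are samples drawn from $p_{\text{data}}$.

```latex
\begin{align*}
D_{\mathrm{KL}}\left(p_{\text{data}} \,\|\, p_\theta\right)
  &= \mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_{\text{data}}(x)\right]
   - \mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_\theta(x)\right] \\
\arg\min_\theta D_{\mathrm{KL}}\left(p_{\text{data}} \,\|\, p_\theta\right)
  &= \arg\max_\theta \mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_\theta(x)\right] \\
  &\approx \arg\max_\theta \frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i)
\end{align*}
```

The first expectation does not depend on $\theta$, so it drops out of the minimization. The last line replaces the expectation with a sample average, which is the (scaled) log-likelihood of the samples when they are treated as independent.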

Elo drives most LLM evaluations, but we often overlook its assumptions, benefits, and limitations. While working on SCO, we wanted to understand the SCO-Elo distinction, so I looked and uncovered some intriguing findings and documented them in these notes. I hope you find them valuable!

25.02.2025 02:29 - 👍 2    🔁 1    💬 0    📌 0

Looking for a principled evaluation method for ranking of *general* agents or models, i.e. that get evaluated across a myriad of different tasks?

I’m delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N

24.02.2025 15:25 - 👍 66    🔁 17    💬 1    📌 6

I had the convexity results for the online pairwise update (Section B.1.1.1) in my notes (manfreddiaz.github.io/assets/pdf/s...), but it is not clear to me if they hold for the other non-online settings. Worth taking a more detailed pass over the paper!

20.02.2025 20:10 - 👍 2    🔁 0    💬 0    📌 0

That's a nice finding, @sacha2.bsky.social! @sharky6000.bsky.social I skimmed over it, and it seems neat! There is an important distinction, though. They work with the "online" Elo regime, departing from the traditional gradient/batch gradient descent updates. (e.g., FIDE doesn't use online updates)

20.02.2025 20:10 - 👍 2    🔁 0    💬 1    📌 0
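
One possible reading of the online-versus-batch distinction in the post above, as a minimal sketch. The function names, the K-factor of 32, and the 400-point logistic scale are conventional illustrative choices, not taken from the paper or the notes being discussed. The online variant updates ratings after every game, while the batch variant scores every game against the same starting ratings and applies the summed update once.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Logistic (Elo-style) probability that player a beats player b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def online_elo(ratings, games, k=32.0):
    """Update ratings after every single game (hypothetical helper)."""
    r = dict(ratings)
    for winner, loser in games:
        # The expectation uses the ratings as already modified by earlier games.
        delta = k * (1.0 - expected_score(r[winner], r[loser]))
        r[winner] += delta
        r[loser] -= delta
    return r


def batch_elo_step(ratings, games, k=32.0):
    """Apply one summed update computed from the starting ratings (hypothetical helper)."""
    update = {player: 0.0 for player in ratings}
    for winner, loser in games:
        # Every game is scored against the same, unmodified ratings.
        surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
        update[winner] += surprise
        update[loser] -= surprise
    return {player: ratings[player] + k * update[player] for player in ratings}


start = {"a": 1500.0, "b": 1500.0}
games = [("a", "b"), ("a", "b")]  # (winner, loser) pairs
online = online_elo(start, games)
batch = batch_elo_step(start, games)
```

With two identical results, the online variant gives the second win a smaller boost because the first game has already raised the winner's rating, so the two schemes end at different ratings from the same data.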

lol 😀

12.02.2025 20:31 - 👍 3    🔁 0    💬 0    📌 0
Michael I. Jordan - Wikipedia

Not that Michael Jordan, but this one en.wikipedia.org/wiki/Michael...

12.02.2025 20:29 - 👍 3    🔁 0    💬 1    📌 0

I believe this example conveys, as Prof. Jordan hinted, the need for fresh conceptual frameworks that shift our perspective, help us avoid conceptual confusion, and increase our ability to build the future of AI. I believe ML-SoA provides such a framework, but I’d love to hear more perspectives!

11.02.2025 20:57 - 👍 4    🔁 0    💬 1    📌 0