
Weijie Su

@wjsu.bsky.social

Associate Professor at University of Pennsylvania

172 Followers  |  31 Following  |  16 Posts  |  Joined: 28.11.2024

Latest posts by wjsu.bsky.social on Bluesky

Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know? Accurate evaluation of large language models (LLMs) is crucial for understanding their capabilities and guiding their development. However, current evaluations often inconsistently reflect the actual ...

Great minds think alike!

Alan Turing cracked Enigma in WWII; Brad Efron asked how many words Shakespeare knew. They used the same method.

We use this method to evaluate certain unseen capabilities of LLMs:

arxiv.org/abs/2506.02058
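The shared method alluded to here is plausibly the Good–Turing / Efron–Thisted unseen-species estimator (Turing developed it at Bletchley Park; Efron and Thisted applied it to Shakespeare's vocabulary). A minimal sketch of its missing-mass estimate, with an illustrative function name and toy data:

```python
from collections import Counter

def unseen_mass_estimate(observations):
    """Good-Turing estimate of the probability mass of never-observed items.

    The missing-mass estimate is N1 / N, where N1 is the number of items
    seen exactly once and N is the total number of observations.
    """
    counts = Counter(observations)
    n = len(observations)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n

# Toy example: theorems a model "cites" across sampled outputs.
samples = ["pythagoras", "pythagoras", "fermat", "bayes", "bayes", "noether"]
print(unseen_mass_estimate(samples))  # 2 singletons out of 6 -> 0.333...
```

Intuition: the more singletons the model produces, the more of its knowledge likely remains unobserved by the benchmark.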

04.06.2025 15:47 · 👍 1    🔁 1    💬 0    📌 0
Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching Nash Learning from Human Feedback is a game-theoretic framework for aligning large language models (LLMs) with human preferences by modeling learning as a two-player zero-sum game. However, using raw ...

Another new paper, a follow-up:

arxiv.org/abs/2505.20627

It studies an alternative to RLHF: Nash learning from human feedback.

30.05.2025 16:00 · 👍 0    🔁 1    💬 0    📌 0
Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium Aligning large language models (LLMs) with diverse human preferences is critical for ensuring fairness and informed outcomes when deploying these models for decision-making. In this paper, we seek to ...

A (not so) new paper on #LLM alignment from a social choice theory viewpoint:

arxiv.org/abs/2503.10990

It establishes fundamental impossibility results on representing (diverse) human preferences.
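One obstruction is the Condorcet paradox named in the title: with three annotators holding cyclic preferences over three candidate responses, no single response is majority-preferred to every other. A self-contained illustration (the annotator rankings below are hypothetical):

```python
# Three annotators rank three candidate responses A, B, C,
# each ranking listed from most to least preferred.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y):
    """True if a strict majority of annotators rank x above y."""
    wins = sum(r.index(x) < r.index(y) for r in rankings)
    return wins > len(rankings) / 2

# Pairwise majorities form a cycle: A beats B, B beats C, C beats A,
# so there is no Condorcet winner for an aligned model to match.
for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{x} > {y}: {majority_prefers(x, y)}")
```

Each pairwise contest is won 2-to-1, yet the "majority preference" is intransitive, which is exactly the kind of structure that rules out naive preference aggregation.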

30.05.2025 15:59 · 👍 1    🔁 1    💬 1    📌 0

Our analysis shows that the polar decomposition arises naturally from this viewpoint. This gives rise to nuclear-norm scaling: the update vanishes automatically as the gradient becomes small. In contrast, Muon must manually tune the scaling factor of the orthogonal matrix to achieve this.
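As a rough illustration of the idea (not the paper's exact update rule), the orthogonal polar factor of a matrix gradient can be computed from its SVD, and scaling it by the nuclear norm makes the step shrink with the gradient:

```python
import numpy as np

def polar_update(grad):
    """Hypothetical sketch: nuclear-norm-scaled polar factor of a gradient.

    With grad = U @ diag(s) @ Vt, the orthogonal polar factor is U @ Vt
    (unit spectral norm, so it alone ignores the gradient's magnitude).
    Scaling it by the nuclear norm sum(s) restores that magnitude, so the
    update vanishes automatically as the gradient does.
    """
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    return s.sum() * (U @ Vt)

rng = np.random.default_rng(0)
g = 1e-6 * rng.normal(size=(4, 3))  # a tiny gradient near convergence
upd = polar_update(g)
# The update's magnitude tracks the gradient's nuclear norm,
# so it shrinks to zero along with the gradient.
print(np.linalg.norm(upd))
```

Without the `s.sum()` factor, the bare orthogonal update has the same size whether the gradient is huge or nearly zero, which is why a raw orthogonalized step needs a hand-tuned scale.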

29.05.2025 17:13 · 👍 1    🔁 0    💬 0    📌 0
PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective The ever-growing scale of deep learning models and datasets underscores the critical importance of efficient optimization methods. While preconditioned gradient methods such as Adam and AdamW are the ...

We posted a paper on optimization for deep learning:

arxiv.org/abs/2505.21799

Recently there has been a surge of interest in *structure-aware* optimizers: Muon, Shampoo, SOAP. In this paper, we propose a unifying preconditioning perspective and offer insights into these matrix-gradient methods.

29.05.2025 17:12 · 👍 3    🔁 1    💬 1    📌 0
Statistical Foundations of Large Language Models

Some context: www.weijie-su.com/llm/

29.05.2025 13:17 · 👍 0    🔁 0    💬 0    📌 0
Do Large Language Models (Really) Need Statistical Foundations? Large language models (LLMs) represent a new paradigm for processing unstructured data, with applications across an unprecedented range of domains. In this paper, we address, through two arguments, wh...

I just wrote a position paper on the relation between statistics and large language models:

Do Large Language Models (Really) Need Statistical Foundations?

arxiv.org/abs/2505.19145

Any comments are welcome. Thx!

29.05.2025 13:16 · 👍 1    🔁 0    💬 1    📌 0
How to Prevent a Tragedy of the Commons for AI Research?

The ranking method was tested at ICML in 2023, 2024, and 2025. I hope we'll finally use it to improve ML/AI review processes soon. Here's an article about the method, from its conception to experimentation:

www.weijie-su.com/openrank/

27.05.2025 17:08 · 👍 1    🔁 0    💬 0    📌 0
The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML), asking authors with multiple submissions to rank their papers based on perceived q...

Our paper "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review" will appear in JASA as a Discussion Paper:

arxiv.org/abs/2408.13430

It's a privilege to work with such a wonderful team: Buxin, Jiayao, Natalie, Yuling, Didong, Kyunghyun, Jianqing, and Aaron.

27.05.2025 17:01 · 👍 1    🔁 1    💬 1    📌 0
Statistical Foundations of Large Language Models

We're hiring a postdoc focused on the statistical foundations of large language models, starting this fall. Join our team exploring the theoretical and statistical underpinnings of LLMs. If interested, check our work: weijie-su.com/llm/ and drop me an email. #AIResearch #PostdocPosition

13.05.2025 00:51 · 👍 1    🔁 1    💬 0    📌 0
Tips on How to Connect at Academic Conferences I was a kinda awkward teenager. If you are a CS researcher reading this post, then chances are, you were too. How to navigate social situations and make friends is not always intuitive, and has to …

I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them.

I'm late for #ICLR2025 #NAACL2025, but in time for #AISTATS2025 #ICML2025! 1/3
kamathematics.wordpress.com/2025/05/01/t...

01.05.2025 12:57 · 👍 68    🔁 19    💬 3    📌 2

The #ICML2025 @icmlconf.bsky.social deadline has just passed!

Peer review is vital to advancing AI research. We've been conducting a survey experiment at ICML since 2023. Please take a few minutes to complete the survey, sent via email with the subject "[ICML 2025] Author Survey". Thx!

31.01.2025 16:04 · 👍 1    🔁 0    💬 0    📌 0
Stat

A special issue on large language models (LLMs) and statistics at Stat (onlinelibrary.wiley.com/journal/2049...). We're seeking submissions examining LLMs' impact on statistical methods, practice, education, and more @amstatnews.bsky.social

19.12.2024 10:25 · 👍 3    🔁 1    💬 0    📌 0
Departmental Postdoctoral Researcher Position

A departmental postdoc position opening in my dept: statistics.wharton.upenn.edu/recruiting/d...

11.12.2024 20:04 · 👍 4    🔁 0    💬 0    📌 0

Heading to Vancouver tomorrow for #NeurIPS2024, Dec 10-14! Excited to reconnect with colleagues and enjoy Vancouver's seafood! 🦐

09.12.2024 19:46 · 👍 1    🔁 0    💬 0    📌 0

Add me plz. Thx!

28.11.2024 01:29 · 👍 1    🔁 0    💬 1    📌 0
How Is AI Changing the Science of Prediction? Podcast Episode · The Joy of Why · 11/07/2024 · 37m

Machine learning has led to predictive algorithms so obscure that they resist analysis. Where does the field of traditional statistics fit into all of this? Emmanuel Candès asks the question, “Can I trust this?” Tune in to this week’s episode of “The Joy of Why” listen.quantamagazine.org/jow-321-s

07.11.2024 16:49 · 👍 32    🔁 8    💬 0    📌 0

Knew nothing about bluesky until today. Immediately stop using X or gradually migrate to bluesky? Is there an optimal switching strategy?

28.11.2024 01:22 · 👍 2    🔁 0    💬 0    📌 0
