@uuujf.bsky.social
Postdoc at Simons at UC Berkeley; alumnus of Johns Hopkins & Peking University; deep learning theory. https://uuujf.github.io
slides: uuujf.github.io/postdoc/wu20...
26.09.2025 03:49
GD dominates ridge
Sharing a new paper w/ Peter Bartlett, @jasondeanlee.bsky.social, @shamkakade.bsky.social, and Bin Yu.
People talk about implicit regularization, but how good is it? We show it's surprisingly effective: GD dominates ridge for linear regression, with more cool stuff on GD vs SGD.
arxiv.org/abs/2509.17251
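Not from the paper, but a minimal numpy sketch of the kind of comparison in question: run GD on unregularized least squares and track the test risk along its path, then compare against ridge regression over a grid of regularization strengths. The data, dimensions, and stepsize below are arbitrary choices of mine, not the paper's setup.

```python
# Toy sketch (my own illustration, not the paper's experiment): compare the test
# risk of GD iterates on unregularized least squares against ridge regression over
# a grid of regularization strengths, on synthetic Gaussian data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                      # overparameterized linear regression
w_star = rng.normal(size=d) / np.sqrt(d)
X, Xte = rng.normal(size=(n, d)), rng.normal(size=(1000, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
yte = Xte @ w_star

test_risk = lambda w: np.mean((Xte @ w - yte) ** 2)

# GD on the unregularized least-squares objective, tracking risk along the path.
L = np.linalg.norm(X, 2) ** 2 / n   # smoothness constant of the empirical loss
w, eta = np.zeros(d), 1.0 / L
gd_risks = []
for t in range(2000):
    w -= eta * X.T @ (X @ w - y) / n
    gd_risks.append(test_risk(w))

# Ridge solutions over a grid of regularization strengths.
ridge_risks = []
for lam in np.logspace(-4, 2, 50):
    w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    ridge_risks.append(test_risk(w_ridge))

print(f"best GD iterate risk: {min(gd_risks):.4f}")
print(f"best ridge risk:      {min(ridge_risks):.4f}")
```

The snippet only illustrates the two estimators being compared; the paper's "GD dominates ridge" claim is the precise version of this comparison.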
Lassen in August
01.09.2025 21:03
I wrote an op-ed on the world-class STEM research ecosystem in the United States, and how this ecosystem is now under attack on multiple fronts by the current administration: newsletter.ofthebrave.org/p/im-an-awar...
18.08.2025 15:45
Congratulations to our colleague and friend, former Simons Institute Associate Director Peter Bartlett, who will be delivering one of the plenary lectures for the 2026 International Congress of Mathematicians.
www.simonsfoundation.org/2025/07/11/m...
My thoughts on the crucial importance of methodology on self-reported AI performance on mathematics competitions, and my policy on commenting on such reports going forward: mathstodon.xyz/@tao/1148814...
19.07.2025 22:37
Join us at COLT 2025 in Lyon for a community event!
When: Mon, June 30 | 16:00 CET
What: Fireside chat w/ Peter Bartlett & Vitaly Feldman on communicating a research agenda, followed by a mentorship roundtable to practice elevator pitches & mingle with the COLT community!
let-all.com/colt25.html
2/2 For regularized logistic regression (strongly convex and smooth) with separable data, we show that GD, simply with a large stepsize, can match Nesterov's acceleration, among other cool results.
arxiv.org/abs/2506.02336
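As a rough illustration only (not the algorithm or stepsize schedule analyzed in the paper), here is a toy comparison on synthetic separable data between GD with the classical 1/L stepsize, GD with a much larger stepsize, and Nesterov's accelerated method for strongly convex objectives. The 50/L stepsize and all problem parameters are hypothetical choices of mine.

```python
# Toy illustration (my own sketch, not the paper's algorithm): l2-regularized
# logistic regression on linearly separable data, comparing vanilla GD with a
# small (1/L) stepsize, GD with a much larger stepsize, and Nesterov's
# accelerated method for strongly convex objectives.
import numpy as np

rng = np.random.default_rng(1)
n, d, mu = 200, 20, 1e-3            # mu: strength of the l2 regularizer
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star); y[y == 0] = 1.0   # separable labels

def loss(w):
    return np.mean(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * mu * w @ w

def grad(w):
    m = y * (X @ w)
    p = 0.5 * (1.0 - np.tanh(0.5 * m))      # numerically stable sigmoid(-m)
    return -(X.T @ (y * p)) / n + mu * w

L = 0.25 * np.linalg.norm(X, 2) ** 2 / n + mu   # smoothness of the objective

def gd(eta, T=500):
    w = np.zeros(d)
    for _ in range(T):
        w = w - eta * grad(w)
    return loss(w)

def nesterov(T=500):
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    w = v = np.zeros(d)
    for _ in range(T):
        w_new = v - grad(v) / L
        v = w_new + beta * (w_new - w)
        w = w_new
    return loss(w)

print("GD, stepsize 1/L :", gd(1.0 / L))
print("GD, stepsize 50/L:", gd(50.0 / L))   # hypothetical "large" stepsize
print("Nesterov         :", nesterov())
```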
1/2 For the task of finding a linear separator of a separable dataset with margin gamma, 1/gamma^2 steps suffice for adaptive GD with large stepsizes (applied to the logistic loss). This is minimax optimal for first-order methods, and is impossible for GD with small stepsizes.
arxiv.org/abs/2504.04105
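Again a toy stand-in rather than the paper's adaptive scheme: the snippet builds a dataset with margin gamma, then counts how many GD steps on the logistic loss are needed before the iterate separates the data, once with the classical 1/L stepsize and once with normalized-gradient steps as a crude proxy for "large adaptive stepsizes." All constants are my own illustrative choices.

```python
# Illustration only (a stand-in, not the adaptive scheme from the paper): count GD
# steps on the logistic loss until the iterate linearly separates a margin-gamma
# dataset, comparing a small 1/L stepsize with normalized-gradient steps.
import numpy as np

rng = np.random.default_rng(2)
n, d, gamma = 200, 50, 0.05
u = rng.normal(size=d); u /= np.linalg.norm(u)
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X @ u); y[y == 0] = 1.0
X += gamma * y[:, None] * u          # push every point to margin >= ~gamma along u
X /= np.linalg.norm(X, axis=1, keepdims=True)

def grad(w):
    m = y * (X @ w)
    p = 0.5 * (1.0 - np.tanh(0.5 * m))   # numerically stable sigmoid(-m)
    return -(X.T @ (y * p)) / n

def steps_to_separate(stepsize_rule, T=20000):
    w = np.zeros(d)
    for t in range(1, T + 1):
        g = grad(w)
        w = w - stepsize_rule(g) * g
        if np.all(y * (X @ w) > 0):
            return t
    return None   # not separated within T steps

L = 0.25  # the mean logistic loss with unit-norm features is at most (1/4)-smooth
print("small stepsize 1/L       :", steps_to_separate(lambda g: 1.0 / L))
print("normalized (large) steps :", steps_to_separate(lambda g: 1.0 / (np.linalg.norm(g) + 1e-12)))
print("1/gamma^2 =", int(1.0 / gamma ** 2))
```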
effects of stepsize for GD
Sharing two new papers on accelerating GD via large stepsizes!
Classical GD analysis assumes small stepsizes for stability. In practice, however, GD is often used with large stepsizes, which lead to instability; a toy illustration of this stability threshold follows below.
See my slides for more details on this topic: uuujf.github.io/postdoc/wu20...
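The sketch below (my own toy example, not from the papers) shows the classical stability threshold: on a quadratic with smoothness L, GD contracts only when the stepsize is below 2/L, and oscillates or diverges above it. The papers show that on logistic-type losses, large stepsizes can nonetheless accelerate convergence.

```python
# Minimal sketch of the classical stability threshold: for f(w) = (L/2) * w**2,
# GD contracts iff the stepsize is below 2/L; above that, iterates oscillate and
# blow up.
import numpy as np

L = 1.0                      # smoothness constant of f(w) = (L/2) * w**2
grad = lambda w: L * w

def run_gd(eta, w0=1.0, T=10):
    w, path = w0, [w0]
    for _ in range(T):
        w -= eta * grad(w)
        path.append(w)
    return np.array(path)

print("stepsize 0.5/L (stable)  :", np.round(run_gd(0.5 / L), 3))
print("stepsize 1.9/L (edge)    :", np.round(run_gd(1.9 / L), 3))
print("stepsize 2.5/L (unstable):", np.round(run_gd(2.5 / L), 3))
```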
Jingfeng Wu, Pierre Marion, Peter Bartlett
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
https://arxiv.org/abs/2506.02336
Rocky Mountain in May
19.05.2025 17:24
Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025!
Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models!
Deadline: May 19, 2025
We were very lucky to have Peter Bartlett visit @uwcheritoncs.bsky.social and give a Distinguished Lecture on "Gradient Optimization Methods: The Benefits of a Large Step-size." Very interesting and surprising results.
(Recording will be available eventually)
I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them.
I'm late for #ICLR2025 #NAACL2025, but in time for #AISTATS2025 #ICML2025! 1/3
kamathematics.wordpress.com/2025/05/01/t...
Yosemite in April
28.04.2025 17:36
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
https://arxiv.org/abs/2504.04105
Join us for a week of talks on The Future of Language Models and Transformers at the Simons Institute. Talks by @profsanjeevarora.bsky.social, Azalia Mirhoseini, Kilian Weinberger and others. Mon, March 31 - Fri, April 4.
simons.berkeley.edu/workshops/future-language-models-transformers