
Jingfeng Wu

@uuujf.bsky.social

Postdoc at the Simons Institute at UC Berkeley; alumnus of Johns Hopkins & Peking University; deep learning theory. https://uuujf.github.io

59 Followers  |  218 Following  |  8 Posts  |  Joined: 18.03.2025

Latest posts by uuujf.bsky.social on Bluesky

slides: uuujf.github.io/postdoc/wu20...

26.09.2025 03:49 — 👍 0    🔁 0    💬 0    📌 0
GD dominates ridge

Sharing a new paper w/ Peter Bartlett, @jasondeanlee.bsky.social, @shamkakade.bsky.social, and Bin Yu

People talk about implicit regularization, but how good is it? We show it's surprisingly effective: GD dominates ridge for linear regression, with more cool results on GD vs. SGD

arxiv.org/abs/2509.17251

26.09.2025 03:49 — 👍 0    🔁 0    💬 1    📌 0
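A toy numpy sketch, not the paper's construction, to see the flavor of the claim above: it traces the test risk along the GD path (indexed by stopping time) and along the ridge path (indexed by the penalty) on a random overparameterized linear regression problem. Dimensions, noise level, and stepsize are arbitrary choices here.

```python
# Toy comparison: test risk along the GD path vs. along the ridge path.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                                 # n samples, d features (d > n)
w_star = rng.normal(size=d) / np.sqrt(d)       # ground-truth parameter
X = rng.normal(size=(n, d))                    # features x ~ N(0, I_d)
y = X @ w_star + 0.5 * rng.normal(size=n)      # noisy labels

def excess_risk(w):
    # Population excess risk E[(x^T (w - w_star))^2]; with identity
    # covariance this is just the squared parameter error.
    return float(np.sum((w - w_star) ** 2))

# Ridge path: w(lam) solves (X^T X + n * lam * I) w = X^T y.
ridge_risks = [
    excess_risk(np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y))
    for lam in np.logspace(-3, 2, 40)
]

# GD path on the least-squares loss, started at zero.
eta = 0.5 / np.linalg.norm(X.T @ X / n, 2)     # stepsize below 1/smoothness
w, gd_risks = np.zeros(d), []
for _ in range(2000):
    w -= eta * (X.T @ (X @ w - y)) / n
    gd_risks.append(excess_risk(w))

print(f"best GD iterate risk: {min(gd_risks):.4f}")
print(f"best ridge risk:      {min(ridge_risks):.4f}")
```

On random instances like this, the best early-stopped GD iterate is typically competitive with the best ridge solution; the dominance claim itself is a theorem in the paper, not something one simulation can establish.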
[4 photos]

Lassen in August

01.09.2025 21:03 — 👍 0    🔁 0    💬 0    📌 0
I'm an award-winning mathematician. Trump just cut my funding. The "Mozart of Math" tried to stay out of politics. Then it came for his research.

I wrote an op-ed on the world-class STEM research ecosystem in the United States, and how this ecosystem is now under attack on multiple fronts by the current administration: newsletter.ofthebrave.org/p/im-an-awar...

18.08.2025 15:45 — 👍 754    🔁 307    💬 17    📌 30
More Than 50 Simons Foundation Grantees to Speak at 2026 International Congress of Mathematicians (Simons Foundation)

Congratulations to our colleague and friend, former Simons Institute Associate Director Peter Bartlett, who will be delivering one of the plenary lectures for the 2026 International Congress of Mathematicians.

www.simonsfoundation.org/2025/07/11/m...

15.08.2025 06:12 — 👍 7    🔁 1    💬 0    📌 0
Terence Tao (@tao@mathstodon.xyz) It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wid...

My thoughts on the crucial importance of methodology on self-reported AI performance on mathematics competitions, and my policy on commenting on such reports going forward: mathstodon.xyz/@tao/1148814...

19.07.2025 22:37 — 👍 233    🔁 52    💬 2    📌 10
[2 photos]

📣 Join us at COLT 2025 in Lyon for a community event!
📅 When: Mon, June 30 | 16:00 CET
What: Fireside chat w/ Peter Bartlett & Vitaly Feldman on communicating a research agenda, followed by a mentorship roundtable to practice elevator pitches & mingle w/ the COLT community!
let-all.com/colt25.html

24.06.2025 18:22 — 👍 16    🔁 7    💬 0    📌 1
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reducti...

2/2 For regularized logistic regression (strongly convex and smooth) with separable data, we show GD, with simply a large stepsize, can match Nesterov's acceleration, among other cool results.

arxiv.org/abs/2506.02336

04.06.2025 18:55 — 👍 0    🔁 0    💬 0    📌 0
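For readers who want to poke at the claim numerically, a minimal sketch with synthetic separable data; the stepsizes and momentum schedule below are generic textbook choices, not the paper's constants. It compares GD at the classical stepsize 1/L, GD at a much larger stepsize, and Nesterov's method on the same ℓ2-regularized logistic loss.

```python
# GD (small vs. large stepsize) vs. Nesterov on l2-regularized logistic
# regression with linearly separable synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n, d, reg = 100, 10, 1e-3
u = np.ones(d) / np.sqrt(d)                     # direction of separation
X = rng.normal(size=(n, d))
y = np.sign(X @ u)
X += 0.5 * y[:, None] * u                       # push classes apart: separable

def loss_grad(w):
    z = -y * (X @ w)                            # negated margins
    loss = np.mean(np.logaddexp(0.0, z)) + 0.5 * reg * (w @ w)
    p = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))  # numerically safe sigmoid
    return loss, -(X.T @ (p * y)) / n + reg * w

L = 0.25 * np.linalg.norm(X, 2) ** 2 / n + reg  # smoothness upper bound

def gd(eta, T=3000):
    w = np.zeros(d)
    for _ in range(T):
        w = w - eta * loss_grad(w)[1]
    return loss_grad(w)[0]

def nesterov(eta, T=3000):
    w = v = np.zeros(d)
    for t in range(T):
        w_next = v - eta * loss_grad(v)[1]
        v = w_next + t / (t + 3) * (w_next - w)  # standard momentum schedule
        w = w_next
    return loss_grad(w)[0]

print(f"GD, stepsize 1/L:  {gd(1 / L):.3e}")
print(f"GD, stepsize 50/L: {gd(50 / L):.3e}")
print(f"Nesterov, 1/L:     {nesterov(1 / L):.3e}")
```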
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes We study $\textit{gradient descent}$ (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter $\eta$. We show that after ...

1/2 For the task of finding a linear separator of a separable dataset with margin gamma, 1/gamma^2 steps suffice for adaptive GD with large stepsizes (applied to the logistic loss). This is minimax optimal for first-order methods, and is impossible for GD with small stepsizes.

arxiv.org/abs/2504.04105

04.06.2025 18:55 — 👍 0    🔁 0    💬 1    📌 0
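A small sketch of the adaptive scheme described in the abstract above, with toy data and a hand-picked eta: each stepsize is eta divided by the current empirical risk, and the loop counts steps until the iterate separates a dataset with margin on the order of gamma. This only illustrates the upper bound on one random instance; it says nothing about the lower bound for small constant stepsizes.

```python
# GD on the logistic loss with adaptive stepsizes eta_t = eta / risk_t.
import numpy as np

rng = np.random.default_rng(2)
n, d, gamma, eta = 200, 20, 0.1, 1.0
u = np.ones(d) / np.sqrt(d)                    # target separator
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm features
y = np.sign(X @ u)
X += gamma * y[:, None] * u                    # enforce margin ~ gamma
X /= np.linalg.norm(X, axis=1, keepdims=True)

def risk_grad(w):
    z = -y * (X @ w)
    risk = np.mean(np.logaddexp(0.0, z))       # empirical logistic risk
    p = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))
    return risk, -(X.T @ (p * y)) / n

w, steps = np.zeros(d), 0
while np.min(y * (X @ w)) <= 0 and steps < 10_000:
    risk, grad = risk_grad(w)
    w -= (eta / risk) * grad                   # stepsize grows as risk shrinks
    steps += 1

print(f"separated after {steps} steps; 1/gamma^2 = {1 / gamma**2:.0f}")
```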
effects of stepsize for GD

Sharing two new papers on accelerating GD via large stepsizes!

Classical GD analysis assumes small stepsizes for stability. However, in practice, GD is often used with large stepsizes, which lead to instability.

See my slides for more details on this topic: uuujf.github.io/postdoc/wu20...

04.06.2025 18:55 — 👍 1    🔁 0    💬 1    📌 0

Jingfeng Wu, Pierre Marion, Peter Bartlett
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
https://arxiv.org/abs/2506.02336

04.06.2025 05:26 — 👍 1    🔁 1    💬 0    📌 0
[4 photos]

Rocky Mountain in May

19.05.2025 17:24 — 👍 0    🔁 0    💬 0    📌 0
[1 photo]

Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025!

πŸ“ Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models!

πŸ—“οΈ Deadline: May 19, 2025

09.05.2025 17:09 — 👍 17    🔁 6    💬 1    📌 1
[1 photo]

We were very lucky to have Peter Bartlett visit @uwcheritoncs.bsky.social and give a Distinguished Lecture on "Gradient Optimization Methods: The Benefits of a Large Step-size." Very interesting and surprising results.

(Recording will be available eventually)

07.05.2025 10:11 — 👍 26    🔁 3    💬 0    📌 0
Tips on How to Connect at Academic Conferences I was a kinda awkward teenager. If you are a CS researcher reading this post, then chances are, you were too. How to navigate social situations and make friends is not always intuitive, and has to …

I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them.

I'm late for #ICLR2025 #NAACL2025, but in time for #AISTATS2025 #ICML2025! 1/3
kamathematics.wordpress.com/2025/05/01/t...

01.05.2025 12:57 — 👍 68    🔁 19    💬 3    📌 2
[4 photos]

Yosemite in April

28.04.2025 17:36 — 👍 1    🔁 0    💬 0    📌 0

Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
https://arxiv.org/abs/2504.04105

08.04.2025 05:26 — 👍 4    🔁 1    💬 0    📌 0
The Future of Language Models and Transformers Transformers have now been scaled to vast amounts of static data. This approach has been so successful it has forced the research community to ask, "What's next?". This workshop will bring together re...

Join us for a week of talks on The Future of Language Models and Transformers at the Simons Institute. Talks by @profsanjeevarora.bsky.social, Azalia Mirhoseini, Kilian Weinberger and others. Mon, March 31 - Fri, April 4.
simons.berkeley.edu/workshops/future-language-models-transformers

31.03.2025 16:22 — 👍 2    🔁 2    💬 1    📌 0
