
Satoki Ishikawa

@satoki-ishikawa.bsky.social

Institute of Science Tokyo / R. Yokota lab / Neural Network / Optimization. Looking for great collaboration research. https://riverstone496.github.io/

43 Followers  |  453 Following  |  71 Posts  |  Joined: 25.11.2024

Latest posts by satoki-ishikawa.bsky.social on Bluesky

The release date of 『確率的機械学習』 (Probabilistic Machine Learning) falling two days before IBIS starts looks like it was timed on purpose 👀

23.10.2025 16:25 — 👍 0    🔁 0    💬 0    📌 0

My favorite pianist, Eric Lu, won first prize at the Chopin Piano Competition 😀

21.10.2025 00:52 — 👍 0    🔁 0    💬 0    📌 0

I didn't make the deadline to present at IBIS, but I've decided to attend anyway 😀 Naturally I also missed the sign-up for the official banquet, so I've registered for the unofficial one instead. Looking forward to seeing everyone 🙇

20.10.2025 06:28 — 👍 1    🔁 0    💬 0    📌 0

Yes 😇 As far as wanting to use jax the way you use torch goes, I'm putting my hopes on torchax, though 🤔

19.10.2025 17:46 — 👍 0    🔁 0    💬 1    📌 0

Once you get used to one library (torch, trl), the psychological barrier to migrating is very high, even if another library (jax, verl) would be a better fit. What I'd want most is an LLM specialized in translating between programming languages…

19.10.2025 06:27 — 👍 2    🔁 0    💬 1    📌 0
Shipping with Codex (YouTube video by OpenAI)

I also think this still looks difficult at a level where you care about the actual product, but watching the OpenAI folks in the video say they're producing roughly 4,000-line PRs per hour with (supposedly) test-driven development, I start to feel that, as long as the design docs, tests, and acceptance criteria are specified precisely, just churning out code without thinking about the fine-grained requirements might become possible before long, and that terrifies me 😱
www.youtube.com/watch?v=Gr41...

19.10.2025 06:24 — 👍 1    🔁 0    💬 1    📌 0
The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
Recent efforts to accelerate LLM pretraining have focused on computationally efficient approximations that exploit second-order structure. This raises a key question for large-scale training: how much...

A practical upper bound is an interesting concept. What other kinds of practical upper bounds would be interesting besides this one?
arxiv.org/abs/2510.09378

15.10.2025 01:04 — 👍 1    🔁 0    💬 0    📌 0
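For context on what a full Gauss-Newton "upper bound" baseline actually computes, here is a minimal sketch of one damped Gauss-Newton step on a toy MSE regression problem. The model, data, and damping value are my own illustrative choices, not the paper's setup.

```python
import torch
from torch.func import functional_call, jacrev

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
params = dict(model.named_parameters())
X, y = torch.randn(32, 4), torch.randn(32, 1)

def outputs(flat):
    # Rebuild the parameter dict from one flat vector so a single Jacobian covers all weights.
    p, i = {}, 0
    for k, v in params.items():
        p[k] = flat[i:i + v.numel()].view_as(v)
        i += v.numel()
    return functional_call(model, p, (X,)).flatten()

flat = torch.cat([v.detach().flatten() for v in params.values()])
J = jacrev(outputs)(flat)                  # df/dw: (n_samples, n_params) Jacobian
resid = outputs(flat) - y.flatten()
g = J.T @ resid / len(X)                   # gradient of 0.5 * mean squared error
G = J.T @ J / len(X)                       # Gauss-Newton matrix (loss Hessian w.r.t. outputs is I for MSE)
step = torch.linalg.solve(G + 1e-3 * torch.eye(G.shape[0]), g)
print("Gauss-Newton step norm:", step.norm().item())
```

Cheaper second-order methods can then be read as approximations to this damped solve, which is what makes it a natural practical upper bound.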

If I'm going to turn the Muon work into a paper, I need to write it up soon or it may overlap with someone else's again, yet I can't quite come up with that last push of originality…

13.10.2025 11:52 — 👍 1    🔁 0    💬 0    📌 0
Jeremy Cohen on X: "This nice, thorough paper on LLM pretraining shows that quantization error rises sharply when the learning rate is decayed. But, why would that be? The answer is likely related to curvature dynamics. https://t.co/cdkt3DU1iw"

While the focus for generalization and implicit bias has been on robustness to sample-wise noise, the rise of large-scale models suggests that robustness to parameter-wise noise (e.g., from quantization) might now be just as important?

x.com/deepcohen/st...

13.10.2025 05:40 — 👍 0    🔁 0    💬 0    📌 0
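One way to make "robustness to parameter-wise noise" concrete is to perturb the weights and measure the average loss increase, which second-order Taylor expansion predicts to be roughly 0.5 * sigma^2 * trace(Hessian). Below is a minimal sketch on a toy linear regression; the model, data, and noise scale are illustrative assumptions of mine, not from the linked thread.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(20, 1)
X, y = torch.randn(256, 20), torch.randn(256, 1)
loss_fn = lambda m: torch.nn.functional.mse_loss(m(X), y)

base = loss_fn(model).item()
sigma = 0.01                                   # stand-in for quantization / rounding noise
increases = []
for _ in range(200):
    noisy = torch.nn.Linear(20, 1)
    with torch.no_grad():
        for p, q in zip(noisy.parameters(), model.parameters()):
            p.copy_(q + sigma * torch.randn_like(q))   # additive parameter-wise noise
    increases.append(loss_fn(noisy).item() - base)
print("mean loss increase under weight noise:", sum(increases) / len(increases))
# Flatter minima (smaller Hessian trace) give a smaller increase, so curvature
# directly controls how much quantization-style parameter noise hurts the loss.
```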

So many papers… it’s a bit overwhelming. Wish there were a field with fewer of them...

10.10.2025 13:12 — 👍 1    🔁 0    💬 0    📌 0

I’ve been challenging myself to read a lot of NeurIPS 2025 papers, but maybe I should switch soon to reading ICLR 2025 submissions instead.

10.10.2025 13:11 — 👍 1    🔁 0    💬 1    📌 0

This might be one of the advantages of methods that skip curvature EMA (like Muon) or use the function gradient (like NGD).

10.10.2025 12:46 — 👍 1    🔁 0    💬 0    📌 0

This paper is really interesting.
NGD builds curvature from the function gradient df/dw, while optimizers like Adam and Shampoo use the loss gradient dL/dw.
I’ve always wondered which is better, since using the loss gradient with EMA might cause loss spikes later in training.

10.10.2025 12:46 — 👍 2    🔁 0    💬 1    📌 0
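A minimal sketch of the distinction in the post above, on a toy linear least-squares problem of my own: curvature statistics built from the loss gradient dL/dw = (f - y)·x collapse as the residuals shrink, while curvature built from the function gradient df/dw = x does not, which is one mechanism behind late-training instability with EMA preconditioners.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true

for scale in [1.0, 0.1, 0.01]:                 # shrink the residuals as training "progresses"
    w = w_true + scale * rng.normal(size=8)
    resid = X @ w - y
    loss_grad_sq = ((resid[:, None] * X) ** 2).mean(0)   # Adam-style statistic (EMA omitted)
    ggn_diag = (X ** 2).mean(0)                           # diag of J^T J / n, label-independent
    print(f"residual scale {scale:>4}: mean (dL/dw)^2 = {loss_grad_sq.mean():.2e}, "
          f"diag GGN = {ggn_diag.mean():.2e}")
```

The loss-gradient statistic shrinks with the squared residual scale, so an EMA-based preconditioner built from it keeps inflating the effective step size late in training; the function-gradient curvature stays fixed.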

This paper studies why Adam occasionally causes loss spikes, attributing them to the edge-of-stability phenomenon. As the figure shows, once training hits the EOS (see panel b), a loss spike is triggered. An interesting experimental report!

arxiv.org/abs/2506.04805

10.10.2025 07:55 — 👍 5    🔁 1    💬 0    📌 0
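For anyone wanting to reproduce the sharpness curve in that figure, here is a minimal sketch (toy model and data of my own) of estimating the top Hessian eigenvalue with Hessian-vector products and power iteration; edge-of-stability analyses compare this quantity against 2/learning-rate, or its preconditioned analogue for Adam.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
X, y = torch.randn(128, 10), torch.randn(128, 1)
params = list(model.parameters())

def top_hessian_eigenvalue(iters=20):
    loss = torch.nn.functional.mse_loss(model(X), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]                          # power iteration on the Hessian
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient v^T H v

print("estimated sharpness (top Hessian eigenvalue):", top_hessian_eigenvalue())
```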

I'm looking at ICLR submissions and I've noticed a significant number of papers related to Muon.

10.10.2025 03:42 — 👍 3    🔁 0    💬 0    📌 0

If someone ran a cross-disciplinary workshop around everything tied to the word "sparse", I'd be curious what kind of discussion the different communities would end up having...

08.10.2025 07:35 — 👍 0    🔁 0    💬 0    📌 0

How much overlap is there between the sparse structures that are desirable from the viewpoint of learning dynamics and implicit bias, the sparse structures that are convenient for matrix multiplication on GPUs, and the sparse structures the brain has? 🤔 Several sparse-matrix patterns convenient for GPU matrix multiplication are known, but I rarely hear about their connections to learning theory or neuroscience. HPC people are interested in other fields too, so they do cite seemingly related literature, yet my impression is that things stall one step short of a real connection?

08.10.2025 07:29 — 👍 1    🔁 0    💬 1    📌 0
KEVIN CHEN – first round (19th Chopin Competition, Warsaw) (YouTube video by Chopin Institute)

And he is a genius.
www.youtube.com/watch?v=iZAp...

06.10.2025 19:38 — 👍 0    🔁 0    💬 0    📌 0
SHIORI KUWAHARA – first round (19th Chopin Competition, Warsaw) (YouTube video by Chopin Institute)

Everyone is giving a wonderful performance, but among the Japanese pianists, I'm particularly fond of Ushida-kun and Kuwahara-san.

www.youtube.com/watch?v=SPS4...
www.youtube.com/watch?v=DaY6...

06.10.2025 19:31 — 👍 0    🔁 0    💬 1    📌 0
ERIC LU – first round (19th Chopin Competition, Warsaw) (YouTube video by Chopin Institute)

I'm watching the Chopin Competition.
While I'm deeply impressed by his performance, I'm so surprised that he's using an office chair.

m.youtube.com/watch?v=fDsg...

06.10.2025 19:27 — 👍 0    🔁 0    💬 1    📌 1

When a piece of work appears that overlaps heavily with both of the themes I was pursuing in two completely different places, the psychological damage is considerable; maybe my choice of themes was just too easy.

04.10.2025 01:21 — 👍 1    🔁 0    💬 0    📌 0

I'm setting off on a journey to wander around in search of a good research theme 🧳

02.10.2025 18:40 — 👍 0    🔁 0    💬 0    📌 0

Two papers extremely close to the two themes I'd been thinking about for the past two or three months came out simultaneously yesterday, and my research topics have vanished... I need to switch gears quickly and come up with good research themes again from scratch...

02.10.2025 18:06 — 👍 1    🔁 0    💬 1    📌 1

I knew nothing about PINNs, but they look interesting as a target for optimization & scaling.

28.09.2025 05:59 — 👍 0    🔁 0    💬 0    📌 0

Not all scaling laws are nice power laws. This month’s blog post: Zipf’s law in next-token prediction and why Adam (ok, sign descent) scales better to large vocab sizes than gradient descent: francisbach.com/scaling-laws...

27.09.2025 14:57 — 👍 47    🔁 12    💬 1    📌 0
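A minimal toy illustration of the blog's point (the setup below is my own simplification, a separable quadratic with Zipf-weighted coordinates, not the blog's model): per-token gradients scale with token frequency, so plain gradient descent barely moves the parameters of rare tokens, while sign descent updates every coordinate at the same rate.

```python
import numpy as np

V = 1000
freq = 1.0 / np.arange(1, V + 1)
freq /= freq.sum()                        # Zipf's law over the vocabulary
target = np.log(freq)                     # per-token optimum (log-frequency)
w_gd = np.zeros(V)
w_sign = np.zeros(V)

for _ in range(2000):
    grad_gd = freq * (w_gd - target)      # per-token gradient is weighted by frequency
    grad_sign = freq * (w_sign - target)
    w_gd -= 1.0 * grad_gd                 # gradient descent
    w_sign -= 0.005 * np.sign(grad_sign)  # sign descent: same step for every coordinate

for k in [0, 9, 99, 999]:
    print(f"token rank {k + 1:>4}: GD error {abs(w_gd[k] - target[k]):.3f}, "
          f"sign error {abs(w_sign[k] - target[k]):.3f}")
```

On this toy problem, gradient descent is still far from the optimum for the rare tokens after 2000 steps, while sign descent has essentially converged everywhere.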

I've made some small updates to the 'awesome list' for second-order optimization that I made two years ago. It looks like Muon-related work and applications to PINNs have really taken off in the last couple of years.
github.com/riverstone49...

26.09.2025 12:18 — 👍 2    🔁 0    💬 1    📌 0

Fluid dynamics might serve as an interesting new benchmark for second-order optimization

23.09.2025 02:14 — 👍 2    🔁 0    💬 0    📌 0
Discovery of Unstable Singularities
Whether singularities can form in fluids remains a foundational unanswered question in mathematics. This phenomenon occurs when solutions to governing equations, such as the 3D Euler equations, develo...

I don’t know anything about fluid dynamics, but I came across a paper that seemed to say that second-order optimization is key when using the power of neural networks to solve the Navier–Stokes equations. If so, there’s something romantic about that.
arxiv.org/abs/2509.14185

23.09.2025 02:13 — 👍 5    🔁 0    💬 1    📌 1

This is not OK.

I don't submit often to NeurIPS, but I have reviewed papers for this conference almost every year. As a reviewer, why would I spend time trying to give a fair opinion on papers if this is what happens in the end???

20.09.2025 06:10 — 👍 51    🔁 11    💬 3    📌 1

Thank you very much!

18.09.2025 10:50 — 👍 1    🔁 0    💬 0    📌 0
