Really enjoyed chatting with @oxfordmathematics.bsky.social
about AI in mathematics, where it can genuinely help, and what some of the limitations are.
@benjamincwalker.bsky.social
Machine Learning PhD | Mathematical Institute, Oxford | Researching Neural Differential Equations & Rough Path Theory | Email: MLBenjaminWalker@gmail.com | GitHub: Benjamin-Walker
Alternative Title: "A Timely Series of Talks on Time Series."
Was too proud of this one so had to post it somewhere!
Just wrapped up my short course "Time Series Modelling: From Foundations to Frontiers" at the Oxford Internet Institute @oii.ox.ac.uk
Huge thanks to @ammaox.bsky.social for the invitation; had some really engaging discussions!
Looking forward to being back at the OII soon.
Sonnet-3.7 has me vibe coding for the first time.
Never written HTML, CSS, or JavaScript before, but I've created the website I've always wanted, featuring an optional command line interface ✨
BenWalker.co.uk
Looking forward to presenting this work at #NeurIPS2024 !
Come find us on Thursday from 11-2 @ West Ballroom A-D #6907
D&D Combinatorics xkcd.com/3015
23.11.2024 00:59
Huge thanks to my incredible co-authors Nicola Cirone, Antonio Orvieto, Cristopher Salvi, and Terry Lyons!
#NeurIPS2024 #MachineLearning #DeepLearning #StateSpaceModels
🧵 6/6
S4, Mamba, and Transformers need 4 blocks just to compose 12 permutations!
In contrast, using a dense state-transition matrix (IDS4/Linear CDE) or a non-linear state-transition (RNN) allows for state-tracking with only 1 layer.
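A toy sketch of the dense case (my own illustration, not code from the paper): if each input token selects a dense, input-dependent state-transition matrix, a single linear recurrence composes permutations exactly, so one layer suffices for state-tracking.

```python
import numpy as np

# Two generators of the permutation group S_3, as permutation matrices.
# Each token selects a dense state-transition matrix.
P = {
    "swap01": np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float),
    "cycle":  np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float),
}

def run(tokens, h0):
    """One-layer linear recurrence h_t = P[x_t] @ h_{t-1}."""
    h = h0
    for t in tokens:
        h = P[t] @ h  # dense transition: exact permutation composition
    return h

h0 = np.array([1.0, 2.0, 3.0])
out = run(["swap01", "cycle", "swap01"], h0)
# out encodes the composed permutation applied to h0
```

The hidden state tracks the full composed permutation after a single pass, which is exactly the state-tracking ability the A5 benchmark probes.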
🧵 5/6
An excellent empirical example of this limited capacity is the A5 benchmark, from "The Illusion of State in State-Space Models" by Merrill et al.
The benchmark tests state-tracking, a crucial ability for tasks involving permutation composition like chess.
The results?
🧵 4/6
We rigorously show that Mambaβs selectivity mechanism boosts expressiveness.
However, we also show that using a diagonal state-transition matrix, while drastically reducing computational costs, also significantly limits the model's capacity.
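A minimal sketch of the diagonal limitation (my illustration, not from the paper): diagonal matrices commute, so a one-layer diagonal linear recurrence produces the same state regardless of input order, while permutation composition is order-sensitive.

```python
import numpy as np

# Two diagonal state-transition matrices (each just rescales coordinates).
D1 = np.diag([2.0, 3.0, 5.0])
D2 = np.diag([7.0, 1.0, 4.0])
h0 = np.ones(3)

# Diagonal transitions commute: processing inputs in either order
# yields an identical hidden state, so the recurrence cannot
# distinguish input orderings.
assert np.allclose(D1 @ D2 @ h0, D2 @ D1 @ h0)
```

Since composing permutations depends on their order, no single diagonal-recurrence layer can track them, which is consistent with the A5 results above.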
🧵 3/6
In this paper, we introduce a unified framework for state-space models using Rough Path Theory, providing a rigorous theoretical foundation for why the Mamba recurrence outperforms other SSMs, and precisely where their expressiveness may be limited.
🧵 2/6
Want to know why Mamba beats other state-space models, and where it falls short?
Then check out our #NeurIPS 2024 paper: "Theoretical Foundations of Deep Selective State-Space Models."
Read the paper: arxiv.org/abs/2402.19047
Access the code: github.com/Benjamin-Walke…
🧵 1/6