Saw your work before, really cool work!
25.11.2025 17:59
oh excellent pointer! That indeed matches our intuition
24.11.2025 20:04

And variance in the behavior is not necessarily coupled with that in the features; for example, see this paper for a dissociation between the two: openreview.net/forum?id=Yuc...
24.11.2025 19:56

What we found was that more consistent features during training do not guarantee more similar OOD behavior. In fact, in our setting stronger feature learning can lead to more variable OOD behavior, which we hypothesize is due to overfitting.
24.11.2025 19:47

Just looked at your paper; we're basically motivated by the same question applied to different architectures! Will try to visit your poster too.
24.11.2025 19:37

yay thanks Dan!!
24.11.2025 19:32

Thanks to my amazing collaborators and my PI! @satpreetsingh.bsky.social @flavioh.bsky.social @kanakarajanphd.bsky.social
🔹 Paper: arxiv.org/pdf/2410.03972
🔹 Poster: Fri Dec 5, Poster #2001 at Exhibition Hall C, D, E
Happy to chat at NeurIPS or by email at annhuang@g.harvard.edu!
Our results:
- support the contravariance principle (Cao & @dyamins.bsky.social)
- reveal when weight- & dynamics-level variability move together (or in opposite directions)
- give "knobs" for controlling degeneracy, whether you're studying shared mechanisms or individual variability in task-trained RNNs.
4️⃣ Regularization (L1, low-rank)
Both types of structural regularization reduce degeneracy across all levels. Regularization nudges networks toward more consistent, shared solutions.
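For readers who want to try these knobs themselves, here is a rough PyTorch sketch of both forms of structural regularization; the class/function names and the rank-r factorization are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class LowRankRNNCell(nn.Module):
    """Vanilla RNN cell whose recurrent matrix is factored as U @ V.T (rank r),
    structurally constraining the solution space."""
    def __init__(self, input_dim, n_units, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(n_units, rank) / n_units ** 0.5)
        self.V = nn.Parameter(torch.randn(n_units, rank) / n_units ** 0.5)
        self.W_in = nn.Parameter(torch.randn(n_units, input_dim) / input_dim ** 0.5)

    def recurrent_weights(self):
        # Effective (n_units x n_units) recurrent matrix.
        return self.U @ self.V.T

    def forward(self, x, h):
        return torch.tanh(h @ self.recurrent_weights().T + x @ self.W_in.T)

def l1_penalty(cell, lam=1e-4):
    """L1 regularizer on the effective recurrent weights; add this to the task loss."""
    return lam * cell.recurrent_weights().abs().sum()
```

For a full-rank network with L1 only, the same kind of penalty can be applied directly to a standard nn.RNN's weight_hh_l0.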
3️⃣ Network size
When we fix the strength of feature learning (using µP), larger RNNs converge to more consistent solutions at all levels: weights, dynamics, and behavior.
A clean convergence-with-scale effect, demonstrated in RNNs at every level.
We then causally tested feature learning's effect on degeneracy using µP scaling. Stronger feature learning reduces dynamical degeneracy & increases weight degeneracy (like harder tasks).
It also increases behavioral degeneracy under OOD inputs (likely due to overfitting).
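The paper's knob is µP; as a simpler stand-in, the classic output-scaling trick from the lazy-training literature dials the same lazy-vs-rich regime. This is an illustrative substitute, not the µP parametrization actually used in the paper:

```python
import torch
import torch.nn as nn

class OutputScaled(nn.Module):
    """Wrap a network and scale its output by alpha, training on loss / alpha**2.
    Large alpha pushes training toward the lazy/kernel regime; small alpha
    toward richer feature learning. Illustrative knob only -- the paper uses µP."""
    def __init__(self, net, alpha):
        super().__init__()
        self.net = net
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * self.net(x)

def scaled_loss(model, x, y, criterion=nn.functional.mse_loss):
    # Dividing by alpha**2 keeps gradient magnitudes comparable across alpha.
    return criterion(model(x), y) / model.alpha ** 2
```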
2️⃣ Feature learning
Complex tasks push RNNs into the feature-learning regime, where the network has to adapt its internal weights and features to solve the task. Weights travel much farther from initialization, ending up more dispersed in weight space (higher degeneracy).
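A quick way to check which regime your own runs are in, as a minimal sketch; it assumes you kept a copy of the recurrent matrix at initialization:

```python
import numpy as np

def weight_travel(W_init, W_final):
    """Relative distance the recurrent weights moved from initialization.
    Near zero suggests the lazy regime; order-one or larger suggests feature learning."""
    return np.linalg.norm(W_final - W_init) / np.linalg.norm(W_init)
```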
1️⃣ Task complexity
As tasks get harder, we observe less degeneracy in dynamics/behavior, but more degeneracy in the weights.
When trained on harder tasks, RNNs converge to similar neural dynamics and OOD behavior, but their weight configurations diverge. Why?
Using 3,400 RNNs across 4 neuroscience-relevant tasks (flip-flop memory, working memory, pattern generation, path integration), we systematically varied:
- task complexity
- learning regime
- network size
- regularization
Our findings:
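For concreteness, a sweep over the factors listed above might be laid out roughly like this; the task names follow the thread, while the specific sizes, strengths, and seed counts are made-up placeholders, not the paper's actual grid:

```python
from itertools import product

# Hypothetical sweep grid -- factors follow the thread, values are placeholders.
tasks = ["flipflop_memory", "working_memory", "pattern_generation", "path_integration"]
network_sizes = [64, 128, 256]
feature_learning_strengths = [0.25, 1.0, 4.0]   # lazy -> rich
regularizers = [None, "l1", "low_rank"]
seeds = range(10)                               # several seeds per configuration

configs = [
    dict(task=t, n_units=n, fl_strength=g, reg=r, seed=s)
    for t, n, g, r, s in product(tasks, network_sizes,
                                 feature_learning_strengths, regularizers, seeds)
]
print(f"{len(configs)} training runs, one RNN each")
```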
Our unified framework measures & controls degeneracy at 3 levels:
🎯 Behavior: variability in OOD performance
🧠 Dynamics: distance between neural trajectories, quantified by Dynamical Similarity Analysis
⚙️ Weights: permutation-invariant Frobenius distance between recurrent weights
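Dynamics-level distances come from Dynamical Similarity Analysis; for the weight-level metric, here is a minimal sketch of one way to compute a permutation-invariant Frobenius distance, matching hidden units by correlating their activity on a shared input batch via the Hungarian algorithm. The matching criterion is an assumption, and the paper's exact procedure may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(H1, H2):
    """Match hidden units of net 2 to net 1 by correlating their activity
    on the same inputs. H1, H2: (samples, n_units) hidden-state matrices."""
    Z1 = (H1 - H1.mean(0)) / (H1.std(0) + 1e-8)
    Z2 = (H2 - H2.mean(0)) / (H2.std(0) + 1e-8)
    corr = Z1.T @ Z2 / H1.shape[0]              # unit-to-unit correlation matrix
    _, perm = linear_sum_assignment(-corr)      # maximize total matched correlation
    return perm

def perm_invariant_frobenius(W1, W2, H1, H2):
    """Frobenius distance between recurrent matrices after permuting
    net 2's hidden units to best align with net 1's."""
    perm = match_units(H1, H2)
    W2_aligned = W2[np.ix_(perm, perm)]         # permute rows and columns together
    return np.linalg.norm(W1 - W2_aligned)
```

Weight-level degeneracy across a set of seeds can then be summarized as, e.g., the average of this distance over all seed pairs.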
RNNs trained from different seeds on the same task can show strikingly different internal solutions, even when they perform equally well. We call this solution degeneracy.
24.11.2025 16:43

🎉 Excited to share that our paper was selected as a Spotlight at #NeurIPS2025!
arxiv.org/pdf/2410.03972
It started from a question I kept running into:
When do RNNs trained on the same task converge/diverge in their solutions?
🧵⬇️
Our next paper on comparing dynamical systems (with a special focus on artificial and biological neural networks) is out!! Joint work with @annhuang42.bsky.social, as well as @satpreetsingh.bsky.social, @leokoz8.bsky.social, Ila Fiete, and @kanakarajanphd.bsky.social: arxiv.org/pdf/2510.25943
10.11.2025 16:16