This is joint work with Shirui Chen, Iman Tanumihardja, Xiaochuang Han, Weijia Shi, Eric Shea-Brown, and Rajesh Rao. Please check out the preprint for more details.
Any feedback is appreciated! (9/9)
Together, our results show that pretraining with more sessions does not automatically lead to improved downstream performance. We advocate for rigorous scaling analyses in future work on neural foundation models to account for data heterogeneity effects. (8/9)
We note that similar results have been found in NDT3 by Joel Ye, where several downstream datasets enjoyed little benefit from scale with 100 minutes of finetuning data. (7/9)
We found that models trained with as few as five top-ranked sessions outperformed those trained with randomly chosen sessions, even when the full dataset was used, demonstrating the impact of session-to-session variability on performance scaling. (6/9)
For the forward-prediction task that did exhibit consistent scaling, we identified implicit data heterogeneity arising from cross-session variability. We proposed a session-selection procedure based on single-session finetuning performance. (5/9)
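A minimal sketch of what such a session-selection procedure could look like (my illustration, not the authors' code): rank candidate sessions by how well a model finetuned on each session alone performs, then pretrain on the top-ranked ones. Here `finetune_and_score` is a hypothetical helper standing in for single-session finetuning and evaluation.

```python
# Minimal sketch: select pretraining sessions by single-session finetuning
# performance. NOTE: `finetune_and_score(session_id)` is a hypothetical helper
# (not from the paper) that finetunes a model on one session and returns a
# validation score, where higher is better.

def select_top_sessions(sessions, finetune_and_score, k=5):
    # Score each candidate session independently via single-session finetuning.
    scores = {session_id: finetune_and_score(session_id) for session_id in sessions}
    # Keep the k best-scoring sessions for multi-session pretraining.
    ranked = sorted(sessions, key=lambda s: scores[s], reverse=True)
    return ranked[:k]
```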
In this work, we systematically investigate how data heterogeneity impacts the scaling behavior of neural data transformers. We first found that brain region mismatches among sessions reduced the scaling benefits for neuron-level and region-level activity prediction. (4/9)
Yet, previous studies typically lack fine-grained data scaling analyses. It remains unclear whether all sessions contribute equally to downstream performance gains. This is especially important to understand as pretraining scales to thousands of sessions and hours of data. (3/9)
Neural foundation models are gaining increasing attention these days, with the potential to learn cross-session/animal/species representations and benefit from multi-session pretraining. (2/9)
I'm excited to share my first 🦋-print with our latest work, "Data Heterogeneity Limits the Scaling Effect of Pretraining in Neural Data Transformers", where we carefully examined the effect of scaling up pretraining data in neural foundation models. (1/9)
Preprint:
www.biorxiv.org/content/10.1...