
Linxing Preston Jiang

@lpjiang97.bsky.social

PhD student @uwcse.bsky.social interested in theoretical neuroscience. https://lpjiang97.github.io

67 Followers  |  48 Following  |  9 Posts  |  Joined: 26.11.2024  |  1.4644

Latest posts by lpjiang97.bsky.social on Bluesky

This is joint work with Shirui Chen, Iman Tanumihardja, Xiaochuang Han, Weijia Shi, Eric Shea-Brown, and Rajesh Rao. Please check out the preprint for more details.

Any feedback is appreciated! (9/9)

16.05.2025 02:39 | 👍 0    🔁 0    💬 0    📌 0

Together, our results show that pretraining with more sessions does not naturally lead to improved downstream performance. We advocate for rigorous scaling analyses in future work on neural foundation models to account for data heterogeneity effects. (8/9)

16.05.2025 02:39 | 👍 0    🔁 0    💬 1    📌 0

We note that similar results have been found in NDT3 by Joel Ye, where several downstream datasets enjoyed little benefit from scale with 100 minutes of finetuning data. (7/9)

16.05.2025 02:39 | 👍 0    🔁 0    💬 1    📌 0

We found that models trained with as few as five top-ranked sessions outperformed those with randomly chosen sessions even when the full dataset was used, demonstrating the impact of session-to-session variability in performance scaling. (6/9)

16.05.2025 02:39 | 👍 1    🔁 0    💬 1    📌 0

For the forward-prediction task that did exhibit consistent scaling, we identified implicit data heterogeneity arising from cross-session variability. We proposed a session-selection procedure based on single-session finetuning performance. (5/9)

16.05.2025 02:39 | 👍 0    🔁 0    💬 1    📌 0
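
The session-selection idea in (5/9) can be illustrated with a minimal sketch. It assumes each session already has a single scalar score from single-session finetuning and that higher is better. The function name `select_top_sessions`, the session IDs, and the scores are all hypothetical and are not taken from the preprint, which should be consulted for the actual procedure.

```python
from typing import Dict, List


def select_top_sessions(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k session IDs with the highest scores.

    `scores` maps a session ID to a scalar performance value
    (assumed: higher is better). Ties are broken by session ID
    so the ranking is deterministic.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending score, then ascending session ID.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [session_id for session_id, _ in ranked[:k]]


# Hypothetical per-session scores, made up for illustration only.
example_scores = {"s01": 0.41, "s02": 0.58, "s03": 0.37, "s04": 0.58, "s05": 0.49}
top_sessions = select_top_sessions(example_scores, k=3)
```

The selected sessions would then form the pretraining set, in place of a random subset of the same size.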

In this work, we systematically investigate how data heterogeneity impacts the scaling behavior of neural data transformers. We first found that brain region mismatches among sessions reduced the scaling benefits for neuron-level and region-level activity prediction. (4/9)

16.05.2025 02:39 | 👍 1    🔁 0    💬 1    📌 0

Yet, previous studies typically lack fine-grained data scaling analyses. It remains unclear whether all sessions contribute equally to downstream performance gains. This is especially important to understand as pretraining scales to thousands of sessions and hours of data. (3/9)

16.05.2025 02:39 | 👍 0    🔁 0    💬 1    📌 0

Neural foundation models are gaining increasing attention these days, with the potential to learn cross-session/animal/species representations and benefit from multi-session pretraining. (2/9)

16.05.2025 02:39 | 👍 0    🔁 0    💬 1    📌 0
Data Heterogeneity Limits the Scaling Effect of Pretraining Neural Data Transformers A key challenge in analyzing neuroscience datasets is the profound variability they exhibit across sessions, animals, and data modalities--i.e., heterogeneity. Several recent studies have demonstrated...

I'm excited to share my first 🦋-print with our latest work, "Data Heterogeneity Limits the Scaling Effect of Pretraining in Neural Data Transformers", where we carefully examined the effect of scaling up pretraining data in neural foundation models. 🧐 (1/9)

Preprint:
www.biorxiv.org/content/10.1...

16.05.2025 02:39 | 👍 2    🔁 0    💬 1    📌 0
NOT-OD-25-068: Supplemental Guidance to the 2024 NIH Grants Policy Statement: Indirect Cost Rates NIH Funding Opportunities and Notices in the NIH Guide for Grants and Contracts: Supplemental Guidance to the 2024 NIH Grants Policy Statement: Indirect Cost Rates NOT-OD-25-068. OD

1. Today the NIH director issued a new directive slashing overhead rates to 15%.

I want to provide some context on what that means and why it matters.

grants.nih.gov/grants/guide...

08.02.2025 00:18 | 👍 7036    🔁 4107    💬 258    📌 902
Advancing Neuromotor Interfaces by Open Sourcing Surface Electromyography (sEMG) Datasets for Pose Estimation and Surface Typing We're releasing emg2qwerty and emg2pose, two large datasets and benchmarks for sEMG-based typing and pose estimation, as part of the NeurIPS 2024 Datasets and Benchmarks track.

New 716 hour EMG dataset just dropped ✨ ai.meta.com/blog/open-so...

05.12.2024 21:34 | 👍 59    🔁 21    💬 0    📌 0

OK If we are moving to Bluesky I am rescuing my favourite ever twitter thread (Jan 2019).

The renamed:

Bluesky-sized history of neuroscience (biased by my interests)

01.12.2024 20:29 | 👍 623    🔁 203    💬 14    📌 14
