UPDATE
project page: yingtiandt.github.io/dynamic-visi...
@davidtyt.bsky.social
8/🖼️ Big Picture
Optimizing to model world dynamics leads to brain-like representations.
🧠 The visual system isn't a patchwork of modules; it's a unified system built on shared core principles.
7/🧠 Finding 4
We introduce task-based functional localization.
It:
1. Recovers many prior neuroscience results in a unified way
2. Reveals new structure in action understanding pathways
A novel, scalable approach to functional brain mapping.
6/ Finding 3
Putting observations together:
• Single-objective models align with all regions and behaviors
• Cortex shows hybrid, smooth representation transitions
💡 A new perspective: the brain may implement a shared feature backbone, reused for diverse tasks, just like a "foundation model".
5/ Finding 2.2
These two aren't isolated. They're:
• Blended across ventral & dorsal streams
• Smoothly mapped across the cortex
So the visual system isn't modular; it's highly distributed, and the classic stream separation theory appears oversimplified.
4/ Finding 2.1
So, what does the brain actually compute during dynamic vision?
Across 10 cognitive tasks (e.g., pose, social cues, action), just two suffice to explain brain-like representations:
• Object form
• Appearance-free motion
3/ Finding 1
• Dynamic models > static image models > classic vision models
• Across both dorsal & ventral regions
• Across neural & behavioral alignment
Best match to brain: V-JEPA.
In general, learning world dynamics yields alignment with the whole visual system.
2/🧪 Approach
We benchmarked diverse video models, each with a different pretraining objective.
Then we tested how well they predict human fMRI responses to natural movies.
🧠 ~10,000 voxels, whole visual system.
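For readers new to this kind of benchmark, here is a minimal sketch of one common way to score how well model features predict voxel responses: a voxel-wise ridge regression encoding model evaluated by held-out Pearson correlation. This is an assumption for illustration, not necessarily the pipeline used in the preprint. The data are synthetic, and the names `ridge_fit` and `voxelwise_pearson`, the array sizes, and the `alpha` value are all hypothetical.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins (hypothetical sizes): rows are movie time points,
# X holds model features, Y holds voxel responses.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 50, 16, 30
X = rng.standard_normal((n_train + n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))
Y = X @ W_true + 0.5 * rng.standard_normal((n_train + n_test, n_vox))

# Fit on the training time points, score on held-out time points.
X_tr, X_te = X[:n_train], X[n_train:]
Y_tr, Y_te = Y[:n_train], Y[n_train:]
W = ridge_fit(X_tr, Y_tr, alpha=1.0)
scores = voxelwise_pearson(Y_te, X_te @ W)
mean_score = float(scores.mean())
print(f"mean held-out correlation across voxels: {mean_score:.3f}")
```

In a real comparison, each video model's features would replace `X`, and the per-voxel scores could be averaged within ventral and dorsal regions to compare models.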
1/ Motivation
The brain is thought to process vision through two streams:
🖼️ Ventral: objects, form, identity
🧭 Dorsal: motion, spatial layout, actions
Image models explain ventral well.
But: what about dorsal? Can one model do both?
🚨 New research: Can the brain's complex visual system, with its ventral & dorsal processing streams, arise from a single goal?
We study dynamic vision and reveal how object and motion recognition, long thought to be separate, could emerge from the same underlying goal.
🧠 NEW PREPRINT
Many-Two-One: Diverse Representations Across Visual Pathways Emerge from A Single Objective
www.biorxiv.org/content/10.1...