Yingtian (David) Tang

@davidtyt.bsky.social

18 Followers  |  4 Following  |  11 Posts  |  Joined: 30.07.2025  |  1.6451

Latest posts by davidtyt.bsky.social on Bluesky

Many-Two-One

UPDATE
project page: yingtiandt.github.io/dynamic-visi...

04.08.2025 03:27 | 👍 0  🔁 0  💬 0  📌 0
8/🖼️ Big Picture

Optimizing to model world dynamics leads to brain-like representations.


🧠 The visual system isn't a patchwork of modules β€” it’s a unified system built on shared core principles.

30.07.2025 13:39 | 👍 2  🔁 0  💬 1  📌 0
7/🧠 Finding 4
We introduce task-based functional localization.

It:
1. Recovers many prior neuroscience results in a unified way
2. Reveals new structure in action understanding pathways

A novel scalable approach to functional brain mapping.

30.07.2025 13:37 | 👍 1  🔁 0  💬 1  📌 0
6/🌀 Finding 3
Putting observations together:

• Single-objective models align with all regions and behaviors
• Cortex shows hybrid, smooth representation transitions

💡 A new perspective: the brain may implement a shared feature backbone, reused for diverse tasks, just like a "foundation model".

30.07.2025 13:37 | 👍 1  🔁 0  💬 1  📌 0
5/🌐 Finding 2.2
These two aren't isolated. They are:
• Blended across ventral & dorsal streams
• Smoothly mapped across the cortex

So, the visual system isn't modular; it's highly distributed, and the classic stream separation theory appears oversimplified.

30.07.2025 13:36 | 👍 1  🔁 0  💬 1  📌 0
4/🌐 Finding 2.1

So, what does the brain actually compute during dynamic vision?

Across 10 cognitive tasks (e.g., pose, social cues, action), just two suffice to explain brain-like representations:
• Object form
• Appearance-free motion

30.07.2025 13:35 | 👍 2  🔁 0  💬 1  📌 0
3/📊 Finding 1

✅ Dynamic models > static image models > classic vision models
✅ Across both dorsal & ventral regions
✅ Across neural & behavioral alignment

Best match to brain: V-JEPA.
In general, learning world dynamics gives alignment to the whole visual system.

30.07.2025 13:34 | 👍 1  🔁 0  💬 1  📌 0
2/🧪 Approach

We benchmarked diverse video models, each with a different pretraining objective.


Then: tested how well they predict human fMRI responses to natural movies.

🧠 ~10,000 voxels, whole visual system.

30.07.2025 13:34 | 👍 1  🔁 0  💬 1  📌 0
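
A rough sketch of what "predicting fMRI responses" from model features can look like in practice. The thread does not say which regression or score the authors used, so the ridge regression, the Pearson correlation metric, the function names, and the synthetic data below are all assumptions made for illustration, not the paper's pipeline.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson r between measured and predicted responses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-in data (hypothetical sizes, far smaller than ~10,000 voxels).
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 16, 100
W_true = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))   # model features per time point
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ W_true + 0.5 * rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ W_true + 0.5 * rng.normal(size=(n_test, n_voxels))

# Fit on training time points, score on held-out time points.
W = fit_ridge(X_train, Y_train, alpha=1.0)
scores = voxelwise_correlation(Y_test, X_test @ W)
print(scores.shape, float(scores.mean()))
```

With real data, `X` would hold features extracted from a video model for each movie segment and `Y` the measured voxel responses; comparing held-out scores across models is one way to rank them.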
1/🔍 Motivation

The brain is thought to process vision through two streams:

🖼 Ventral: objects, form, identity

🧭 Dorsal: motion, spatial layout, actions

Image models explain ventral well.
But: what about dorsal? Can one model do both?

30.07.2025 13:33 | 👍 2  🔁 0  💬 1  📌 0

🚨 New research: Can the brain's complex visual system β€” ventral & dorsal processing streams β€” arise from a single goal?

We study dynamic vision and reveal how object and motion recognition, long thought to be separate, could emerge from the same underlying goal.

30.07.2025 13:32 | 👍 1  🔁 0  💬 1  📌 0
Many-Two-One: Diverse Representations Across Visual Pathways Emerge from A Single Objective
How the human brain supports diverse behaviours has been debated for decades. The canonical view divides visual processing into distinct "what" and "where/how" streams – however, their origin and inde...

🧠 NEW PREPRINT
Many-Two-One: Diverse Representations Across Visual Pathways Emerge from A Single Objective
www.biorxiv.org/content/10.1...

30.07.2025 13:31 | 👍 27  🔁 11  💬 2  📌 5
