A Erdem Sagtekin

@aesagtekin.bsky.social

theoretical neuroscience phd student at columbia

697 Followers  |  538 Following  |  9 Posts  |  Joined: 14.10.2024

Posts by A Erdem Sagtekin (@aesagtekin.bsky.social)

I am totally pumped about this new work. "Task-trained RNNs" are a powerful and influential framework in neuroscience, but have lacked a firm theoretical footing. This work provides one, and makes direct contact with the classical theory of random RNNs:
www.biorxiv.org/content/10.6...

04.03.2026 17:12 | 👍 84 🔁 31 💬 2 📌 3

7/7 Overall, EF is a powerful temporal credit assignment mechanism and a promising candidate model for learning in biological systems. It was an incredible experience working on this with @colin-bredenberg.bsky.social and Cristina, and I’m looking forward to feedback and discussing error forcing!

08.01.2026 22:10 | 👍 4 🔁 0 💬 0 📌 0
Multi-panel figure comparing Error Forcing, Teacher Forcing, and backpropagation through time on three RNN tasks: delayed XOR, sine wave generation, and evidence integration. Across increasing delay and task settings, Error Forcing consistently yields a higher fraction of networks that learn and lower test error than Teacher Forcing or backpropagation alone.

6/7 We tried three tasks of varying difficulty and showed that using EF together with BPTT improves learning relative to using TF with BPTT or using BPTT alone. For biological plausibility, we also combined error forcing with RFLO (random-feedback local online learning) and found that it can improve RFLO as well.

08.01.2026 22:10 | 👍 1 🔁 0 💬 1 📌 0

5/7 We then provided a probabilistic perspective, showing that EF approximates the expectation-maximization (EM) algorithm with a reparameterization trick: in the E step, neural activities are adjusted; in the M step, synaptic weights are adjusted (similar to predictive coding).

08.01.2026 22:10 | 👍 1 🔁 1 💬 1 📌 0
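
The E-step/M-step ordering described above (activity first, weights second) can be illustrated with a generic predictive-coding-style toy, swapped in here purely for illustration. It is a sketch under stated assumptions, not the paper's EF algorithm (which operates on RNNs over time and uses a reparameterization trick): a two-layer linear network with hypothetical weights `W` (input to hidden) and `V` (hidden to output) and a quadratic energy. Feedback first relaxes the hidden activity `h` with the weights frozen; the weights then take one local error-times-activity step at that adjusted activity.

```python
import numpy as np

# Hypothetical toy model, NOT the paper's EF algorithm:
# energy = hidden-layer prediction error + output error
def energy(h, x, y, W, V):
    return 0.5 * np.sum((h - W @ x) ** 2) + 0.5 * np.sum((y - V @ h) ** 2)

def e_step(h, x, y, W, V, lr=0.1, n_steps=50):
    """'E step': gradient descent on the activity h with the weights frozen."""
    for _ in range(n_steps):
        grad_h = (h - W @ x) - V.T @ (y - V @ h)
        h = h - lr * grad_h
    return h

def m_step(h, x, y, W, V, lr=0.05):
    """'M step': one gradient step on the weights with the activity frozen.
    Each update is a local product of an error and a presynaptic activity."""
    W = W + lr * np.outer(h - W @ x, x)
    V = V + lr * np.outer(y - V @ h, h)
    return W, V

rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((5, 3))   # input -> hidden
V = 0.5 * rng.standard_normal((2, 5))   # hidden -> output
x = rng.standard_normal(3)              # one input pattern
y = np.array([1.0, -1.0])               # its target output

h0 = W @ x                              # feedforward activity, before feedback
F0 = energy(h0, x, y, W, V)
h = e_step(h0, x, y, W, V)              # feedback nudges the activity
F1 = energy(h, x, y, W, V)
W, V = m_step(h, x, y, W, V)            # weights move toward the nudged activity
F2 = energy(h, x, y, W, V)
print(F0, F1, F2)                       # energy after each phase
```

With step sizes this small, both phases are plain gradient descent on a convex quadratic, so neither should increase the energy; this monotone improvement is the alternating-optimization behavior that EM-like schemes share.
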
Three-panel schematic comparing BPTT, Teacher Forcing, and Error Forcing in a two-dimensional neuron-activity space. A diagonal “zero-error manifold” is shown; BPTT follows a free-running state trajectory, Teacher Forcing projects to a fixed target state on the manifold, and Error Forcing produces corrected states that are more flexible.

4/7 We first showed this from a geometric perspective. When the network's dimensionality is higher than that of its output, TF overconstrains the network dynamics during learning, which undermines its benefits. In contrast, EF intervenes only minimally in the network dynamics.

08.01.2026 22:10 | 👍 2 🔁 0 💬 1 📌 0
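
A toy version of this geometric picture, assuming a linear readout `C` so that the zero-error manifold is the affine set {h : C h = y}. Both functions are hypothetical illustrations, not the paper's TF or EF updates: `tf_like_state` replaces the state with one fixed target (the minimum-norm state that produces `y`, regardless of where the network currently is), while `minimal_correction` orthogonally projects the current state onto the manifold, changing only the directions the readout can see.

```python
import numpy as np

def tf_like_state(C, y):
    """Hypothetical TF-style target: the minimum-norm state h with C @ h == y.
    It does not depend on the network's current state."""
    return np.linalg.pinv(C) @ y

def minimal_correction(C, h, y):
    """Orthogonal projection of the current state h onto {h : C @ h == y}:
    the smallest change to h that zeroes the output error."""
    return h + np.linalg.pinv(C) @ (y - C @ h)

rng = np.random.default_rng(0)
N, M = 10, 1                          # many more neurons than outputs
C = rng.standard_normal((M, N))       # assumed linear readout
h = rng.standard_normal(N)            # current free-running state
y = np.array([0.5])                   # target output

h_tf = tf_like_state(C, y)
h_min = minimal_correction(C, h, y)

# Projector onto the directions the readout cannot see (null space of C)
P_null = np.eye(N) - np.linalg.pinv(C) @ C

print(np.allclose(C @ h_tf, y), np.allclose(C @ h_min, y))   # both reach zero error
print(np.linalg.norm(h_tf - h), np.linalg.norm(h_min - h))   # distance moved from h
print(np.allclose(P_null @ h_min, P_null @ h))               # unobserved part kept
```

Because the projection is the closest point on the manifold, `h_min` is never farther from the current state than `h_tf`. The TF-like target also discards the N - M components of `h` that the readout cannot see, which is one way to read "overconstrains the network dynamics".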

3/7 Building on their work, as well as Kenji Doya’s and many others’, we realized that a slight modification to teacher forcing leads to better optimization: instead of feeding the teacher (target) activity into the RNN states, we feed the error in a specific way (hence the name Error Forcing).

08.01.2026 22:10 | 👍 2 🔁 0 💬 1 📌 0

2/7 In our NeurIPS 2025 spotlight paper, we introduced Error Forcing (EF). Our method builds on Teacher Forcing (TF), a beautiful method proposed more than 35 years ago to improve RNN learning. Recently, the Durstewitz Lab elegantly showed the benefits of Generalized Teacher Forcing.

08.01.2026 22:10 | 👍 1 🔁 0 💬 1 📌 0
Diagram of a recurrent neural network: input goes into the network, output is compared to a target to produce an error, and dotted feedback arrows show updates to neural activity and to synaptic weights.

1/7 How should feedback signals influence a network during learning? Should they first adjust synaptic weights, which then indirectly change neural activity (as in backprop.)? Or should they first adjust neural activity to guide synaptic updates (e.g., target prop.)? openreview.net/forum?id=xVI...

08.01.2026 22:10 | 👍 40 🔁 5 💬 1 📌 0
A theory of multi-task computation and task selection
Neural activity during the performance of a stereotyped behavioral task is often described as low-dimensional, occupying only a limited region in the space of all firing-rate patterns. This region has...

1/X Excited to present this preprint on multi-tasking, with @david-g-clark.bsky.social and Ashok Litwin-Kumar! Timely too, as “low-D manifold” has been trending again. (If you read thru the end, we escape Flatland and return to the glorious high-D world we deserve.) www.biorxiv.org/content/10.6...

15.12.2025 19:41 | 👍 83 🔁 20 💬 1 📌 2

1/6 Why does the brain maintain such precise excitatory-inhibitory balance?
Our new preprint explores a provocative idea: small, targeted deviations from this balance may serve a purpose, namely to encode local error signals for learning.
www.biorxiv.org/content/10.1...
led by @jrbch.bsky.social

27.05.2025 07:49 | 👍 181 🔁 57 💬 5 📌 3

How to find all fixed points in piecewise linear recurrent neural networks (RNNs)?
A short thread 🧡

In RNNs with N units and ReLU(x-b) activations, the phase space is partitioned into 2^N regions by the hyperplanes x_i = b_i (one per unit). 1/7

11.12.2024 01:32 | 👍 63 🔁 12 💬 1 📌 0
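
The region-by-region idea can be sketched in Python (NumPy). This assumes a continuous-time rate model dx/dt = -x + W·ReLU(x - b) + u; the input `u` and this exact form of the dynamics are assumptions, since the post only specifies N units with ReLU(x-b) activations, and `find_fixed_points` is a hypothetical helper. Inside each region the ReLU acts as a fixed 0/1 diagonal mask D, so a fixed point solves the linear system (I - WD)x = u - WDb; the candidate is kept only if it actually lies in that region, and its stability is read off the region's Jacobian -I + WD.

```python
import itertools
import numpy as np

def find_fixed_points(W, b, u):
    """Brute-force search over all 2^N linear regions of the assumed model
        dx/dt = -x + W @ relu(x - b) + u.
    Returns a list of (fixed_point, is_stable) pairs."""
    N = W.shape[0]
    results = []
    for pattern in itertools.product([0, 1], repeat=N):
        on = np.array(pattern, dtype=bool)       # units above threshold in this region
        D = np.diag(on.astype(float))            # here relu(x - b) == D @ (x - b)
        # In this region the fixed-point condition x = W D (x - b) + u is linear:
        #   (I - W D) x = u - W D b
        A = np.eye(N) - W @ D
        if np.linalg.matrix_rank(A) < N:
            continue  # degenerate region (no isolated fixed point); skipped in this sketch
        x = np.linalg.solve(A, u - W @ D @ b)
        # Keep the candidate only if it actually lies in the region it was solved in
        if np.all(x[on] > b[on]) and np.all(x[~on] <= b[~on]):
            J = -np.eye(N) + W @ D               # Jacobian of the linear dynamics here
            stable = bool(np.all(np.linalg.eigvals(J).real < 0))
            results.append((x, stable))
    return results

# Hypothetical example: one self-exciting unit with threshold 0 and negative input
fps = find_fixed_points(np.array([[2.0]]), np.array([0.0]), np.array([-1.0]))
for x, stable in fps:
    print(x, "stable" if stable else "unstable")  # x = -1 is stable, x = 1 unstable
```

Because the number of regions grows as 2^N, this exhaustive enumeration is only practical for small networks.
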
Simplified derivations for high-dimensional convex learning problems
Statistical physics provides tools for analyzing high-dimensional problems in machine learning and theoretical neuroscience. These calculations, particularly those using the replica method, often invo...

(1/5) Fun fact: Several classic results in the stat. mech. of learning can be derived in a couple lines of simple algebra!

In this paper with Haim Sompolinsky, we simplify and unify derivations for high-dimensional convex learning problems using a bipartite cavity method.
arxiv.org/abs/2412.01110

03.12.2024 19:34 | 👍 57 🔁 16 💬 2 📌 1

This list likely reflects mainly my interests and circle, and I’m sure I’ve missed many people, but I gave it a try: (I’ll be slowly editing it until it reaches 150/150)

go.bsky.app/7VFUkdn

(also, I tried but couldn't remove my profile...)

09.11.2024 12:35 | 👍 81 🔁 51 💬 48 📌 8

i enjoyed reading the geometry of plasticity paper and felt that something important was coming, this is it:

01.11.2024 08:35 | 👍 4 🔁 1 💬 1 📌 0