
Felix Petersen

@petersen.ai.bsky.social

Machine learning researcher @Stanford. https://petersen.ai/

542 Followers  |  25 Following  |  18 Posts  |  Joined: 16.11.2024

Latest posts by petersen.ai on Bluesky

The next generation of neural networks could live in hardware: Researchers have devised a way to make computer vision systems more efficient by building networks out of computer chips' logic gates.

I'm excited to share that our work on Convolutional Differentiable Logic Gate Networks was covered by MIT Technology Review. 🎉

www.technologyreview.com/2024/12/20/1...
@hildekuehne.bsky.social

27.12.2024 20:40 — 👍 11    🔁 2    💬 1    📌 0

Convolutional Differentiable Logic Gate Networks @FHKPetersen

13.12.2024 20:24 — 👍 45    🔁 3    💬 3    📌 0

Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms @FHKPetersen

12.12.2024 01:23 — 👍 10    🔁 3    💬 1    📌 0

Join us at our poster session today, 11am-2pm, at East Exhibit Hall A-C *#1502*.

12.12.2024 18:40 — 👍 1    🔁 0    💬 0    📌 0
NeurIPS Poster: Convolutional Differentiable Logic Gate Networks (NeurIPS 2024)

Most innovative paper at #NeurIPS imho. Can we create a network that becomes the physical chip instead of running on a chip? Inference speedups and energy savings are through the roof!

Oral on Friday at 10am PT

neurips.cc/virtual/2024...

12.12.2024 17:38 — 👍 2    🔁 1    💬 0    📌 0

Join us on Wednesday, 11am-2pm for our poster session on Newton Losses in *West Ballroom A-D #6207*. neurips.cc/virtual/2024...

10.12.2024 19:55 — 👍 0    🔁 0    💬 0    📌 0
Computer Vision Models with LLM Training Dynamics (TrAct)

Learn more in our paper (arxiv.org/abs/2410.23970) and check out our paper video on YouTube: youtu.be/ZjTAjjxbkRY

04.12.2024 18:39 — 👍 2    🔁 0    💬 0    📌 0

...and it speeds up overall training by factors ranging from 1.25x (for large ViT pre-training) to 4x (for ConvNets).
We benchmark TrAct on a suite of 50 experimental settings.

04.12.2024 18:39 — 👍 1    🔁 0    💬 1    📌 0

Our implementation is efficient, only modifies the gradient in the backward pass, and is compatible with various optimizers. To use *TrAct*, just wrap your first layer in a "TrAct" module...

04.12.2024 18:39 — 👍 1    🔁 0    💬 1    📌 0
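(A minimal, hypothetical PyTorch sketch of what such a wrapper could look like for a linear first layer; the name `TrActLinear` and the scalar rescaling by the mean squared input are illustrative simplifications, not the paper's released code or its exact batch closed form.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class _TrActFn(torch.autograd.Function):
    """Standard linear forward pass; only the weight gradient is modified."""

    @staticmethod
    def forward(ctx, x, weight, bias, lam):
        ctx.save_for_backward(x, weight)
        ctx.lam = lam
        return F.linear(x, weight, bias)

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        grad_x = grad_out @ weight  # usual gradient w.r.t. the input
        # Simplification: rescale by the mean squared input plus lambda instead
        # of applying the full (x x^T / b + lambda I)^-1 preconditioner.
        grad_w = (grad_out.t() @ x) / (x.pow(2).mean() + ctx.lam)
        grad_b = grad_out.sum(dim=0)
        return grad_x, grad_w, grad_b, None


class TrActLinear(nn.Module):
    """Drop-in replacement for a first nn.Linear layer (hypothetical name)."""

    def __init__(self, in_features, out_features, lam=0.1):
        super().__init__()
        self.inner = nn.Linear(in_features, out_features)
        self.lam = lam

    def forward(self, x):
        return _TrActFn.apply(x, self.inner.weight, self.inner.bias, self.lam)


# Usage: only the first layer of the model is wrapped; the rest stays unchanged.
model = nn.Sequential(nn.Flatten(), TrActLinear(3 * 32 * 32, 256),
                      nn.ReLU(), nn.Linear(256, 10))
```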

Thus, we can effectively train the first-layer activations of a Vision model, with updates similar to those in the LLM Embedding layer.

04.12.2024 18:39 — 👍 1    🔁 0    💬 1    📌 0

We close this gap by proposing TrAct: we conceptually *Tr*ain *Act*ivations. While we can't train activations directly because only weights are trainable, we formulate an optimization problem to find the optimal weights that match a GD step on the activations, and modify the gradients accordingly in closed form.

04.12.2024 18:39 — 👍 1    🔁 0    💬 1    📌 0
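(A worked sketch of the kind of closed form this yields, under simplifying assumptions that are mine, not the paper's: a single linear first layer $z = Wx$, one input $x$, and a ridge penalty $\lambda$ on the weight change; the paper's batch formulation is more general.)

$$ W' = \arg\min_{\tilde W}\; \bigl\|\tilde W x - (W x - \eta\, g_z)\bigr\|^2 + \lambda\,\|\tilde W - W\|_F^2 \;\;\Longrightarrow\;\; W' = W - \frac{\eta\, g_z x^\top}{\|x\|^2 + \lambda}, $$

where $g_z = \partial \mathcal{L} / \partial z$. For small $\lambda$, the new pre-activation is $W'x \approx Wx - \eta\, g_z$, i.e., the modified weight update reproduces a gradient-descent step on the activations.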

This means that, at the first layer, a vision model learns much more slowly than an LLM, and that learning is actually faster in high-contrast regions of the image than in low-contrast regions, because the weight gradients are proportional to the input pixel values.

04.12.2024 18:39 — 👍 1    🔁 0    💬 1    📌 0
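(For a linear first layer, the proportionality follows directly from the chain rule; sketch in my notation, not quoted from the paper:)

$$ z = W x \;\Rightarrow\; \frac{\partial \mathcal{L}}{\partial W} = g_z\, x^\top, \qquad g_z = \frac{\partial \mathcal{L}}{\partial z}, $$

so each column of the weight gradient is scaled by the corresponding input pixel value: large (high-contrast) pixels drive large weight updates, while near-zero (low-contrast) pixels barely move the weights.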

The big difference between LLMs and Vision models lies in the first layer:
* in LLMs we update Embeddings (/activations) directly
* but in Vision models we update the *weights* of the first layer, which causes indirect updates to the Activations (/embeddings)

04.12.2024 18:39 — 👍 1    🔁 0    💬 1    📌 0
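(A small illustrative PyTorch sketch of that asymmetry; the example is mine, not from the paper.)

```python
import torch
import torch.nn as nn

# LLM-style first layer: an embedding table. The gradient lands directly on the
# rows (the token "activations") that appeared in the batch.
emb = nn.Embedding(100, 8)
tokens = torch.tensor([3, 7])
emb(tokens).sum().backward()
print(emb.weight.grad[3])  # used rows receive a direct, input-independent update

# Vision-style first layer: a linear (or conv) layer. The weight gradient is an
# outer product with the input, so activations are only updated indirectly and
# the update size scales with the pixel values.
lin = nn.Linear(8, 4)
x = torch.randn(2, 8)
lin(x).sum().backward()
grad_z = torch.ones(2, 4)  # gradient of .sum() w.r.t. the pre-activations
print(torch.allclose(lin.weight.grad, grad_z.t() @ x))  # True
```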

Have you ever wondered how training dynamics differ between LLMs 🖋️ and Vision 👁️ models? We explore this and close the gap between VMs and LLMs in our #NeurIPS2024 paper "TrAct: Making First-layer Pre-Activations Trainable".
Paper link 📜: arxiv.org/abs/2410.23970
Video link 🎥: youtu.be/ZjTAjjxbkRY
🧡

04.12.2024 18:39 — 👍 9    🔁 2    💬 1    📌 1
Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms - NeurIPS 2024

Check out our 5-minute paper video on YouTube 🎥: www.youtube.com/watch?v=7aFP...

28.11.2024 01:49 — 👍 2    🔁 0    💬 0    📌 0

A big thanks to my co-authors Christian Borgelt, @tobiassutter.bsky.social @hildekuehne.bsky.social Oliver Deussen and Stefano Ermon.

Also a shout-out to the authors of the methods we build on: @qberthet.bsky.social @mblondel.bsky.social @marcocuturi.bsky.social @bachfrancis.bsky.social

28.11.2024 01:49 — 👍 5    🔁 1    💬 1    📌 0

Newton Losses is easy to implement, and its empirical Fisher extension can be added to existing pipelines with a single call of `InjectFisher` between the model and the loss.

28.11.2024 01:49 — 👍 3    🔁 0    💬 1    📌 0
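(The released code is not quoted here. As a rough, hypothetical illustration of what an `InjectFisher`-style call between the model and the loss could do, the sketch below is an identity in the forward pass and, in the backward pass, preconditions the loss gradient with a regularized empirical Fisher estimate built from the batch gradients.)

```python
import torch


class _InjectFisherFn(torch.autograd.Function):
    """Identity forward; empirical-Fisher preconditioning of the gradient backward."""

    @staticmethod
    def forward(ctx, z, lam):
        ctx.lam = lam
        return z.clone()

    @staticmethod
    def backward(ctx, grad_z):
        # Assumes grad_z has shape (batch, dim): gradient of the loss w.r.t. outputs.
        b, d = grad_z.shape
        fisher = grad_z.t() @ grad_z / b + ctx.lam * torch.eye(d, device=grad_z.device)
        return torch.linalg.solve(fisher, grad_z.t()).t(), None


def inject_fisher(z, lam=0.1):
    """Place between the model output and the (algorithmic) loss."""
    return _InjectFisherFn.apply(z, lam)


# Usage (hypothetical): loss = ranking_loss(inject_fisher(model(x)), targets)
```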

In Newton Losses, we merge SGD training of NNs with a Newton step on the loss. This is crucial for algorithmic losses like ranking and graph losses, especially with vanishing and exploding gradients. Intuition: if the loss is harder to optimize than the NN, we should use a stronger optimization method for the loss.

28.11.2024 01:49 — 👍 3    🔁 0    💬 1    📌 0
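(Schematically, in my paraphrase rather than a quote from the paper: instead of backpropagating the loss gradient directly, one first takes a regularized Newton step on the loss with respect to the network output $z = f_\theta(x)$ and then trains the network toward the resulting target.)

$$ z^\star = z - \bigl(\nabla_z^2 \ell(z) + \lambda I\bigr)^{-1} \nabla_z \ell(z), \qquad \theta \leftarrow \theta - \eta\, \nabla_\theta\, \tfrac{1}{2}\bigl\| f_\theta(x) - z^\star \bigr\|^2 . $$

The empirical Fisher variant replaces the Hessian $\nabla_z^2 \ell(z)$ with an estimate built from gradient outer products, so only first-order information about the loss is needed.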
Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuou...

I'm excited to share our NeurIPS 2024 paper "Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms" 🤖.
Paper link 📜: arxiv.org/abs/2410.19055

28.11.2024 01:49 — 👍 18    🔁 4    💬 1    📌 2
Convolutional Differentiable Logic Gate Networks - NeurIPS Oral - difflogic

If you're excited about #AI with #logic, check out our fully animated video on YouTube: youtu.be/FKQfMwFZvIE

17.11.2024 16:34 — 👍 20    🔁 3    💬 0    📌 0

Excited to share our #NeurIPS 2024 Oral, Convolutional Differentiable Logic Gate Networks, leading to a range of inference efficiency records, including inference in only 4 nanoseconds 🏎️. We reduce model sizes by factors of 29x-61x over the SOTA. Paper: arxiv.org/abs/2411.04732

17.11.2024 16:34 — 👍 102    🔁 18    💬 4    📌 4
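(For readers wondering how a logic gate network can be differentiable at all, here is a minimal from-scratch sketch of the core idea; it is not the released `difflogic` API. Each neuron computes real-valued relaxations of a few two-input logic gates and mixes them with a learned softmax; after training, the argmax gate is kept, so the network maps directly onto hardware logic.)

```python
import torch
import torch.nn as nn


class SoftLogicGate(nn.Module):
    """One differentiable logic-gate neuron over a small subset of two-input gates."""

    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(6))  # learned gate-choice distribution

    @staticmethod
    def _gates(a, b):
        # Real-valued relaxations: for a, b in {0, 1} these reduce to Boolean gates.
        return torch.stack([
            a * b,              # AND
            a + b - a * b,      # OR
            a + b - 2 * a * b,  # XOR
            1 - a * b,          # NAND
            a,                  # pass-through A
            1 - a,              # NOT A
        ], dim=-1)

    def forward(self, a, b):
        probs = torch.softmax(self.logits, dim=-1)
        return (self._gates(a, b) * probs).sum(dim=-1)

    def hard_forward(self, a, b):
        # Inference: keep only the most likely gate (what gets baked into the chip).
        return self._gates(a, b)[..., self.logits.argmax()]


gate = SoftLogicGate()
a, b = torch.rand(16), torch.rand(16)
out = gate(a, b)  # differentiable w.r.t. the gate-choice logits
```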
