Raphael Pisoni's Avatar

Raphael Pisoni

@4rtemi5.bsky.social

Unsupervised multimodal representation of a learning researcher. https://www.rpisoni.dev/

3,073 Followers  |  466 Following  |  158 Posts  |  Joined: 01.11.2024

Latest posts by 4rtemi5.bsky.social on Bluesky

The US government should subsidize Open AI rather than OpenAI

07.11.2025 06:43 — 👍 49    🔁 7    💬 0    📌 1
Post image

On the occasion of the 1000th citation of our Sinkhorn-Knopp self-supervised representation learning paper, I've written a whole post about the history and the key bits of this method, which powers state-of-the-art SSL vision models.

Read it here :): docs.google.com/document/d/1...

15.10.2025 10:00 — 👍 18    🔁 4    💬 1    📌 0
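
Not from the post itself, but for readers who want the gist: a minimal sketch of the Sinkhorn-Knopp normalization step as it is typically used for online pseudo-label assignment in SSL pipelines. The tensor names, epsilon, and iteration count are illustrative assumptions, not the paper's exact recipe.

```python
import torch

@torch.no_grad()
def sinkhorn_knopp(scores, n_iters=3, eps=0.05):
    # scores: (batch, n_prototypes) feature-to-prototype similarity logits.
    Q = torch.exp(scores / eps).T      # (n_prototypes, batch)
    Q /= Q.sum()                       # treat as a joint distribution
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # rows: equal mass per prototype
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # cols: unit mass per sample
    return (Q * B).T                   # soft assignments, one row per sample
```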

We're ready!

21.09.2025 06:39 — 👍 0    🔁 0    💬 0    📌 0

The single most undervalued property of neural networks is self-consistency. We should change that!

06.09.2025 12:58 — 👍 2    🔁 0    💬 0    📌 0

Post image

08.08.2025 03:56 — 👍 163    🔁 22    💬 2    📌 3
Video thumbnail

You've been researching for a while!
Time to have some SOTA!

#aislop

26.07.2025 12:51 — 👍 3    🔁 0    💬 0    📌 0

I wanna talk to those experts you claim to have trained! Are they in the room with us now?

26.07.2025 10:48 — 👍 1    🔁 0    💬 0    📌 0

You and Adam keep beating Sota? Stop doing that! Poor Sota!

26.07.2025 09:50 — 👍 9    🔁 0    💬 1    📌 0

Have some cool idea but only evaluate it on small models? Tough luck buddy. You only get your paper accepted if your experimental results are 0.2% above SOTA and too expensive to falsify!

Is academic publishing pay to win yet?

26.07.2025 09:45 — 👍 3    🔁 0    💬 0    📌 0
Preview
GitHub - 4rtemi5/modded-nanogpt Contribute to 4rtemi5/modded-nanogpt development by creating an account on GitHub.

I ran my experiments with this "Gaussian-Kernel Attention" on the GPT speedrun repo by Keller Jordan on 8xH100. How much that's worth as a comparison against BIG models I don't know, but I found it interesting, so here is the code:
github.com/4rtemi5/modd...

23.07.2025 20:14 — 👍 0    🔁 0    💬 0    📌 0
Post image

Is there a reason why none of the recent models use RBF-kernel attention to get rid of the softmax bottleneck for long context?
I tried replacing dot-product attention with the negative squared KQ-distance and was able to remove the softmax without issues or loss in performance!

23.07.2025 20:14 — 👍 3    🔁 1    💬 1    📌 0
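
To make the idea concrete, here is a minimal sketch (my own, not the modded-nanogpt code) of attention weights built from the negative squared query-key distance, i.e. an RBF/Gaussian kernel. The lengthscale and the plain sum-normalization used instead of the softmax are assumptions.

```python
import torch

def rbf_attention(q, k, v, lengthscale=1.0):
    # q, k: (batch, heads, seq, dim); v: (batch, heads, seq, dim_v)
    # Negative squared distance between every query and every key.
    sq_dist = (q.unsqueeze(-2) - k.unsqueeze(-3)).pow(2).sum(-1)      # (b, h, q_len, k_len)
    weights = torch.exp(-sq_dist / (2.0 * lengthscale ** 2))          # Gaussian kernel, no softmax
    # The post removes the softmax; whether any renormalization remains is
    # not specified, so this sum-normalization is an assumption.
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-6)
    return weights @ v
```
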
Preview
eurips.cc A NeurIPS-endorsed conference in Europe held in Copenhagen, Denmark

NeurIPS is endorsing EurIPS, an independently-organized meeting which will offer researchers an opportunity to additionally present NeurIPS work in Europe concurrently with NeurIPS.

Read more in our blog post and on the EurIPS website:
blog.neurips.cc/2025/07/16/n...
eurips.cc

16.07.2025 22:05 — 👍 123    🔁 38    💬 2    📌 3
Preview
Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets Training activation quantized neural networks involves minimizing a piecewise constant function whose gradient vanishes almost everywhere, which is undesirable for the standard back-propagation or cha...

Wow, great hint! I actually had this unread paper open in a long-forgotten tab. Seems like it's finally time to read it... ;)
arxiv.org/abs/1903.05662

08.07.2025 07:24 — 👍 2    🔁 0    💬 1    📌 0

This could be a way to nudge a neuron with a negative activation to still get a small positive gradient, potentially avoiding dead ReLUs in a more direct way.
Would this offer more granular control over learning dynamics compared to variants like Leaky ReLU?

08.07.2025 05:59 — 👍 0    🔁 0    💬 1    📌 0

Has anyone experimented with "conditional gradients"?
Thinking about a setup where, within a specific activation range (e.g., right before a ReLU), you'd only permit positive or negative gradients.

08.07.2025 05:59 — 👍 1    🔁 0    💬 1    📌 0
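
One possible (purely hypothetical) reading of such a "conditional gradient", sketched as a custom autograd function: for negative pre-activations, only gradients that would push the unit back toward the active region are let through. This is my illustration of the idea, not something from the thread.

```python
import torch

class ConditionalGradReLU(torch.autograd.Function):
    # ReLU in the forward pass; in the backward pass, dead units (x < 0)
    # only receive gradients that would increase their pre-activation
    # (negative gradients, since parameters move against the gradient).

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp_min(0.0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        grad_in = grad_out.clone()
        dead = x < 0
        grad_in[dead] = torch.where(grad_out[dead] < 0,
                                    grad_out[dead],
                                    torch.zeros_like(grad_out[dead]))
        return grad_in

# usage: y = ConditionalGradReLU.apply(x)
```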

With non-car stuff you mean IT startups, right?

03.07.2025 18:12 — 👍 1    🔁 0    💬 0    📌 0

Quick question to the SSL experts out there: usually you evaluate an SSL model by freezing it and training a linear probing layer. Would it be fair to somehow learn a final layer with more dimensions than classes and do a nearest-neighbor evaluation?

29.06.2025 11:17 — 👍 0    🔁 0    💬 0    📌 0
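
For what it's worth, the nearest-neighbor side of this question is easy to sketch. Assuming frozen features (or the output of that extra projection layer), a cosine-similarity k-NN probe could look like the following; the choice of k and the L2 normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def knn_probe(train_feats, train_labels, test_feats, test_labels, k=20):
    # Frozen-feature k-NN evaluation: cosine similarity + majority vote.
    train_feats = F.normalize(train_feats, dim=1)
    test_feats = F.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.T                  # (n_test, n_train)
    nn_idx = sims.topk(k, dim=1).indices               # k nearest training samples
    preds = train_labels[nn_idx].mode(dim=1).values    # majority label among neighbors
    return (preds == test_labels).float().mean().item()
```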

There is an oak forest in central France that was planted 400 years ago by Colbert so that France would have quality hardwood by the 2000s to build ships for its navy.
This is the type of long-term planning that Seldonian predictions can help improve.

17.06.2025 08:17 — 👍 7    🔁 2    💬 1    📌 0
Post image

New anti-censorship jailbreak just dropped ;)

13.05.2025 02:17 — 👍 32    🔁 7    💬 1    📌 2
Preview
The Space Between: On Folding, Symmetries and Sampling Recent findings suggest that consecutive layers of neural networks with the ReLU activation function \emph{fold} the input space during the learning process. While many works hint at this phenomenon, ...

Link to the paper!
arxiv.org/abs/2503.08502

18.04.2025 11:20 — 👍 0    🔁 0    💬 0    📌 0

Currently on my way to #ICLR in Singapore, where we'll present our latest paper on space folding in neural networks.
Would be happy to meet some people there, so if you're at ICLR as well and want to hang out, feel free to PM! 🙂

18.04.2025 11:19 — 👍 3    🔁 0    💬 1    📌 0
Post image

Grok this! What a roller-coaster of emotions... 🤪

16.04.2025 19:01 — 👍 4    🔁 0    💬 1    📌 0

ModernBERT or DeBERTaV3?

What's driving performance: architecture or data?

To find out we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects.

Here are our findings:

14.04.2025 15:41 — 👍 45    🔁 15    💬 3    📌 0

Super interesting. I think I'm around 1-2 on this scale, but I'm limited by the complexity of the scene. When imagining only an apple it can be super realistic, but for very complex things many details get lost and I have to focus for them to appear.

13.04.2025 15:39 — 👍 2    🔁 0    💬 0    📌 0
Post image

Just assembled a slide about local feature training time/dataset size.
Anything wrong/missing?

13.04.2025 11:20 — 👍 18    🔁 4    💬 5    📌 0

Apparently that definition is also known as "The Big Freeze", so you mean I'm not cooked, right? 🥴✌️

12.04.2025 00:57 — 👍 1    🔁 0    💬 0    📌 0

Is the project even still worth doing when wandb runs out of funny names, or am I cooked? 🫠

11.04.2025 23:11 — 👍 1    🔁 0    💬 1    📌 0
Meta
Addressing bias in LLMs

It's well-known that all leading LLMs have had issues with bias—specifically, they historically have leaned left when it comes to debated political and social topics. This is due to the types of training data available on the internet.

Our goal is to remove bias from our AI models and to make sure that Llama can understand and articulate both sides of a contentious issue. As part of this work, we're continuing to make Llama more responsive so that it answers questions, can respond to a variety of different viewpoints without passing judgment, and doesn't favor some views over others.

We have made improvements on these efforts with this release—Llama 4 performs significantly better than Llama 3 and is comparable to Grok:

• Llama 4 refuses less on debated political and social topics overall (from 7% in Llama 3.3 to below 2%).
• Llama 4 is dramatically more balanced with which prompts it refuses to respond to (the proportion of unequal response refusals is now less than 1% on a set of debated topical questions).
• Our testing shows that Llama 4 responds with strong political lean at a rate comparable to Grok (and at half of the rate of Llama 3.3) on a contentious set of political or social topics. While we are making progress, we know we have more work to do and will continue to drive this rate further down.
We're proud of this progress to date and remain committed to our goal of eliminating overall bias in our models.

Meta introduced Llama 4 models and added this section near the very bottom of the announcement 😬

β€œ[LLMs] historically have leaned left when it comes to debated political and social topics.”

ai.meta.com/blog/llama-4...

05.04.2025 22:08 — 👍 135    🔁 38    💬 5    📌 61
Preview
Department of Computer Science Computer Science Department at ETH Zurich. The department offers highest quality in computer science research and education and adds to business and industry growth.

🚀 Hello, world! We are now live on Bluesky. This is the official account of the Department of Computer Science at ETH Zurich. Follow us for cutting-edge research, the latest innovations, event updates and insights into the future of technology. inf.ethz.ch
@csateth.bsky.social @ethzurich.bsky.social

24.03.2025 09:24 — 👍 21    🔁 8    💬 1    📌 0

@4rtemi5 is following 19 prominent accounts