
Pau Rodriguez

@paurodriguez.bsky.social

Research Scientist at Apple Machine Learning Research. Previously ServiceNow and Element AI in Montréal.

151 Followers  |  347 Following  |  18 Posts  |  Joined: 16.11.2024

Latest posts by paurodriguez.bsky.social on Bluesky

The best part? LinEAS works on LLMs & T2I models.

Huge thanks to the team: Michal Klein, Eleonora Gualdoni, Valentino Maiorca, Arno Blaas, Luca Zappella, Marco Cuturi, & Xavier Suau (who contributed like a 1st author too 🥇)!

💻 https://github.com/apple/ml-lineas
📄 https://arxiv.org/abs/2503.10679

21.10.2025 10:00 — 👍 0    🔁 0    💬 0    📌 0
Sparsity improves utility while mitigating toxicity. Toxicity results on Qwen2.5-7B using only 32 sentences, at different levels of sparsity γ that result in different support sizes (x axis). At 1K optimization steps, with a support of about 1%, we maintain similar toxicity (left, center-left) while PPLWIK decreases (center-right) and MMLU increases (right). Note that overly long optimization (10K steps) can harm utility due to overfitting. Conversely, short optimization (e.g., 100 steps) combined with strong sparsity leads to weak conditioning (only mild toxicity mitigation).

LinEAS globally 🌍 optimizes all 1D-Wasserstein distances between source and target activation distributions at multiple layers via backprop. ✨ Bonus: we can now add a sparsity objective. The result? Targeted 🎯 interventions that preserve fluency with strong conditioning!
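For intuition, here is a minimal PyTorch sketch of what such an objective could look like. This is my illustration, not the released ml-lineas code: the parameter names, the per-unit affine intervention, and the exact L1 penalty are assumptions. It uses the fact that, for equal-sized samples, the 1D Wasserstein-1 distance is the mean absolute difference between sorted values.

```python
import torch

def w1_per_unit(x, y):
    # x, y: [batch, units] with equal batch sizes. Per unit, the
    # Wasserstein-1 distance between the two empirical 1D distributions
    # is the mean |sorted(x) - sorted(y)| along the batch dimension.
    return (torch.sort(x, dim=0).values - torch.sort(y, dim=0).values).abs().mean(dim=0)

def steering_loss(src_acts, tgt_acts, scales, shifts, gamma=1e-3):
    # src_acts/tgt_acts: per-layer activation samples from source and
    # target prompts; scales/shifts: learnable per-unit parameters.
    loss = torch.tensor(0.0)
    for a_src, a_tgt, s, b in zip(src_acts, tgt_acts, scales, shifts):
        steered = s * a_src + b  # per-unit affine intervention
        loss = loss + w1_per_unit(steered, a_tgt).sum()
        # sparsity term keeps most units at the identity (s=1, b=0)
        loss = loss + gamma * ((s - 1.0).abs().sum() + b.abs().sum())
    return loss
```

In the real end-to-end setup, deeper-layer activations come from a forward pass that already includes the earlier interventions, so gradients couple all layers; the sketch omits that plumbing.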

21.10.2025 10:00 — 👍 0    🔁 0    💬 1    📌 0

Existing methods estimate layer-wise 🥞 interventions. While powerful, layer-wise methods have some approximation error since the optimization is done locally, without considering multiple layers at once 🤔. We circumvent this problem in LinEAS with an end-to-end optimization ⚙️!

21.10.2025 10:00 — 👍 0    🔁 0    💬 1    📌 0
LinEAS learns lightweight maps to steer pretrained model activations. With LinEAS, we gain fine-grained control over text-to-image generation to induce precise styles (in the figure) or remove objects. The same procedure also allows controlling LLMs.

🦊 Activation Steering modifies a model's internal activations to control its output. Think of a slider 🎚️ that gradually adds a concept, like art style 🎨, to the output. This is also a powerful tool for safety, steering models away from harmful content.
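Mechanically, that slider can be as simple as a forward hook that shifts one layer's output along a concept direction. A generic sketch of vector-based steering, not LinEAS itself; the module path, vector v, and strength are illustrative assumptions:

```python
import torch

def make_steering_hook(v: torch.Tensor, lam: float):
    # Returns a forward hook that nudges the layer output along a
    # concept direction v; lam is the "slider".
    def hook(module, inputs, output):
        return output + lam * v
    return hook

# Hypothetical usage on a decoder block of some pretrained `model`:
# handle = model.layers[10].mlp.register_forward_hook(make_steering_hook(v, lam=0.8))
# ... generate ...
# handle.remove()  # slider back to zero
```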

21.10.2025 10:00 — 👍 0    🔁 0    💬 1    📌 0

🚀 Excited to share LinEAS, our new activation steering method accepted at NeurIPS 2025! It approximates optimal transport maps end-to-end to precisely guide 🧭 activations, achieving finer control 🎚️ with ✨ fewer than 32 ✨ prompts!

💻 https://github.com/apple/ml-lineas
📄 https://arxiv.org/abs/2503.10679

21.10.2025 10:00 — 👍 2    🔁 1    💬 1    📌 1

Our two phenomenal interns, Alireza Mousavi-Hosseini and Stephen Zhang @syz.bsky.social, have been cooking up some really cool work with Michal Klein and me over the summer.

Relying on optimal transport couplings (to pick noise and data pairs) should, in principle, be helpful to guide flow matching.

🧵
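For a flavor of the minibatch version of that idea (my illustration of OT-coupled flow matching in general, not this work's method; the pairing solves an assignment problem restricted to one-to-one matchings):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_coupled_pairs(noise: np.ndarray, data: np.ndarray):
    # noise, data: [batch, dim]. Pair noise[i] with data[j] so the total
    # squared distance is minimal: an optimal transport coupling
    # restricted to one-to-one matchings within the minibatch.
    cost = ((noise[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return noise[rows], data[cols]

# Flow matching then regresses a velocity field toward x1 - x0 at
# points x_t = (1 - t) * x0 + t * x1 for matched pairs (x0, x1),
# giving straighter paths than random pairing.
```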

03.10.2025 20:50 — 👍 30    🔁 7    💬 2    📌 1

Our work on fine-grained control of LLMs and diffusion models via Activation Transport will be presented at @iclr_conf as a spotlight ✨ Check out our new blog post: machinelearning.apple.com/research/tra...

11.04.2025 06:58 — 👍 7    🔁 1    💬 0    📌 1
Què és l'aprenentatge profund? (What is deep learning?) - La Dimoni de Maxwell | YouTube video by Deep Learning Barcelona #deeplearning #ciencia #català #barcelona

What is deep learning? (Què és l'aprenentatge profund?)

@marionamec.bsky.social from @neurofregides.bsky.social explains it for the Deep Learning Barcelona Symposium 2024 (@dlbcn.ai), this Thursday, December 19.

#deeplearning #ciencia #català #barcelona

www.youtube.com/shorts/R4u_Z...

16.12.2024 08:49 — 👍 7    🔁 3    💬 0    📌 1

Evaluating your LLM uncertainties with ROUGE-L will show clear winners... except that they aren't actually good. We find that ROUGE-L spuriously favors some methods over others. 🧵 1/4

📄 openreview.net/forum?id=jGt...
NeurIPS: Sunday, East Exhibition Hall A, Safe Gen AI workshop
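For context, ROUGE-L scores the longest common subsequence (LCS) between a generation and a reference, so generic word overlap can inflate it. A bare-bones sketch of the metric (real implementations add tokenization and stemming options):

```python
def rouge_l_f1(candidate: str, reference: str, beta: float = 1.2) -> float:
    # Classic dynamic program for the LCS length over whitespace tokens.
    c, r = candidate.split(), reference.split()
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c):
        for j, rw in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if cw == rw else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    # ROUGE-L F-measure (Lin, 2004), recall-weighted by beta.
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```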

12.12.2024 11:36 — 👍 7    🔁 3    💬 1    📌 1

Kudos to all co-authors 👏 Arno Blaas, Michal Klein, Luca Zappella, Nicholas Apostoloff, Marco Cuturi, and Xavier Suau.

Extra 👏 to Xavi for making this so great! Like a friend would say, he's the Rolls-Royce of co-authors, and he should be regarded as the first author too!

10.12.2024 13:09 — 👍 2    🔁 0    💬 0    📌 0

Summary:
🤝 Unifying activation steering w/ OT.
✨ Linear-AcT preserves distributions w/ interpretable ([0, 1]) strength.
💪 Robust across models/layers/modalities.
💬 LLMs: toxicity mitigation, truthfulness, and concept induction.
🌄 T2I: style induction and concept negation.
🚀 Negligible cost!

10.12.2024 13:09 — 👍 3    🔁 0    💬 1    📌 0

8/9 T2I models tend to generate negated concepts 😮

In the image, Stable Diffusion XL prompted with: "2 tier cake with multicolored stars attached to it and no {white bear, pink elephant, gorilla} can be seen."

✨ Linear-AcT makes the negated concept disappear ✨

10.12.2024 13:09 — 👍 4    🔁 1    💬 1    📌 0

7/9 And here we induce Cyberpunk 🤖 for the same prompt!

10.12.2024 13:09 — 👍 2    🔁 0    💬 2    📌 0

6/9 Amazingly, we can condition Text-to-Image (T2I) Diffusion with the same exact method we used for LLMs! 🤯

In this example, we induce a specific style (Art Nouveau 🎨), which we can accurately control with our λ parameter.

10.12.2024 13:09 — 👍 2    🔁 0    💬 1    📌 0

5/9 With Linear-AcT, we achieve great results in LLM 👿 toxicity mitigation and 👩🏼‍⚖️ truthfulness induction.

And the best result is always obtained at λ=1, unlike with vector-based steering methods!

10.12.2024 13:09 — 👍 1    🔁 0    💬 1    📌 0

4/9 Linear-AcT preserves target distributions, with interpretable strength λ 🌈

🍰 All we need is two small sets of sentences {a}, {b} from source and target distributions to estimate the Optimal Transport (OT) map 🚚

🚀 We linearize the map for speed/memory, thus ⭐ Linear-AcT ⭐
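A hedged sketch of the idea (my simplification, not the ml-act code): per unit, the OT map between two 1D Gaussians is affine, so matching means and standard deviations gives a linear map, and λ in [0, 1] interpolates from identity to full transport.

```python
import torch

def fit_linear_map(a_src: torch.Tensor, a_tgt: torch.Tensor, eps: float = 1e-6):
    # a_src, a_tgt: [num_sentences, units] activations from the two
    # small sentence sets {a}, {b}. Per unit, the OT map between 1D
    # Gaussians is affine, T(x) = alpha * x + beta, matching moments.
    alpha = a_tgt.std(dim=0) / (a_src.std(dim=0) + eps)
    beta = a_tgt.mean(dim=0) - alpha * a_src.mean(dim=0)
    return alpha, beta

def transport(a, alpha, beta, lam=1.0):
    # lam in [0, 1] interpolates identity (0) -> full transport (1),
    # which is what makes the strength interpretable.
    return (1 - lam) * a + lam * (alpha * a + beta)
```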

10.12.2024 13:09 — 👍 1    🔁 0    💬 1    📌 0

3/9 An activation has a different output distribution per behavior, e.g. 🦠 toxic (source) and 😊 non-toxic (target). i) Vector-based AS moves activations OOD 🤯, with catastrophic consequences 💥 harming model utility. ii) The strength λ is unbounded and non-interpretable 🤨!

10.12.2024 13:09 — 👍 1    🔁 0    💬 1    📌 0

2/9 🤓 Activation Steering (AS) is a fast and cheap alternative for alignment/control.

Most AS techniques perform a vector addition such as a* = a + λv, where v is some estimated vector and λ the conditioning strength. How v is estimated differs for each method.
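For illustration, one common estimator for v is a difference of means over contrastive prompt sets (a generic recipe, not attributed to any single method):

```python
import torch

def diff_in_means(acts_pos: torch.Tensor, acts_neg: torch.Tensor) -> torch.Tensor:
    # acts_pos/acts_neg: [n, units] activations collected on prompts
    # that do / don't exhibit the target behavior; v points from
    # "without" toward "with".
    return acts_pos.mean(dim=0) - acts_neg.mean(dim=0)

# Steering then applies a_star = a + lam * v, where lam is hand-tuned
# and unbounded, exactly the interpretability issue raised in 3/9.
```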

10.12.2024 13:09 — 👍 1    🔁 0    💬 1    📌 0

1/9 🤔 How do we currently align/control generative models?
- Pre-prompting
- Fine-tuning
- RLHF
However, these techniques can be slow/expensive! 🐢

10.12.2024 13:09 — 👍 2    🔁 0    💬 1    📌 0

Thrilled to share the latest work from our team at @Apple, where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥

📄 arxiv.org/abs/2410.23054
🛠️ github.com/apple/ml-act

0/9 🧵

10.12.2024 13:09 — 👍 47    🔁 15    💬 3    📌 5
Announcing the NeurIPS 2024 Test of Time Paper Awards - NeurIPS Blog

Thank you to the @neuripsconf.bsky.social for this recognition of the Generative Adversarial Nets paper published ten years ago with @ian-goodfellow.bsky.social, Jean Pouget-Abadie, @memimo.bsky.social, Bing Xu, David Warde-Farley, Sherjil Ozair and Aaron Courville.
blog.neurips.cc/2024/11/27/a...

28.11.2024 14:36 — 👍 194    🔁 20    💬 4    📌 0

Apple will be a platinum sponsor of the Deep Learning Barcelona Symposium 2024. This is the first time Apple has sponsored the event. #DLBCN

22.11.2024 07:42 — 👍 1    🔁 1    💬 0    📌 0

Bring stats to LM evals!!

open.substack.com/pub/desiriva...

21.11.2024 16:59 — 👍 4    🔁 1    💬 0    📌 0

Watching Frieren, I can't stop thinking that demons are evil LLMs 😅

20.11.2024 12:10 — 👍 3    🔁 0    💬 0    📌 0
