
Pau Rodriguez

@paurodriguez.bsky.social

Research Scientist at Apple Machine Learning Research. Previously ServiceNow and Element AI in Montréal.

146 Followers  |  344 Following  |  13 Posts  |  Joined: 16.11.2024

Latest posts by paurodriguez.bsky.social on Bluesky

Our work on fine-grained control of LLMs and diffusion models via Activation Transport will be presented at @iclr_conf as a spotlight ✨ Check out our new blog post: machinelearning.apple.com/research/tra...

11.04.2025 06:58 — 👍 7    🔁 1    💬 0    📌 1
YouTube video by Deep Learning Barcelona: Què és l'aprenentatge profund? - La Dimoni de Maxwell #deeplearning #ciencia #català #barcelona

What is deep learning? (Què és l'aprenentatge profund?)

@marionamec.bsky.social of @neurofregides.bsky.social explains it on the occasion of the Deep Learning Barcelona Symposium 2024 (@dlbcn.ai), this Thursday, December 19.

#deeplearning #ciencia #català #barcelona

www.youtube.com/shorts/R4u_Z...

16.12.2024 08:49 — 👍 7    🔁 3    💬 0    📌 1
Post image

Evaluating your LLM uncertainties with Rouge-L will show clear winners... except that they aren't actually good. We find that Rouge-L spuriously favors some methods over others. 🧵1/4

📄 openreview.net/forum?id=jGt...
NeurIPS: Sunday, East Exhibition Hall A, Safe Gen AI workshop

12.12.2024 11:36 — 👍 7    🔁 3    💬 1    📌 1
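For context on the metric the post critiques: ROUGE-L scores a candidate against a reference via their longest common subsequence (LCS) of tokens. A minimal sketch (function and variable names are mine, not from the paper), assuming whitespace tokenization:

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    # Dynamic-programming table for the LCS length.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c):
        for j, rt in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ct == rt else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```

Because the score only rewards subsequence overlap, two answers with very different semantics (and very different uncertainty) can tie, which is one way such a metric can spuriously favor a method.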

Kudos to all co-authors 👏 Arno Blaas, Michal Klein, Luca Zappella, Nicholas Apostoloff, Marco Cuturi, and Xavier Suau.

Extra 👏 to Xavi for making this so great! As a friend would say, he's the Rolls-Royce of co-authors, and he should be regarded as the first author too!

10.12.2024 13:09 — 👍 2    🔁 0    💬 0    📌 0

Summary:
🤝 Unifying activation steering w/ OT.
✨ Linear-AcT preserves distributions w/ interpretable ([0, 1]) strength.
💪 Robust: models/layers/modalities
💬 LLMs: toxicity mitigation, truthfulness, and concept induction.
🌄 T2I: style induction and concept negation.
🚀 Negligible cost!

10.12.2024 13:09 — 👍 3    🔁 0    💬 1    📌 0
Post image

8/9 T2I models tend to generate negated concepts 😮

In the image, Stable Diffusion XL prompted with: “2 tier cake with multicolored stars attached to it and no {white bear, pink elephant, gorilla} can be seen.”

✨Linear-AcT makes the negated concept disappear✨

10.12.2024 13:09 — 👍 4    🔁 1    💬 1    📌 0
Post image

7/9 And here we induce Cyberpunk 🤖 for the same prompt!

10.12.2024 13:09 — 👍 2    🔁 0    💬 2    📌 0
Post image

6/9 Amazingly, we can condition Text-to-Image (T2I) Diffusion with the same exact method we used for LLMs! 🤯

In this example, we induce a specific style (Art Nouveau 🎨), which we can accurately control with our λ parameter.

10.12.2024 13:09 — 👍 2    🔁 0    💬 1    📌 0
Post image

5/9 With Linear-AcT, we achieve great results in LLM 👿 toxicity mitigation and 👩🏼‍⚖️ truthfulness induction.

And the best result is always obtained at λ=1, as opposed to vector-based steering methods!

10.12.2024 13:09 — 👍 1    🔁 0    💬 1    📌 0
Post image

4/9 Linear-AcT preserves target distributions, with interpretable strength λ 🌈

🍰 All we need is two small sets of sentences {a},{b} from source and target distributions to estimate the Optimal Transport (OT) map 🚚

🚀 We linearize the map for speed/memory, thus ⭐Linear-AcT⭐

10.12.2024 13:09 — 👍 1    🔁 0    💬 1    📌 0
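The recipe in this post can be sketched per activation dimension: sort samples from the source set {a} and target set {b} (sorted pairs approximate the 1D optimal transport coupling), fit a linear map by least squares, then blend with the identity using λ ∈ [0, 1]. This is an illustrative stdlib-only sketch under my own naming, not the paper's implementation:

```python
def fit_linear_map(a: list[float], b: list[float]) -> tuple[float, float]:
    """Fit b_sorted ≈ omega * a_sorted + beta, a linearized 1D OT map."""
    a_s, b_s = sorted(a), sorted(b)  # sorted pairs ~ the 1D OT coupling
    n = len(a_s)
    ma, mb = sum(a_s) / n, sum(b_s) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a_s, b_s))
    var = sum((x - ma) ** 2 for x in a_s)
    omega = cov / var
    beta = mb - omega * ma
    return omega, beta

def transport(x: float, omega: float, beta: float, lam: float = 1.0) -> float:
    """lam in [0, 1] interpolates between identity (0) and full transport (1)."""
    return (1 - lam) * x + lam * (omega * x + beta)
```

Because λ interpolates toward a map between the two observed distributions (rather than adding an unbounded vector), λ=1 has a concrete meaning: activations fully transported to the target distribution.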
Post image

3/9 An activation has a different output distribution per behavior, e.g. 🦠 toxic (source) and 😊 non-toxic (target). i) Vector-based AS moves activations OOD 🤯, with catastrophic consequences 💥 harming model utility. ii) The strength λ is unbounded and non-interpretable 🤨!

10.12.2024 13:09 — 👍 1    🔁 0    💬 1    📌 0

2/9 🤓 Activation Steering (AS) is a fast and cheap alternative for alignment/control.

Most AS techniques perform a vector addition such as a* = a + λv, where v is some estimated vector and λ the conditioning strength. How v is estimated differs for each method.

10.12.2024 13:09 — 👍 1    🔁 0    💬 1    📌 0
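The a* = a + λv update above can be made concrete with one common choice for estimating v: the difference of mean activations between target and source examples (a mean-difference estimator; this is one AS variant, not the method of this thread). A stdlib-only sketch with hypothetical names:

```python
def mean_difference_vector(target_acts: list[list[float]],
                           source_acts: list[list[float]]) -> list[float]:
    """Estimate v as mean(target activations) - mean(source activations)."""
    dim = len(target_acts[0])
    mean_t = [sum(a[d] for a in target_acts) / len(target_acts) for d in range(dim)]
    mean_s = [sum(a[d] for a in source_acts) / len(source_acts) for d in range(dim)]
    return [t - s for t, s in zip(mean_t, mean_s)]

def steer(a: list[float], v: list[float], lam: float) -> list[float]:
    """Vector-based activation steering: a* = a + lam * v."""
    return [ai + lam * vi for ai, vi in zip(a, v)]
```

Note that nothing constrains λ here: λ=5 is as legal as λ=0.5, and large values can push a* far outside the activation distribution the model was trained on, which is exactly the failure mode the next post describes.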

1/9 🤔 How do we currently align/control generative models?
- Pre-prompting
- Fine-tuning
- RLHF
However, these techniques can be slow/expensive! 🐢

10.12.2024 13:09 — 👍 2    🔁 0    💬 1    📌 0
Post image

Thrilled to share the latest work from our team at @Apple, where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥

📄 arxiv.org/abs/2410.23054
🛠️ github.com/apple/ml-act

0/9 🧵

10.12.2024 13:09 — 👍 47    🔁 15    💬 3    📌 5
Announcing the NeurIPS 2024 Test of Time Paper Awards  – NeurIPS Blog

Thank you to the @neuripsconf.bsky.social for this recognition of the Generative Adversarial Nets paper published ten years ago with @ian-goodfellow.bsky.social, Jean Pouget-Abadie, @memimo.bsky.social, Bing Xu, David Warde-Farley, Sherjil Ozair and Aaron Courville.
blog.neurips.cc/2024/11/27/a...

28.11.2024 14:36 — 👍 195    🔁 20    💬 4    📌 0
Post image

Apple will be a platinum sponsor of the Deep Learning Barcelona Symposium 2024. This is the first time Apple has sponsored the event. #DLBCN

22.11.2024 07:42 — 👍 1    🔁 1    💬 0    📌 0

Bring stats to LM evals!!

open.substack.com/pub/desiriva...

21.11.2024 16:59 — 👍 4    🔁 1    💬 0    📌 0

Watching Frieren, I can't stop thinking that demons are evil LLMs 😅

20.11.2024 12:10 — 👍 3    🔁 0    💬 0    📌 0
