
Russ Salakhutdinov

@rsalakhu.bsky.social

VP of Research, GenAI @ Meta (Multimodal LLMs, AI Agents), UPMC Professor of Computer Science at CMU, ex-Director of AI research at @Apple, co-founder Perceptual Machines (acquired by Apple)

467 Followers  |  7 Following  |  25 Posts  |  Joined: 23.11.2024

Latest posts by rsalakhu.bsky.social on Bluesky

Our approach shows strong generalization and versatility in generating accurate prompts for objects, styles and images across multiple T2I models, including Stable Diffusion, DALL-E, and Midjourney. It also enables easy editing and multi-concept prompt generation.

28.04.2025 22:52 — 👍 0    🔁 0    💬 0    📌 0

Prompt engineering for personalized image generation is labor-intensive and requires model-specific tuning, which limits generalization.

PRISM uses VLMs and iterative in-context learning to automatically generate effective, human-readable prompts using only black-box access to image generation models.

28.04.2025 22:52 — 👍 0    🔁 0    💬 1    📌 0
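As a rough illustration of the loop described above, here is a minimal sketch of iterative black-box prompt refinement (my paraphrase, not the released PRISM code). The callables `propose`, `generate`, and `score` are hypothetical stand-ins for the VLM prompt writer, the black-box T2I model, and a VLM similarity judge:

```python
# Minimal sketch of iterative in-context prompt refinement against a
# black-box T2I model. All three callables are hypothetical placeholders.

def refine_prompt(reference_image, propose, generate, score, n_iters=5):
    """Propose prompts, render them, and keep the best-scoring one."""
    best_prompt, best_score, history = None, float("-inf"), []
    for _ in range(n_iters):
        # The VLM sees the reference image plus past (prompt, score) pairs
        # as in-context examples and proposes a refined prompt.
        prompt = propose(reference_image, history)
        candidate = generate(prompt)            # black-box image generation
        s = score(reference_image, candidate)   # higher = closer match
        history.append((prompt, s))
        if s > best_score:
            best_prompt, best_score = prompt, s
    return best_prompt
```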
Post image

New work on automated prompt engineering for personalized text-to-image generation:

PRISM: Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Paper + Code: kellyyutonghe.github.io/prism/

28.04.2025 22:50 — 👍 0    🔁 0    💬 1    📌 0
Carnegie Mellon University at ICLR 2025 CMU researchers are presenting 143 papers at the Thirteenth International Conference on Learning Representations (ICLR 2025), held from April 24 - 28 at the Singapore EXPO. Here is a quick overview of...

blog.ml.cmu.edu/2025/04/23/c...

23.04.2025 15:06 — 👍 0    🔁 0    💬 0    📌 0
Copilot Arena: A Platform for Code Figure 1. Copilot Arena is a VSCode extension that collects human preferences of code directly from developers. As model capabilities improve, large language models (LLMs) are increasingly integra...

blog.ml.cmu.edu/2025/04/09/c...

09.04.2025 20:10 — 👍 0    🔁 0    💬 0    📌 0
Llama The open-source AI models you can fine-tune, distill and deploy anywhere. Choose from our collection of models: Llama 4 Maverick and Llama 4 Scout.

www.llama.com

Llama 4 models are out! Open-sourced! Check them out:

“Native multimodality, mixture-of-experts models, super long context windows, step changes in performance, and unparalleled efficiency. All in easy-to-deploy sizes custom fit for how you want to use it”

05.04.2025 19:18 — 👍 6    🔁 1    💬 0    📌 0

With small perturbations (less than 5% of total web page pixels), attackers can execute targeted adversarial goals with up to 67% success rates.

We also find that inference-time compute that is often used to improve model performance can introduce new vulnerabilities and harm robustness.

19.02.2025 22:18 — 👍 0    🔁 0    💬 0    📌 0
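For intuition about the pixel budget mentioned above, here is a hedged sketch of a PGD-style attack confined to a small masked region of the page screenshot. This is my simplification, not the paper's attack; `model` and `loss_fn` are hypothetical placeholders for the agent's visual front end and the attacker's objective:

```python
import torch

def masked_pgd(model, loss_fn, image, mask, steps=40, alpha=1/255, eps=16/255):
    """image: (C,H,W) floats in [0,1]; mask: (1,H,W) binary with
    mask.mean() < 0.05, so the perturbation touches under 5% of pixels."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(torch.clamp(image + delta * mask, 0, 1)))
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the attacker's objective
            delta.clamp_(-eps, eps)             # keep each pixel change small
            delta.grad.zero_()
    return torch.clamp(image + delta.detach() * mask, 0, 1)
```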
Dissecting Adversarial Robustness of Multimodal LM Agents

New work #ICLR2025 on “Dissecting Adversarial Robustness of Multimodal LM Agents” that shows that one can successfully break the latest agents that use black-box frontier LLMs, including agents that perform reflection and tree search.

Paper + Code + Data: chenwu.io/attack-agent/

19.02.2025 22:16 — 👍 1    🔁 0    💬 1    📌 0
GenAI Summit 2025 #GenAIUCSD25

Excited to be at the GenAI Summit at UCSD!

I'll be sharing our latest work on VisualWebArena, inference-time tree search, and Internet-scale training of LLM Agents.

genaisummit2025.ucsd.edu

19.02.2025 17:19 — 👍 0    🔁 0    💬 0    📌 0

4/4 Llama 3.1 70B agents successfully complete 16.7% of tasks on 150k websites. Agents trained on human-annotated data from Mind2Web and WebLINX struggle to generalize to real-world websites. Adding synthetic data significantly improves generalization.

With B. Trabucco, G. Sigurdsson, R. Piramuthu

12.02.2025 02:22 — 👍 0    🔁 0    💬 0    📌 0

3/4 Language models perform competitively with human annotators, achieving:
- 97% accuracy in detecting and filtering harmful content
- 89% success rate in generating feasible tasks
- 82% accuracy in judging successful task completions

12.02.2025 02:21 — 👍 0    🔁 0    💬 1    📌 0

2/4 The pipeline follows a three-step process (a minimal sketch follows this post):
- LLM generates tasks for 150k websites
- LLM agents complete these tasks and produce trajectories
- LLM reviews the trajectories and evaluates their success

12.02.2025 02:21 — 👍 0    🔁 0    💬 1    📌 0
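The three steps compose naturally. Here is a minimal sketch (hypothetical helper names, not the released InSTA code), where `task_llm` and `judge_llm` are text-in/text-out LLM calls and `agent` exposes a `rollout` method:

```python
# One pass of the generate -> rollout -> judge pipeline for a single website.

def annotate_site(url, task_llm, agent, judge_llm):
    task = task_llm(f"Propose a realistic user task for {url}.")        # step 1
    trajectory = agent.rollout(url, task)                               # step 2
    verdict = judge_llm(                                                # step 3
        f"Task: {task}\nTrajectory: {trajectory}\nDid the agent succeed? yes/no"
    )
    return {"url": url, "task": task, "trajectory": trajectory,
            "success": verdict.strip().lower().startswith("yes")}
```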
Post image

1/4 New work on InSTA: A pipeline for Internet-scale training of web agents across 150k diverse websites without human annotations.

Paper + Code: data-for-agents.github.io
Environment: github.com/data-for-age...

12.02.2025 02:21 — 👍 2    🔁 0    💬 1    📌 0

3/3 Joint work with Tiffani Min, Yue Wu, Jimin Sun, Max Kaufmann, Fahim Tajwar, Yonatan Bisk

10.02.2025 22:30 — 👍 0    🔁 0    💬 0    📌 0

2/3 Offline-collected state transitions are evaluated using process reward models (PRMs) to determine optimal intervention timing, creating labeled trajectories for training the helper model.

This minimizes costly intervention calls during training while leveraging PRMs to enhance robustness to off-policy data.

10.02.2025 22:29 — 👍 0    🔁 0    💬 1    📌 0
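To make the labeling step above concrete, here is a toy sketch; the first-score-below-threshold rule is my assumption for illustration, not necessarily the paper's exact criterion:

```python
# Label one offline trajectory: request an intervention at the first state
# whose process-reward-model (PRM) score suggests the agent is off track.

def label_intervention_step(prm_scores, threshold=0.5):
    """prm_scores: per-step PRM values for one trajectory.
    Returns the step index at which the helper should request help,
    or None if the trajectory never looks unrecoverable."""
    for t, score in enumerate(prm_scores):
        if score < threshold:
            return t      # positive label: intervene at step t
    return None           # negative label: no intervention needed

# Labeled trajectories like these supervise the helper policy offline.
print([label_intervention_step(s) for s in ([0.9, 0.8, 0.3], [0.9, 0.9])])
# -> [2, None]
```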
Post image

1/3 New work on Self-Regulation and Requesting Interventions: Enabling agents with a limited intervention budget to decide when to seek help:

Paper: soyeonm.github.io/self_reg/

We develop an offline framework that trains a helper policy to request interventions by combining LLM-based PRMs with RL.

10.02.2025 22:28 — 👍 4    🔁 0    💬 1    📌 0
Optimizing LLM Test-Time Compute Involves Solving a Meta-RL Problem Figure 1: Training models to optimize test-time compute and learn "how to discover" correct responses, as opposed to the traditional learning paradigm of learning "what answer" to output. The major...

blog.ml.cmu.edu/2025/01/08/o...

10.01.2025 02:20 — 👍 3    🔁 0    💬 0    📌 0
Inductive biases of neural network modularity in spatial navigation TL;DR: The brain may have evolved a modular architecture for daily tasks, with circuits featuring functionally specialized modules that match the task structure. We hypothesize that this architecture ...

blog.ml.cmu.edu/2025/01/02/i...

02.01.2025 16:49 — 👍 0    🔁 0    💬 0    📌 0
Post image

🌲 Ruslan Salakhutdinov (@rsalakhu.bsky.social) from CMU (@scsatcmu.bsky.social) opened the workshop with a talk on Tree Search for Language Model Agents.

Timestamp 36:20 in neurips.cc/virtual/2024...

📎 arxiv.org/abs/2407.01476

#NeurIPS2024 #AdaptiveFoundationModels

19.12.2024 04:59 — 👍 1    🔁 1    💬 1    📌 0
Post image

🎉 Had fun at #NeurIPS2024 Workshop on #AdaptiveFoundationModels!

🚀 Speakers: @rsalakhu.bsky.social @sedielem.bsky.social Kate Saenko, Matthias Bethge / @vishaalurao.bsky.social, Minjoon Seo, Bing Liu, Tianqi Chen

🌐 Posters: adaptive-foundation-models.org/papers

🎬 neurips.cc/virtual/2024...

🧵 Recap!

19.12.2024 04:59 — 👍 10    🔁 2    💬 1    📌 0
Post image

With my amazing students and collaborators at @neuripsconf.bsky.social in Vancouver!

15.12.2024 17:05 — 👍 0    🔁 0    💬 0    📌 0
ScribeAgent: Fine-Tuning Open-Source LLMs for Enhanced Web Navigation TL;DR: LLM web agents are designed to predict a sequence of actions to complete a user-specified task. Most existing agents are built on top of general-purpose, proprietary models like GPT-4 and rely ...

blog.ml.cmu.edu/2024/12/06/s...

07.12.2024 04:09 — 👍 2    🔁 1    💬 0    📌 0
Carnegie Mellon University at NeurIPS 2024 – Machine Learning Blog | ML@CMU

Carnegie Mellon University is proud to present 194 papers at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), held from December 10-15 at the Vancouver Convention Center.

blog.ml.cmu.edu/2024/12/02/c...

03.12.2024 15:34 — 👍 2    🔁 0    💬 0    📌 0

2/2 Our findings show that even when unlearning a single fact, current methods either fail to properly unlearn with high recall or end up unlearning many other irrelevant facts.

Paper: arxiv.org/abs/2410.15153
Code+Dataset: github.com/wrh14/deep_u...

Joint work with R. Wu, C. Yadav, K. Chaudhuri.

03.12.2024 14:43 — 👍 0    🔁 0    💬 0    📌 0
Evaluating Deep Unlearning in Large Language Models Machine unlearning is a key requirement of many data protection regulations such as GDPR. Prior work on unlearning has mostly considered superficial unlearning tasks where a single or a few related pi...

1/2 New work on Evaluating Deep Unlearning in Large Language Models.

Paper: arxiv.org/abs/2410.15153

Unlearning specific facts in LLMs is challenging because the facts in LLMs can be deduced from each other. This work proposes a framework for deep unlearning of facts that are interrelated.

03.12.2024 14:42 — 👍 0    🔁 0    💬 1    📌 0
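A toy example of the deducibility problem (my illustration, not the paper's benchmark): if a deleted fact can be re-derived from retained facts, shallow unlearning fails, so deep unlearning must also remove enough supporting facts:

```python
def derivable(facts, rules, target):
    """Forward-chain Horn-style rules, where each rule maps a frozenset of
    premise facts to one conclusion; returns True if `target` is deducible."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return target in known

facts = {"born_in(X, Paris)"}
rules = [(frozenset({"born_in(X, Paris)"}), "citizen_of(X, France)")]
# Deleting only "citizen_of(X, France)" is superficial: it is re-derived.
print(derivable(facts, rules, "citizen_of(X, France)"))  # True
```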
Post image

What is happening?! Who is this? 😆

24.11.2024 00:22 — 👍 0    🔁 0    💬 0    📌 0

Hello BlueSky

24.11.2024 00:21 — 👍 2    🔁 0    💬 1    📌 0
