
Maarten Sap

@maartensap.bsky.social

Working on #NLProc for social good. Currently at LTI at CMU. 🏳️‍🌈

1,711 Followers  |  216 Following  |  42 Posts  |  Joined: 08.11.2024

Posts by Maarten Sap (@maartensap.bsky.social)

Preview
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety Recent advances in AI agents capable of solving complex, everyday tasks, from scheduling to customer service, have enabled deployment in real-world settings, but their possibilities for unsafe behavio...

On Wednesday at 15:50 (Room I; 15:30 session):
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety arxiv.org/abs/2507.06134

(followed by the poster session at 17:20)

24.02.2026 09:11 — 👍 1    🔁 0    💬 0    📌 0

Presenting two papers!

On Tuesday at 12:40pm (Room I; 12pm session):
1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning arxiv.org/abs/2508.07667

(followed by poster session at 1pm)

24.02.2026 09:11 — 👍 4    🔁 0    💬 1    📌 0
Preview
(PDF) Agents of Chaos PDF | We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent... | Find, read and cite all the research you nee...

You can read more in the full paper:
www.researchgate.net/publication/...

There is also an interactive website containing logs of the authentic interactions:
agentsofchaos.baulab.info

23.02.2026 23:47 — 👍 4    🔁 2    💬 0    📌 0
Post image

In this amazing multidisciplinary collaboration, we report our early experience with the @openclaw-x.bsky.social ->

23.02.2026 23:32 — 👍 40    🔁 21    💬 1    📌 9
Preview
IASEAI - International Association for Safe and Ethical AI Building a global movement for safe and ethical AI. Join IASEAI to ensure AI systems operate safely and ethically, benefiting all of humanity.

On my way to Paris for the IASEAI conference www.iaseai.org/our-programs...! Who will be there?

22.02.2026 16:18 — 👍 1    🔁 0    💬 0    📌 1

I believe so!

04.02.2026 00:48 — 👍 0    🔁 0    💬 0    📌 0

oh thanks for catching that, I fixed this!

03.02.2026 14:12 — 👍 1    🔁 0    💬 0    📌 0

Hi Lingze! The form itself has a place where you can indicate your own interests, and then indicate alignment with existing topics. Since the full list of mentors isn't finalized, it's best not to contact faculty; they will reach out to you!

02.02.2026 18:55 — 👍 1    🔁 0    💬 1    📌 0
Preview
CMU LTI Summer 2026 Internship Program Application We are looking for applicants for the Carnegie Mellon University Language Technology Institute's Summer 2026 "Language Technology for All" internship program. The main goal of this internship is to pr...

🚀 Apply to CMU LTI's Summer 2026 "Language Technology for All" internship! 🎓 Open to pre-doctoral students new to language tech (non-CS backgrounds welcome). 🔬 12–14 weeks in-person in Pittsburgh — travel + stipend paid. 💸 Deadline: Feb 20, 11:59pm ET. Apply → forms.gle/cUu8g6wb27Hs...

02.02.2026 15:41 — 👍 14    🔁 12    💬 2    📌 0
Preview
SoCon-NLPSI'26 | Home Natural Language Processing (NLP) has undergone a significant evolution, opening up the possibility of capturing high-level aspects of human communication. Key areas of interest include the pragmatics...

I'm excited to announce the Call for Papers for the Social Context (SoCon) and Integrating NLP and Psychology to Study Social Interactions (NLPSI) workshop, @ LREC '26 in Palma de Mallorca, Spain!

🗓 Deadline: February 16, 2026
🌐 Website: socon-nlpsi.github.io
🗓 Workshop: May 12, 2026

13.01.2026 13:23 — 👍 15    🔁 7    💬 0    📌 0

I'm very excited about our new work, which aims to model the causes and effects of stories online! Narratives and stories are everywhere, so it's helpful to understand how people use them in nuanced ways.

22.12.2025 09:20 — 👍 14    🔁 3    💬 0    📌 0
Post image

How and when should LLM guardrails be deployed to balance safety and user experience?

Our #EMNLP2025 paper reveals that crafting thoughtful refusals rather than detecting intent is the key to human-centered AI safety.

📄 arxiv.org/abs/2506.00195
🧵 [1/9]

20.10.2025 20:04 — 👍 9    🔁 3    💬 1    📌 0
NeurIPS 2025 Workshop Mexico City PersonaNLP Welcome to the OpenReview homepage for NeurIPS 2025 Workshop Mexico City PersonaNLP

📣📣 Announcing the first PersonaLLM Workshop on LLM Persona Modeling.

If you work on persona-driven LLMs, social cognition, HCI, psychology, cognitive science, cultural modeling, or evaluation, don't miss the chance to submit.

Submit here: openreview.net/group?id=Neu...

17.10.2025 00:57 — 👍 4    🔁 1    💬 1    📌 0
Post image

I'm ✨ super excited and grateful ✨ to announce that I'm part of the 2025 class of #PackardFellows (www.packard.org/2025fellows). The @packardfdn.bsky.social and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈

15.10.2025 13:05 — 👍 11    🔁 1    💬 1    📌 2
Post image

🚨 New paper: Reward models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences?
With EVALUESTEER, we find that even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵

14.10.2025 15:59 — 👍 12    🔁 7    💬 1    📌 0

Oh yes we have a paper under submission! I'll ask Mikayla to email you :)

14.10.2025 13:35 — 👍 1    🔁 0    💬 1    📌 0
Post image Post image Post image Post image

Saplings take #COLM2025! Featuring a group lunch, amazing posters, and a panel with Yoshua Bengio!

14.10.2025 12:19 — 👍 16    🔁 1    💬 1    📌 0
Grad App Aid — Queer in AI

We are launching our Graduate School Application Financial Aid Program (www.queerinai.com/grad-app-aid) for 2025-2026. We'll give up to $750 per person to LGBTQIA+ STEM scholars applying to graduate programs. Apply at openreview.net/group?id=Que.... 1/5

09.10.2025 00:37 — 👍 7    🔁 9    💬 1    📌 0

I'm also giving a talk at #COLM2025 Social Simulation workshop (sites.google.com/view/social-...) on Unlocking Social Intelligence in AI, at 2:30pm Oct 10th!

06.10.2025 14:53 — 👍 6    🔁 0    💬 0    📌 0

Day 3 (Thu Oct 9), 11:00am–1:00pm, Poster Session 5

Poster #13: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages by @kpriyanshu256.bsky.social and @devanshrjain.bsky.social

Poster #74: Fluid Language Model Benchmarking — led by @valentinhofmann.bsky.social

06.10.2025 14:51 — 👍 1    🔁 0    💬 0    📌 0

Day 2 (Wed Oct 8), 4:30–6:30pm, Poster Session 4

Poster #50: The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains — led by Scott Geng

06.10.2025 14:51 — 👍 1    🔁 0    💬 1    📌 0

Day 1 (Tue Oct 7), 4:30–6:30pm, Poster Session 2

Poster #77: ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning; led by @stellali.bsky.social & @jiminmun.bsky.social

06.10.2025 14:51 — 👍 2    🔁 1    💬 1    📌 0

Day 1 (Tue Oct 7), 4:30–6:30pm, Poster Session 2

Poster #42: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions; led by @nlpxuhui.bsky.social

06.10.2025 14:51 — 👍 1    🔁 0    💬 1    📌 0

Headed to #COLM2025 today! Here are five of our papers that were accepted, and when & where to catch them 👇

06.10.2025 14:51 — 👍 6    🔁 0    💬 1    📌 1

📢 New #COLM2025 paper 📢

Standard benchmarks give every LLM the same questions. This is like testing 5th graders and college seniors with *one* exam! 🥴

Meet Fluid Benchmarking, a capability-adaptive eval method delivering lower variance, higher validity, and reduced cost.

🧵

16.09.2025 17:16 — 👍 41    🔁 10    💬 3    📌 1
Post image

That's a lot of people! Fall Sapling lab outing, welcoming our new postdoc Vasudha, and visitors Tze Hong and Chani! (just missing Jocelyn)

26.08.2025 17:53 — 👍 12    🔁 0    💬 0    📌 0

I'm excited because I'm teaching/coordinating a unique new class, where we teach new PhD students all the "soft" skills of research, incl. ideation, reviewing, presenting, interviewing, advising, etc.

Each lecture is taught by a different LTI prof! It takes a village! maartensap.com/11705/Fall20...

25.08.2025 18:01 — 👍 31    🔁 2    💬 2    📌 1

I've always seen people on laptops during talks, but it's possible it has increased.

I realized during lockdown that I drift to emails during Zoom talks, so I started knitting to pay better attention to those talks, and now I knit during IRL talks too (though sometimes I still peck at my laptop 😅)

22.08.2025 15:00 — 👍 14    🔁 1    💬 3    📌 0
Snippet of the Forbes article, with highlighted text.

A recent study by Allen Institute for AI (Ai2), titled "Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences," found that refusal style mattered more than user intent. The researchers tested 3,840 AI query-response pairs across 480 participants, comparing direct refusals, explanations, redirection, partial compliance and full compliance.

Partial compliance, sharing general but not specific information, reduced dissatisfaction by over 50% compared to outright denial, making it the most effective safeguard.

"We found that [start of highlight] direct refusals can cause users to have negative perceptions of the LLM: users consider these direct refusals significantly less helpful, more frustrating and make them significantly less likely to interact with the system in the future," [end of highlight] Maarten Sap, AI safety lead at Ai2 and assistant professor at Carnegie Mellon University, told me. "I do not believe that model welfare is a well-founded direction or area to care about."


We have been studying these questions of how models should refuse in our recent paper accepted to EMNLP Findings (arxiv.org/abs/2506.00195), led by my wonderful PhD student @mingqian-zheng.bsky.social

22.08.2025 13:00 — 👍 8    🔁 0    💬 0    📌 0