 
                                                
    
    
    
    
How and when should LLM guardrails be deployed to balance safety and user experience?
Our #EMNLP2025 paper reveals that crafting thoughtful refusals rather than detecting intent is the key to human-centered AI safety.
arxiv.org/abs/2506.00195
🧵 [1/9]
               
            
            
20.10.2025 20:04 · 8 likes · 3 reposts · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
            
                        
                NeurIPS 2025 Workshop Mexico City PersonaNLP
                Welcome to the OpenReview homepage for NeurIPS 2025 Workshop Mexico City PersonaNLP
            
        
    
    
📣📣 Announcing the first PersonaLLM Workshop on LLM Persona Modeling.
If you work on persona-driven LLMs, social cognition, HCI, psychology, cognitive science, cultural modeling, or evaluation, do not miss the chance to submit.
Submit here: openreview.net/group?id=Neu...
               
            
            
17.10.2025 00:57 · 4 likes · 1 repost · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
I'm ✨ super excited and grateful ✨ to announce that I'm part of the 2025 class of #PackardFellows (www.packard.org/2025fellows). The @packardfdn.bsky.social and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI
               
            
            
15.10.2025 13:05 · 10 likes · 1 repost · 1 reply · 2 quotes
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
🚨 New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences?
With EVALUESTEER, we find even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵
               
            
            
14.10.2025 15:59 · 12 likes · 7 reposts · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
            Oh yes we have a paper under submission! I'll ask Mikayla to email you :)
               
            
            
14.10.2025 13:35 · 1 like · 0 reposts · 1 reply · 0 quotes
            
         
            
        
            
        
            
            
            
            
            
    
    
            
                        
Grad App Aid – Queer in AI
                
            
        
    
    
We are launching our Graduate School Application Financial Aid Program (www.queerinai.com/grad-app-aid) for 2025-2026. We'll give up to $750 per person to LGBTQIA+ STEM scholars applying to graduate programs. Apply at openreview.net/group?id=Que.... 1/5
               
            
            
09.10.2025 00:37 · 7 likes · 9 reposts · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
            I'm also giving a talk at #COLM2025 Social Simulation workshop (sites.google.com/view/social-...) on Unlocking Social Intelligence in AI, at 2:30pm Oct 10th!
               
            
            
06.10.2025 14:53 · 6 likes · 0 reposts · 0 replies · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
Day 3 (Thu Oct 9), 11:00am–1:00pm, Poster Session 5
Poster #13: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages by @kpriyanshu256.bsky.social and @devanshrjain.bsky.social
Poster #74: Fluid Language Model Benchmarking – led by @valentinhofmann.bsky.social
               
            
            
06.10.2025 14:51 · 1 like · 0 reposts · 0 replies · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
Day 2 (Wed Oct 8), 4:30–6:30pm, Poster Session 4
Poster #50: The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains – led by Scott Geng
               
            
            
06.10.2025 14:51 · 1 like · 0 reposts · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
            Day 1 (Tue Oct 7) 4:30-6:30pm, Poster Session 2
Poster #77: ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning; led by 
@stellali.bsky.social & @jiminmun.bsky.social
               
            
            
06.10.2025 14:51 · 2 likes · 1 repost · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
            Day 1 (Tue Oct 7) 4:30-6:30pm, Poster Session 2
Poster #42: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions; led by @nlpxuhui.bsky.social
               
            
            
06.10.2025 14:51 · 1 like · 0 reposts · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
Headed to #COLM2025 today! Here are five of our papers that were accepted, and when & where to catch them 👇
               
            
            
06.10.2025 14:51 · 6 likes · 0 reposts · 1 reply · 1 quote
            
         
            
        
            
            
            
            
            
    
    
    
    
📢 New #COLM2025 paper 📢
Standard benchmarks give every LLM the same questions. This is like testing 5th graders and college seniors with *one* exam! 🥴
Meet Fluid Benchmarking, a capability-adaptive eval method delivering lower variance, higher validity, and reduced cost.
🧵
               
            
            
16.09.2025 17:16 · 40 likes · 10 reposts · 3 replies · 1 quote
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            That's a lot of people! Fall Sapling lab outing, welcoming our new postdoc Vasudha, and visitors Tze Hong and Chani! (just missing Jocelyn)
               
            
            
26.08.2025 17:53 · 12 likes · 0 reposts · 0 replies · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
            I'm excited cause I'm teaching/coordinating a new unique class, where we teach new PhD students all the "soft" skills of research, incl. ideation, reviewing, presenting, interviewing, advising, etc. 
Each lecture is taught by a different LTI prof! It takes a village! maartensap.com/11705/Fall20...
               
            
            
25.08.2025 18:01 · 31 likes · 2 reposts · 2 replies · 1 quote
            
         
            
        
            
            
            
            
            
    
    
    
    
I've always seen people on laptops during talks, but it's possible it has increased.
I realized during lockdown that I drift to emails during Zoom talks, so I started knitting to pay better attention to those talks, and now I knit during IRL talks too (though sometimes I still peck at my laptop)
               
            
            
22.08.2025 15:00 · 13 likes · 1 repost · 3 replies · 0 quotes
            
         
            
        
            
            
            
            
Snippet of the Forbes article, with highlighted text.
A recent study by Allen Institute for AI (Ai2), titled "Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences," found that refusal style mattered more than user intent. The researchers tested 3,840 AI query-response pairs across 480 participants, comparing direct refusals, explanations, redirection, partial compliance and full compliance.
Partial compliance, sharing general but not specific information, reduced dissatisfaction by over 50% compared to outright denial, making it the most effective safeguard.
"We found that [start of highlight] direct refusals can cause users to have negative perceptions of the LLM: users consider these direct refusals significantly less helpful, more frustrating and make them significantly less likely to interact with the system in the future," [end of highlight] Maarten Sap, AI safety lead at Ai2 and assistant professor at Carnegie Mellon University, told me. "I do not believe that model welfare is a well-founded direction or area to care about."
                                                
    
    
    
    
            We have been studying these questions of how models should refuse in our recent paper accepted to EMNLP Findings (arxiv.org/abs/2506.00195) led by my wonderful PhD student 
@mingqian-zheng.bsky.social
               
            
            
22.08.2025 13:00 · 8 likes · 0 reposts · 0 replies · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
            I spoke to Forbes about why model "welfare" is a silly framing to an important issue; models don't have feelings, and it's a big distraction from real questions like tensions between safety vs. user utility, which are NLP/HCI/policy questions www.forbes.com/sites/victor...
               
            
            
22.08.2025 13:00 · 14 likes · 3 reposts · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
            thankssss!
               
            
            
20.08.2025 18:35 · 0 likes · 0 reposts · 0 replies · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
            Super super excited about this :D :D
               
            
            
20.08.2025 18:14 · 27 likes · 0 reposts · 7 replies · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
            
                        
                Using Hand Gestures To Evaluate AI Biases - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
                LTI researchers have created a model to help generative AI systems understand the cultural nuance of gestures.
            
        
    
    
            Hand gestures are a major mode of human communication, but they don't always translate well across cultures. New research from @akhilayerukola.bsky.social, @maartensap.bsky.social and others is aimed at giving AI systems a hand with overcoming cultural biases:
lti.cmu.edu/news-and-eve...
               
            
            
27.06.2025 18:04 · 8 likes · 3 reposts · 0 replies · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
            
                        
                Does Your Chatbot Swear to Tell the Truth? - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
                New research finds that LLM-based agents can't always be trusted to be truthful
            
        
    
    
New research from LTI, UMich, & Allen Institute for AI: LLMs don't just hallucinate – sometimes, they lie. When truthfulness clashes with utility (pleasing users, boosting brands), models often mislead. @nlpxuhui.bsky.social and @maartensap.bsky.social discuss the paper:
lti.cmu.edu/news-and-eve...
               
            
            
26.06.2025 19:21 · 3 likes · 2 reposts · 0 replies · 0 quotes
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
What if AI played the role of your sassy gay bestie 🏳️‍🌈 or AAVE-speaking friend?
You: "Can you plan a trip?"
🤖 AI: "Yasss queen! let's werk this babe ✨"
LLMs can talk like us, but it shapes how we trust, rely on & relate to them 🧵
📣 our #FAccT2025 paper: bit.ly/3HJ6rWI
[1/9]
               
            
            
17.06.2025 19:39 · 13 likes · 6 reposts · 1 reply · 2 quotes
            
         
            
        
            
            
            
            
            
    
    
            
                             
                        
                NLP 4 Democracy - COLM 2025
                
            
        
    
    
📣 Super excited to organize the first workshop on ✨NLP for Democracy✨ at COLM @colmweb.org!!
Check out our website: sites.google.com/andrew.cmu.e...
Call for submissions (extended abstracts) due June 19, 11:59pm AoE
#COLM2025 #LLMs #NLP #NLProc #ComputationalSocialScience
               
            
            
21.05.2025 16:39 · 47 likes · 18 reposts · 1 reply · 6 quotes
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Notice our new look? We're thrilled to unveil our new logo โ representing our vision, values, and the future ahead. Stay tuned for more!
               
            
            
12.05.2025 17:09 · 4 likes · 1 repost · 0 replies · 0 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
super excited about this 🥰🥰
               
            
            
29.04.2025 22:57 · 20 likes · 0 reposts · 1 reply · 0 quotes
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
When interacting with ChatGPT, have you wondered if they would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
               
            
            
28.04.2025 20:36 · 25 likes · 9 reposts · 1 reply · 3 quotes
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
1/ 🚨 New paper alert 🚨
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?
We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline 🧵
               
            
            
17.04.2025 19:55 · 9 likes · 5 reposts · 1 reply · 2 quotes
            
         
            
        
            
            
            
            
            
    
    
    
    
RLHF is built on some quite oversimplified assumptions, i.e., that preferences between pairs of text are purely about quality. But this is an inherently subjective task (not unlike toxicity annotation) -- so we wanted to know: do biases similar to toxicity annotation emerge in reward models?
               
            
            
06.03.2025 20:54 · 24 likes · 3 reposts · 1 reply · 0 quotes
            
         
    
         
        
            
        
                            
                    
                    
                                            Bold science, deep and continuous collaborations.
                                     
                            
                    
                    
                                            For the past two years, Humane Intelligence has pioneered methods of system-level evaluation by operationalizing, designing, and implementing test methods to understand and mitigate frontier AI risk.
Learn more: www.humane-intelligence.org
                                     
                            
                    
                    
                                            AI Researcher, Writer
Stanford
jaredmoore.org
                                     
                            
                    
                    
ACM Conference on Computer Supported Cooperative Work & Social Computing. CSCW 2026 will be in Salt Lake City, Utah from October 10–14, 2026: https://cscw.acm.org/2026/.
Follow our community using our Starter Pack: https://go.bsky.app/SPumuMT
                                     
                            
                    
                    
                                            ML @Apple | NLP @ltiatcmu.bsky.social | @kaggle.com Competitions Master | ex- ScaleAI | ex- SamsungResearch
                                     
                            
                    
                    
                                            A non-profit bringing together academic, civil society, industry, & media organizations to address the most important and difficult questions concerning AI.
                                     
                            
                    
                    
                                            Machine learning at @duolingo. Historically interested in multilingual structured prediction and language modeling.
                                     
                            
                    
                    
                                            assistant prof in linguistics at swarthmore; phd in linguistics from u washington; sociosyntax; they/them or ey/em.
                                     
                            
                    
                    
asst prof @Stanford linguistics | director of social interaction lab 🌱 | bluskies about computational cognitive science & language
                                     
                            
                    
                    
                                            PhD student at CMU LTI; Interested in pragmatics and cross-cultural understanding; 
intern @ Allen Institute for AI |Prev: Senior Research Engineer @ Samsung Research America | Masters @ Stanford
https://akhila-yerukola.github.io/
                                     
                            
                    
                    
                                            Google Chief Scientist, Gemini Lead. Opinions stated here are my own, not those of Google.  Gemini, TensorFlow, MapReduce, Bigtable, Spanner, ML things, ...
                                     
                            
                    
                    
                                            Postdoc at Bocconi University in the group of MilaNLP. Researching online misogyny, hate speech, implicit bias and personalized NLP. 
                                     
                            
                    
                    
                                            I work on human-centered {security|privacy|computing}. Associate Professor (w/o tenure) at @hcii.cmu.edu. Director of the SPUD (Security, Privacy, Usability, and Design) Lab. Non-Resident Fellow @cendemtech.bsky.social
                                     
                            
                    
                    
                                            Designer. Author. Neutral Good Gen Xer. San Franciscan in Pittsburgh. Keeper of the Saffervesence.
                                     
                            
                    
                    
                                            Assistant Professor at CMU HCII | https://techsolidaritylab.com/
                                     
                            
                    
                    
                                            HCI Prof at CMU HCII. Research on augmented intelligence, participatory AI, & complementarity in human-human and human-AI workflows.
thecoalalab.com
                                     
                            
                    
                    
                                            faculty @ cmu
co-director @ dig.cmu.edu
                                     
                            
                    
                    
                                            Professor of HCI @ Carnegie Mellon School of Computer Science. Accelerating knowledge and creativity with AI.
https://kittur.org/
                                     
                            
                    
                    
                                            Associate Professor, S3D, SCS, Carnegie Mellon University