Building a more robust model definitely helps. But it cannot be the only line of defense. You have to sandbox the model, just like we sandbox OS processes to contain the damage of a memory corruption vuln.
06.05.2025 04:56
Prompt injection attacks are the AI version of stack smashing from the 90s. Yet most efforts try to defend against them by hoping to build more robust models (aka computer programs). Do you see the issue here?
06.05.2025 04:56
SAGAI'25 @ IEEE S&P
Goal
The workshop will investigate the safety, security, and privacy of GenAI agents from a system design perspective. We believe that this new category of important and critical system components req...
SAGAI'25 will investigate the safety, security, and privacy of GenAI agents from a system design perspective. We are experimenting with a new "Dagstuhl" like seminar with invited speakers and discussion. Really excited about this workshop at IEEE Security and Privacy Symposium.
31.03.2025 19:32
CSE 291: LLM Security
I'm teaching a grad course on LLM Security at UCSD. In addition to academic papers, I've included material from the broader community.
I'm looking for 1 good article on LLM agent security. Send me recs!
cseweb.ucsd.edu/~efernandes/...
02.01.2025 16:11
FEEL THE AGI!
30.12.2024 15:33
Is there a GenAI service (or services) that will allow me to upload an image and then specify some text that modifies the image, and get back a new image with those modifications? E.g., say I upload a picture of Spider-Man in a seated position with the text "convert this spiderman into standing position".
25.12.2024 15:14
Most work has focused on privesc for some "forbidden knowledge" and IMO this has muddied JB a LOT. If you ignore the "make me a bomb" type issues, you will realize there's a lot more that can be done with JB attacks.
14.12.2024 17:41
I think that I've finally come to a reasonable definition of GenAI jailbreaking. A jailbreak is a privilege escalation. It allows the attacker to force the model to undertake arbitrary instructions, regardless of whatever safeguards might be in place.
14.12.2024 17:41
I will go one step further. To become a bike lane/traffic planner, you have to ride the bike lane yourself.
09.12.2024 23:56
She Was a Russian Socialite and Influencer. Cops Say She's a Crypto Laundering Kingpin
Western authorities say they've identified a network that found a new way to clean drug gangs' dirty cash. WIRED gained exclusive access to the investigation.
NEW: For the last few months, officials at Britain's NCA have explained to me how they discovered and disrupted two massive Russian money laundering rings.
The networks have moved billions each year and, unusually, have been caught swapping cash for crypto with drugs gangs
🧵 A wild thread...
04.12.2024 15:47
it's got that 70s look
03.12.2024 16:58
Banned Books: Analysis of Censorship on Amazon.com - The Citizen Lab
We analyze the system Amazon deploys on the US "amazon.com" storefront to restrict shipments of certain products to specific regions. We found 17,050 products that Amazon restricted from being shipped...
📢 Our latest report reveals that the US storefront of Amazon uses a system to restrict shipments of certain products. We found 17k+ products that were restricted from being shipped to specific regions, with the most common type of product being books.
citizenlab.ca/2024/11/anal...
25.11.2024 20:37
My Christmas break plan is to learn Rust. Any pointers to resources that you found particularly useful?
21.11.2024 21:29
Meta Finally Breaks Its Silence on Pig Butchering
The company gave details for the first time on its approach to combating organized criminal networks behind the devastating scams.
STORY with @lhn.bsky.social: Meta is speaking out about pig butchering scams for the first time; it says it has removed 2 million pig butchering accounts this year.
In one instance, OpenAI alerted Meta to criminals using ChatGPT to generate comments used in scams
21.11.2024 18:21
@mattburgess1.bsky.social has covered AI security stuff.
20.11.2024 16:49
I will be adopting this terminology as well.
18.11.2024 19:47
Charlie Murphy
My postdoc Charlie Murphy is on the academic job market this fall. He's doing really hard technical work on building constraint solvers and synthesis engines. You should interview him
pages.cs.wisc.edu/~tcmurphy4/
16.11.2024 07:50
New idea for Anthropic's computer use agent. Task it with going thru my Twitter, finding those folks here, and following them.
16.11.2024 02:49
first thing I did after joining this new twitter was follow a bunch of PL folks. And some security folks.
16.11.2024 02:46
Associate prof at Uppsala University in Programming languages
Swedish econ through an MMT lens - MMT for Sweden
Born at 335 ppm, living in Stockholm 🇸🇪 previously 🇬🇧 🇩🇪
Private account, he/him, 🇺🇦
https://jpolitz.github.io
prof @ucdavis; privacy + measurement researcher; outdoor enthusiast; anda-shami lover
🧑‍🔬 @cispa.de
❤️ Usable Security and Privacy
Passwords and User Authentication
🕵️ Transparency and Privacy Controls
Director of Cybersecurity @eff.org
Co-founder of @stopstalkerware.bsky.social
These opinions are my own, not my employers'
I did a TED talk once
Systems+Security faculty @GWTweets CS. Resiliency, OS, Networks, CPS, Real-Time. Photography. #AnnotatedEquations. He/him. Opinions/RTs personal.
Mom, foodie, traveller, computer scientist
Professor at Northeastern University / www.aanjhan.com
Assistant Professor at Purdue ECE. I do research on formal methods and program verification. PL/SE/FM
Professor at the University of Washington, Paul G. Allen School of Computer Science & Engineering @uwcse.bsky.social
Working on cryptography, theoretical computer science, and computer security.
https://homes.cs.washington.edu/~tessaro/
ER doc. Hacker. Security Researcher @UCSD.
Faculty@CISPA; Empirical Security
Tenured Faculty at the CISPA Helmholtz Center for Information Security | https://svenbugiel.de/
Lecturer at the University of Sydney, Australia. ΰ΄Άΰ΅ΰ΄°ΰ΅ΰ΄¦ΰ΅ΰ΄΅ΰ΄Ώ's Dad. I work in the junction between SE and CySec. Interested in Program Analysis, Mutation Analysis, Repair, Grammar Inference, Generation and Parsing
https://rahul.gopinath.org
Associate Professor in EECS at @MIT | Principal Scientist at @Databricks | Founding Advisor at @mosaicml | Programming Systems | Neural Networks | Approximate Computing
Senior Director of Research. Black Hat Review Board Member (AI, ML, and DS track lead) and International public speaker. I focus on emerging technologies and risks at the intersection of humanity and tech. Hype Critic. My writing: https://perilous.tech