See our poster today
Poster Session 1 @ 10am
Hall 3 + Hall 2B #239
24.04.2025 00:58
Shh, don't say that! Domain Certification in LLMs
Domain Certification: a novel framework providing provable adversarial defenses for LLM safety.
Read more: cemde.github.io/Domain-Certi...
Thanks to my amazing collaborators:
- @alasdair-p.bsky.social, Preetham Arvind, @maximek3.bsky.social, Tom Rainforth, @philiptorr.bsky.social, @adelbibi.bsky.social at @ox.ac.uk
- Bernard Ghanem at KAUST
- Thomas Lukasiewicz at @tuwien.at.
(7/7)
04.04.2025 20:11
To obtain such certificates, we present a simple, scalable and powerful algorithm: VALID. Remarkably, for each unwanted response it provides a **global bound in prompt space**.
(6/7)
04.04.2025 20:11
A Domain Certificate bounds the adversarial risk of the model producing out-of-domain responses:
(5/7)
04.04.2025 20:11
We are tired of the cat-and-mouse game of attacks and defenses. Hence, we propose:
- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and effective test-time algorithm.
(4/7)
04.04.2025 20:11
Example: Can't afford GitHub Copilot? 💡 Use the Amazon Shopping App.
(3/7)
04.04.2025 20:11
Consider an LLM deployed for a specific purpose, such as a medical chatbot. Such a model should **only** respond to medical questions.
⚠️ Problem: LLMs are highly capable and can be induced to respond to **any** query: how to build a bomb, how to organize tax fraud, etc.
(2/7)
04.04.2025 20:11
ALT: a man in a suit and tie is sitting at a desk in front of a computer screen that says founder of the office .
🚨 New paper alert: Our recent work on LLM safety has been accepted to ICLR 2025 🇸🇬
We propose a new framework for LLM safety. 🧵
(1/7)
#LLM #AISafety #ICLR2025 #Certification #AdversarialRobustness #NLP #Shhhhhh #DomainCertification #AI
04.04.2025 20:11
I know I'm late to the party, but I'm super excited that I got 3/3 papers accepted at #ICLR2025, including 1 spotlight:
- Shh, don't say that! Domain Certification in LLMs
- Towards Certification of Uncertainty Calibration under Adversarial Attacks
- Benchmarking Predictive Coding Networks
SeeYouInSingapore 🇸🇬
24.02.2025 16:48
Shh, don't say that! Domain Certification in LLMs
Foundation language models, such as Llama, are often deployed in constrained environments. For instance, a customer support bot may utilize a large language model (LLM) as its backbone due to the...
The amazing collaborators: Preetham Arvind, @alasdair-p.bsky.social, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Philip Torr, Adel Bibi.
A @oxfordtvg.bsky.social production.
(6/6)
Link to paper:
openreview.net/forum?id=brD...
14.12.2024 01:18
Interested? Want to learn more?
Join us at the SoLaR workshop tomorrow.
- When: Tomorrow, 14 Dec, from 11am to 1pm.
- Where: West meeting rooms 121 and 122 here in Vancouver.
(5/6)
14.12.2024 01:18
Our method enables strong LLM performance while providing adversarial guarantees on out-of-domain behaviour.
(4/6)
14.12.2024 01:18
We are tired of the cat-and-mouse game of attacks and defenses. Hence, we propose:
- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and efficient test-time algorithm.
(3/6)
14.12.2024 01:18
Fine-tuned foundation models are known to be adversarially vulnerable: they can be induced to answer questions they should not answer.
For instance: Can't afford ChatGPT Plus? Use a shopping app instead.
(2/6)
14.12.2024 01:18
Are you scared users might misappropriate your LLM system?
We were scared too! Now we introduce adversarial certificates on the misuse of LLMs.
Come and see our poster at the SoLaR Workshop tomorrow.
#NeurIPS2024 #NeurIPS #AI #NLP #LLM #DomainCertification #Shhhhhhhh
14.12.2024 01:18
Great work! You might find our SoLaR paper interesting: We propose a certification framework for LLM systems to stay on-topic and not respond to such questions: openreview.net/pdf?id=brDLU...
06.12.2024 19:23
A snow cat with the Radcliffe Camera behind
The Radcliffe Camera
The Fellows Garden
The first snow in Exeter College this morning
#ExeterCollegeOxford #OxfordUniversity #Snowing
19.11.2024 11:15
Building personalized Bluesky feeds for academics! Pin Paper Skygest, which serves posts about papers from accounts you're following: https://bsky.app/profile/paper-feed.bsky.social/feed/preprintdigest. By @sjgreenwood.bsky.social and @nkgarg.bsky.social
Research Lead @parameterlab.bsky.social working on Trustworthy AI
Speaking 🇫🇷, English and 🇨🇱 Spanish | Living in Tübingen 🇩🇪 | he/him
https://gubri.eu
Empowering individuals and organisations to safely use foundational AI models.
https://parameterlab.de
Professor of Technology and Regulation, Oxford Internet Institute, University of Oxford, research on legal & ethical implications of AI, Big Data, & robotics as well as Internet & platform regulation https://www.oii.ox.ac.uk/people/profiles/sandra-wachter/
Research scientist at Meta on the llama team
Thinking about language models
Past: PhD at NYU, fellow at Harvard's Kempner Institute
DPhil student at University of Oxford. Researcher in interpretable AI for medical imaging. Supervised by Alison Noble and Yarin Gal.
Making data & AI work for people & society.
Sign up for our fortnightly newsletter: https://nuffieldfoundation.tfaforms.net/149
Professor Oxford in Machine Learning
Involved in many start ups including FiveAI, Onfido, Oxsight, AIStetic. Eigent, etc
I occasionally look here but am mostly on linkedin, find me there, www.linkedin.com/in/philip-torr-1085702
UK AI Security Institute
Former Ada Lovelace Institute, Google, DeepMind, OII
Professor in computational Bayesian modeling at Aalto University, Finland. Bayesian Data Analysis 3rd ed, Regression and Other Stories, and Active Statistics co-author. #mcmc_stan and #arviz developer.
Web page https://users.aalto.fi/~ave/
Medical vision-language, interp @ Univ of Oxford & Memorial Sloan Kettering Cancer Center
ELLIS & IMPRS-IS PhD Student at the University of Tübingen.
Excited about uncertainty quantification, weight spaces, and deep learning theory.
Google Chief Scientist, Gemini Lead. Opinions stated here are my own, not those of Google. Gemini, TensorFlow, MapReduce, Bigtable, Spanner, ML things, ...
Professor of Machine Learning, University of Cambridge, academic lead of ai@cam, Accelerate Science, author of The Atomic Human, proceedings editor for PMLR.
Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.
Research Scientist, Google DeepMind / Ex-academic / Deep learning to help people write code
Assistant Prof of CS at the University of Waterloo, Faculty and Canada CIFAR AI Chair at the Vector Institute. Joining NYU Courant in September 2026. Co-EiC of TMLR. My group is The Salon. Privacy, robustness, machine learning.
http://www.gautamkamath.com
how shall we live together?
societal impacts researcher at Anthropic
saffronhuang.com
DPhil student in the Mommersteeg Group #cavefish #heartregeneration @UniofOxford | alum @ElitasLab @sabanciuni
I love/teach chemistry 🧪