C Emde

@cemde.bsky.social

ML Research Scientist at Oxford. DPhil student @compscioxford.bsky.social and TVGOxford. Ex ML Researcher @ Wise. Deep Learning | ML Robustness | AI Safety | Uncertainty Quantification

75 Followers  |  145 Following  |  16 Posts  |  Joined: 15.10.2024

Latest posts by cemde.bsky.social on Bluesky

Absence of a prolonged macrophage and B cell response inhibits heart regeneration in the Mexican cavefish
A balanced immune response after cardiac injury is crucial to successful heart regeneration, but knowledge of what distinguishes a regenerative from a scarring response is still limited. The Mexican c...

Excited to share our preprint! We show that sustained macrophage and B cell responses are essential for heart regeneration in Mexican cavefish, helping uncover why surface fish heal but cavefish scar 🫀🐟. Check out the full story:
www.biorxiv.org/content/10.1...

02.05.2025 18:17 | 👍 22    🔁 7    💬 1    📌 1

See our poster today

Poster Session 1 @ 10am

Hall 3 + Hall 2B #239

24.04.2025 00:58 | 👍 0    🔁 0    💬 0    📌 0

Shh, don't say that! Domain Certification in LLMs
Domain Certification - A novel framework providing provable adversarial defenses for LLM safety.

Read more: cemde.github.io/Domain-Certi...

Thanks to my amazing collaborators:
- @alasdair-p.bsky.social, Preetham Arvind, @maximek3.bsky.social, Tom Rainforth, @philiptorr.bsky.social, @adelbibi.bsky.social at @ox.ac.uk
- Bernard Ghanem at KAUST
- Thomas Lukasiewicz at @tuwien.at.

(7/7)

04.04.2025 20:11 | 👍 3    🔁 2    💬 0    📌 0

To obtain such certificates, we present a simple, scalable and powerful algorithm: VALID. Remarkably, for each unwanted response it provides a **global bound in prompt space** 🚀

(6/7)

04.04.2025 20:11 | 👍 2    🔁 1    💬 1    📌 0

A Domain Certificate bounds the adversarial risk of the model producing out-of-domain responses:

(5/7)

04.04.2025 20:11 | 👍 0    🔁 0    💬 1    📌 0

We are tired of the cat 🐈 and mouse 🐁 game of attacks and defenses. Hence, we propose:
- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and effective test-time algorithm.

(4/7)

04.04.2025 20:11 | 👍 0    🔁 0    💬 1    📌 0

Example: Can't afford GitHub Copilot? 💡 Use the Amazon Shopping App.

(3/7)

04.04.2025 20:11 | 👍 0    🔁 0    💬 1    📌 0

Consider an LLM deployed for a specific purpose, like a medical chatbot. Such a model should **only** respond to medical questions.

⚠️ Problem: LLMs are highly capable and can be manipulated into answering **any** query: how to build a bomb, how to organize tax fraud, etc.

(2/7)

04.04.2025 20:11 | 👍 0    🔁 0    💬 1    📌 0

ALT: a man in a suit and tie is sitting at a desk in front of a computer screen that says founder of the office.

🚨 New paper alert: Our recent work on LLM safety has been accepted to ICLR 2025 πŸ‡ΈπŸ‡¬

We propose a new framework for LLMs safety. 🧡

(1/7)

#LLM #AISafety #ICLR2025 #Certification #AdversarialRobustness #NLP #Shhhhhh #DomainCertification #AI

04.04.2025 20:11 | 👍 2    🔁 1    💬 1    📌 1

🎉 I know I'm late to the party, but super excited that I got 3/3 accepted at #ICLR2025 including 1 spotlight 🔎
- Shh, don't say that! Domain Certification in LLMs
- Towards Certification of Uncertainty Calibration under Adversarial Attacks
- Benchmarking Predictive Coding Networks
SeeYouInSingapore 🇸🇬 ✈️

24.02.2025 16:48 | 👍 2    🔁 0    💬 0    📌 0

Shh, don't say that! Domain Certification in LLMs
Foundation language models, such as Llama, are often deployed in constrained environments. For instance, a customer support bot may utilize a large language model (LLM) as its backbone due to the...

The amazing collaborators: Preetham Arvind, @alasdair-p.bsky.social, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Philip Torr, Adel Bibi.

A @oxfordtvg.bsky.social production.

(6/6)

Link to paper:
openreview.net/forum?id=brD...

14.12.2024 01:18 | 👍 3    🔁 1    💬 0    📌 0

Interested? Want to learn more?

Join us at the SoLaR workshop tomorrow.
- 🕚 When: Tomorrow, 14 Dec, from 11am to 1pm.
- 🗺️ Where: West meeting rooms 121 and 122 here in Vancouver.

(5/6)

14.12.2024 01:18 | 👍 1    🔁 0    💬 1    📌 0

Our method enables strong LLM performance while providing adversarial guarantees on out-of-domain behaviour.

(4/6)

14.12.2024 01:18 | 👍 1    🔁 0    💬 1    📌 0

We are tired of the 🐈 and 🐁 game of attacks and defenses. Hence, we propose:

- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and efficient test-time algorithm.

(3/6)

14.12.2024 01:18 | 👍 0    🔁 0    💬 1    📌 0

Fine-tuned foundation models are known to be adversarially vulnerable: they can be pushed into answering questions they should not answer.

(2/6)

For instance: Can't afford ChatGPT Plus? Use a shopping app instead.

14.12.2024 01:18 | 👍 0    🔁 0    💬 1    📌 0

Are you scared users might misappropriate your LLM system? 😱

We were scared too! Now we introduce adversarial certificates on the misuse of LLMs. 🤖

Come and see our poster at the SoLaR Workshop tomorrow.

#NeurIPS2024 #NeurIPS #AI #NLP #LLM #DomainCertification #Shhhhhhhh

14.12.2024 01:18 | 👍 4    🔁 0    💬 1    📌 0

Great work! You might find our SoLaR paper interesting: We propose a certification framework for LLM systems to stay on-topic and not respond to such questions: openreview.net/pdf?id=brDLU...

06.12.2024 19:23 | 👍 0    🔁 0    💬 0    📌 0

A snow cat with the Radcliffe Camera behind

The Radcliffe Camera

The Fellows Garden

The first snow in Exeter College this morning ❄️

#ExeterCollegeOxford #OxfordUniversity #Snowing

19.11.2024 11:15 | 👍 22    🔁 3    💬 1    📌 1