@igorshilov.bsky.social
Anthropic AI Safety Fellow. PhD student at Imperial College London. ML, interpretability, privacy, and stuff. https://igorshilov.com/
Arrived in beautiful Vancouver!
More conferences with mountain views please!
Ping me if you want to chat about privacy and security of LLMs!
Check out our website for more info: computationalprivacy.github.io/loss_traces/
arxiv: arxiv.org/abs/2411.05743
See you in Seattle!
And thanks to my amazing co-authors: Joseph Pollock, Euodia Dodd and @yvesalexandre.bsky.social
Why this matters:
✅ Enables iterative privacy risk assessment during model development
✅ Zero additional computational cost
✅ Could inform targeted defenses (selective unlearning, data removal)
✅ Practical for large models where shadow model approaches fail
import torch.nn as nn

# Standard PyTorch training loop
criterion = nn.CrossEntropyLoss(reduction="none")  # Change from default "mean"

# During training
loss = criterion(outputs, targets)
# Here loss has shape [batch_size] - per-sample losses

# Save the per-sample losses
saved_losses.append(loss.detach())

# Take mean for backward pass
loss.mean().backward()
The best part? You can collect per-sample losses for free during training by simply changing the loss reduction:
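To show what you could do with those saved losses, here is a minimal sketch (not the paper's implementation) of stacking the per-epoch, per-sample losses into loss traces and summarizing each trace with an IQR-style score in the spirit of LT-IQR; the function names, data layout, and aggregation details are illustrative assumptions.

import torch

# Assumed layout (illustrative): per_epoch_losses[e] is the list of [batch_size]
# loss tensors saved during epoch e, already ordered by sample index
# (e.g. by also recording batch indices when the dataloader shuffles).
def build_loss_traces(per_epoch_losses):
    # Returns a tensor of shape [num_epochs, num_samples].
    return torch.stack([torch.cat(batches) for batches in per_epoch_losses])

def lt_iqr_score(traces):
    # Illustrative IQR-style aggregation: the interquartile range of each
    # sample's loss across training. A larger spread is treated here as a
    # signal that the sample may be more vulnerable.
    q75 = torch.quantile(traces, 0.75, dim=0)
    q25 = torch.quantile(traces, 0.25, dim=0)
    return q75 - q25  # shape [num_samples]

Vulnerability scores would then be lt_iqr_score(build_loss_traces(...)), with no forward or backward passes beyond the training run itself.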
24.06.2025 15:17
This image shows a Receiver Operating Characteristic (ROC) curve plotting Precision@k=1% against False Positive Rate (FPR) on a logarithmic x-axis from 10^-5 to 10^0. The graph compares five methods:
- LT-IQR (Ours): solid blue line that achieves high precision (around 0.67) at very low FPR and gradually increases to nearly 1.0
- Loss: solid green line that closely follows the LT-IQR performance
- Gradient norm: solid pink/magenta line that also tracks the LT-IQR method very closely
- RMIA (2 shadow models): orange dashed line that starts at around 0.61 precision and increases more gradually
- Random guess: dotted light blue line that increases linearly from 0 to 1.0, representing baseline random performance
All methods except random guessing show strong performance, with the proposed LT-IQR method and the loss/gradient-norm approaches achieving superior precision at low false positive rates, illustrating the trade-off between precision and false positive rate.
Our proposed loss trace aggregation methods achieve 92% Precision@k=1% in identifying samples vulnerable to the LiRA attack on CIFAR-10 (positives at FPR=0.001). Prior computationally efficient vulnerability detection methods (loss, gradient norm) perform barely better than random on the same task.
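For context on the metric, here is a rough sketch of how Precision@k=1% could be computed from any per-sample vulnerability score; the helper name and tensor layout are my own assumptions, and "truly vulnerable" here means samples the reference attack (e.g. LiRA at FPR=0.001) identifies.

import torch

def precision_at_k(scores, is_vulnerable, k_frac=0.01):
    # Rank samples by vulnerability score and report the fraction of the
    # top k_frac (here 1%) that are actually vulnerable according to the
    # reference attack.
    k = max(1, int(k_frac * scores.numel()))
    top_k = torch.topk(scores, k).indices
    return is_vulnerable[top_k].float().mean().item()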
24.06.2025 15:17
This image shows a training loss curve graph with three colored lines (purple, yellow-green, and cyan) plotting loss values against training epochs from 0 to 100. Above the graph are three small images that appear to correspond to the three different training runs: a golden/yellow frog or toad, a white tree frog, and what appears to be a darker colored amphibian. The purple line shows rapid loss decrease early in training and stays near zero throughout. The yellow-green line shows high volatility with peaks reaching around 7.0 and periods of low loss. The cyan line demonstrates the most erratic behavior with frequent spikes up to 7.5 and sustained periods of high loss, particularly in the 60-80 epoch range. All three lines converge to low loss values by epoch 100, suggesting successful model training despite different convergence patterns.
A line chart showing training loss over 100 epochs for three different data conditions. The chart has three colored lines: teal for hard-to-fit outliers (12% vulnerable), olive/yellow-green for easy-to-fit outliers (29% vulnerable), and purple for average data (4.6% vulnerable). All three lines start around 2.5 loss at epoch 0, with the purple average line declining most rapidly and smoothly to near 0 by epoch 100. The olive easy-to-fit outlier line declines more gradually, reaching about 0.1 by epoch 100. The teal hard-to-fit outlier line shows the most volatile behavior with frequent spikes and only reaches about 1.25 loss by epoch 100. A vertical dashed gray line appears around epoch 10, likely marking a significant training milestone.
🐸 Check out these CIFAR-10 frog examples:
Easy-to-fit outliers: Loss drops late but reaches near zero → most vulnerable
Hard-to-fit outliers: Loss drops slowly, stays relatively high → somewhat vulnerable
Average samples: Loss drops quickly and stays low → least vulnerable
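As a rough illustration of those three regimes, a heuristic like the sketch below could bucket a loss trace by how early the loss falls and where it ends up; the thresholds and function name are invented for the example and are not from the paper.

import torch

def categorize_trace(trace, early_frac=0.1, low=0.1, high=1.0):
    # trace: 1-D tensor of one sample's loss over training epochs.
    # Illustrative thresholds only:
    # - ends high                         -> hard-to-fit outlier
    # - starts high but ends near zero    -> easy-to-fit outlier
    # - otherwise (drops fast, stays low) -> average
    early_loss = trace[: max(1, int(early_frac * trace.numel()))].mean()
    final_loss = trace[-1]
    if final_loss > high:
        return "hard-to-fit outlier"
    if early_loss > high and final_loss < low:
        return "easy-to-fit outlier"
    return "average"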
Problem: SoTA MIAs often require training hundreds of shadow models to identify vulnerable samples. This is extremely expensive, especially for large models.
Solution: A sample's loss pattern throughout training tells you a lot about its vulnerability.
⬇️
New paper accepted @ USENIX Security 2025!
We show how to identify training samples most vulnerable to membership inference attacks - FOR FREE, using artifacts naturally available during training! No shadow models needed.
Learn more: computationalprivacy.github.io/loss_traces/
Thread below 🧵
Check out our new pre-print "Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models", joint work with fantastic colleagues from Google (DeepMind) and many other great institutions! Find it here: arxiv.org/abs/2505.18773
27.05.2025 08:00
Mom I'm on TV
09.04.2025 08:48
We're very excited to host this meetup and we'd be thrilled to see you there!
imperial.ac.uk/events/18318...
We're also inviting PhD students to give 1-minute lightning talks to share their research with the community.
If you're interested, please sign up here:
docs.google.com/forms/d/e/1F...
The line-up for the evening:
- Graham Cormode (University of Warwick/Meta AI)
- Lukas Wutschitz (M365 Research, Microsoft)
- Jamie Hayes (Google DeepMind)
- Ilia Shumailov (Google DeepMind)
Recognizing the growing ML Privacy community in and around London, we hope this will be a great opportunity for people to connect and share perspectives.
We will be hosting research talks from our amazing invited speakers, followed by a happy hour.
📢 Privacy in ML Meetup @ Imperial is back!
📅 February 4th, 6pm, Imperial College London
We are happy to announce the new date for the first Privacy in ML Meetup @ Imperial, bringing together researchers from across academia and industry.
RSVP: www.imperial.ac.uk/events/18318...
So easy to be cynical and glib. No-one is under any illusions about the way events often play out in the Middle East, or the character of the groups who defeated Assad. But it is ok to hope. It is ok to celebrate the end of a dictator.
08.12.2024 11:49
Sophon was such overkill for halting Earth's research. All the Trisolarans needed to do was destroy Overleaf - with the exact same outcome.
03.12.2024 15:30
Wow, so we actually got to the point where Anthropic sponsors exhibitions at Tate Modern
20.11.2024 11:06
Surprising fact of the day: Gordon Brown is a Scot
18.01.2024 19:34
Low-stakes conspiracy of the day: the protestor throwing glitter at Starmer was a personal favour from Starmer himself. Because as a serious politician you don't get to wear glitter in public anymore, and sometimes nothing hits quite like it
13.10.2023 09:28
I think we can take the word "tweet" back, because on X they're "posts" now
22.09.2023 06:34
Why does a prediction market for flight delays exist, and why does it have money for ads on Twitter?
app.wingman.wtf/hot-flights
What, how, what for?
Why did you even post there?
This thread made me want to go prepare for the winter exam session.
20.09.2023 18:20