Igor Shilov (โžก๏ธ ICML ๐Ÿ‡จ๐Ÿ‡ฆ)'s Avatar

Igor Shilov (โžก๏ธ ICML ๐Ÿ‡จ๐Ÿ‡ฆ)

@igorshilov.bsky.social

Anthropic AI Safety Fellow. PhD student at Imperial College London. ML, interpretability, privacy, and stuff 🏳️‍🌈 https://igorshilov.com/

59 Followers  |  143 Following  |  22 Posts  |  Joined: 21.07.2023

Latest posts by igorshilov.bsky.social on Bluesky


Arrived in beautiful Vancouver!
More conferences with mountain views please!

Ping me if you want to chat about privacy and security of LLMs!

16.07.2025 13:43 | 👍 1    🔁 0    💬 0    📌 0
Loss Traces: Free Privacy Risk Evaluation
Estimate the vulnerability of training samples to membership inference attacks by analyzing their loss traces during model training - no shadow models required!

Check out our website for more info: computationalprivacy.github.io/loss_traces/

arXiv: arxiv.org/abs/2411.05743

See you in Seattle!

And thanks to my amazing co-authors: Joseph Pollock, Euodia Dodd and @yvesalexandre.bsky.social

24.06.2025 15:17 | 👍 0    🔁 0    💬 0    📌 0

Why this matters:

✅ Enables iterative privacy risk assessment during model development
✅ Zero additional computational cost
✅ Could inform targeted defenses (selective unlearning, data removal)
✅ Practical for large models where shadow model approaches fail

24.06.2025 15:17 | 👍 0    🔁 0    💬 1    📌 0
import torch.nn as nn

# Standard PyTorch training loop
criterion = nn.CrossEntropyLoss(reduction="none")  # change from the default "mean"
saved_losses = []  # accumulates one tensor of per-sample losses per batch

# During training
loss = criterion(outputs, targets)
# Here loss has shape [batch_size]: one loss per sample

# Save the per-sample losses (detach so they don't keep the autograd graph alive)
saved_losses.append(loss.detach())

# Take the mean for the backward pass
loss.mean().backward()


The best part? You can collect per-sample losses for free during training by simply changing the loss reduction, as in the snippet above.
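To keep each loss aligned with a fixed sample index across shuffled epochs (so the per-epoch values form one trace per sample), you can have the dataset yield indices alongside the data. A minimal sketch, assuming a classification setup; IndexedDataset, train_with_loss_traces, and the hyperparameters are illustrative choices, not the paper's code:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

class IndexedDataset(Dataset):
    """Wrap a dataset so each item also carries its index."""
    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        x, y = self.base[i]
        return x, y, i

def train_with_loss_traces(model, base_dataset, epochs, device="cpu"):
    loader = DataLoader(IndexedDataset(base_dataset), batch_size=128, shuffle=True)
    criterion = nn.CrossEntropyLoss(reduction="none")
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    model.to(device)
    # loss_traces[e, i] = loss of sample i at epoch e
    loss_traces = torch.zeros(epochs, len(base_dataset))
    for epoch in range(epochs):
        for x, y, idx in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)  # shape [batch_size]
            loss_traces[epoch, idx] = loss.detach().cpu()
            loss.mean().backward()
            optimizer.step()
    return loss_traces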

24.06.2025 15:17 | 👍 0    🔁 0    💬 1    📌 0
This image shows a Receiver Operating Characteristic (ROC) curve plotting Precision@k=1% against False Positive Rate (FPR) on a logarithmic x-axis scale from 10^-5 to 10^0. The graph compares five different methods:

LT-IQR (Ours) - shown as a solid blue line that achieves high precision (around 0.67) at very low FPR and gradually increases to nearly 1.0
Loss - shown as a solid green line that closely follows the LT-IQR performance
Gradient norm - shown as a solid pink/magenta line that also tracks very similarly to the LT-IQR method
RMIA (2 shadow models) - shown as an orange dashed line that starts at around 0.61 precision and increases more gradually
Random guess - shown as a dotted light blue line that increases linearly from 0 to 1.0, representing baseline random performance

All methods except random guessing show strong performance, with the proposed LT-IQR method and loss/gradient norm approaches achieving superior precision at low false positive rates. The curves demonstrate the trade-off between precision and false positive rate for what appears to be a machine learning model evaluation or anomaly detection task.


Our proposed loss trace aggregation methods achieve 92% Precision@k=1% in identifying samples vulnerable to the LiRA attack on CIFAR-10 (positives at FPR=0.001). Prior computationally efficient vulnerability detection methods (loss, gradient norm) perform barely better than random on the same task.
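For intuition, here is what an aggregation over the loss_traces matrix from the sketch above might look like. This assumes, going by the method's name alone, that LT-IQR scores each sample by the interquartile range of its loss trace; see the paper for the exact aggregation:

import torch

def lt_iqr_scores(loss_traces: torch.Tensor) -> torch.Tensor:
    # loss_traces: shape [epochs, n_samples], one trace per sample (column).
    # Assumption: score = interquartile range of the trace, so samples whose
    # loss swings widely during training get flagged; not the paper's exact rule.
    q75 = torch.quantile(loss_traces, 0.75, dim=0)
    q25 = torch.quantile(loss_traces, 0.25, dim=0)
    return q75 - q25

# Flag the top k = 1% highest-scoring samples for closer inspection
scores = lt_iqr_scores(loss_traces)
k = max(1, int(0.01 * scores.numel()))
flagged = torch.topk(scores, k).indices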

24.06.2025 15:17 | 👍 0    🔁 0    💬 1    📌 0
This image shows a training loss curve graph with three colored lines (purple, yellow-green, and cyan) plotting loss values against training epochs from 0 to 100. Above the graph are three small images that appear to correspond to the three different training runs: a golden/yellow frog or toad, a white tree frog, and what appears to be a darker colored amphibian. The purple line shows rapid loss decrease early in training and stays near zero throughout. The yellow-green line shows high volatility with peaks reaching around 7.0 and periods of low loss. The cyan line demonstrates the most erratic behavior with frequent spikes up to 7.5 and sustained periods of high loss, particularly in the 60-80 epoch range. All three lines converge to low loss values by epoch 100, suggesting successful model training despite different convergence patterns.


A line chart showing training loss over 100 epochs for three different data conditions. The chart has three colored lines: teal for hard-to-fit outliers (12% vulnerable), olive/yellow-green for easy-to-fit outliers (29% vulnerable), and purple for average data (4.6% vulnerable). All three lines start around 2.5 loss at epoch 0, with the purple average line declining most rapidly and smoothly to near 0 by epoch 100. The olive easy-to-fit outlier line declines more gradually, reaching about 0.1 by epoch 100. The teal hard-to-fit outlier line shows the most volatile behavior with frequent spikes and only reaches about 1.25 loss by epoch 100. A vertical dashed gray line appears around epoch 10, likely marking a significant training milestone.


๐Ÿธ Check out these CIFAR-10 frog examples:

Easy-to-fit outliers: Loss drops late but reaches near zero โ†’ most vulnerable

Hard-to-fit outliers: Loss drops slowly, stays relatively high โ†’ somewhat vulnerable

Average samples: Loss drops quickly and stays low โ†’ least vulnerable

24.06.2025 15:17 | 👍 0    🔁 0    💬 1    📌 0

Problem: SoTA MIAs often require training hundreds of shadow models to identify vulnerable samples. This is extremely expensive, especially for large models.

Solution: the loss pattern throughout training tells you a lot about an individual sample's vulnerability.

⬇️

24.06.2025 15:17 | 👍 0    🔁 0    💬 1    📌 0

New paper accepted @ USENIX Security 2025!

We show how to identify training samples most vulnerable to membership inference attacks - FOR FREE, using artifacts naturally available during training! No shadow models needed.

Learn more: computationalprivacy.github.io/loss_traces/

Thread below 🧵

24.06.2025 15:17 | 👍 1    🔁 1    💬 1    📌 1
Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a resu...

Check out our new pre-print "Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models", joint work with fantastic colleagues from Google (DeepMind) and many other great institutions! Find it here: arxiv.org/abs/2505.18773

27.05.2025 08:00 | 👍 4    🔁 2    💬 0    📌 0

Mom, I'm on TV

09.04.2025 08:48 | 👍 7    🔁 1    💬 1    📌 0
Privacy in Machine Learning Meetup @ Imperial
The Computational Privacy Group at Imperial College London is organizing the first Machine Learning Privacy meetup, recognizing the growing community of researchers in and around London working at the...

We're very excited to host this meetup and we'd be thrilled to see you there!

imperial.ac.uk/events/18318...

17.12.2024 10:26 | 👍 0    🔁 0    💬 0    📌 0

We're also inviting PhD students to give 1-minute lightning talks to share their research with the community.

If you're interested, please sign up here:

https://docs.google.com/forms/d/e/1FAIpQLScg20yOOKp9Ilug5lxumCb4s0MvoiEyibCcfRZ6qa6mLNsHeg/viewform

17.12.2024 10:26 | 👍 0    🔁 0    💬 1    📌 0

The line-up for the evening:

- Graham Cormode (University of Warwick/Meta AI)
- Lukas Wutschitz (M365 Research, Microsoft)
- Jamie Hayes (Google DeepMind)
- Ilia Shumailov (Google DeepMind)

17.12.2024 10:26 | 👍 0    🔁 0    💬 1    📌 0

Recognizing the growing ML Privacy community in and around London, we hope this will be a great opportunity for people to connect and share perspectives.

We will be hosting research talks from our amazing invited speakers, followed by a happy hour.

17.12.2024 10:26 | 👍 0    🔁 0    💬 1    📌 0

📢 Privacy in ML Meetup @ Imperial is back!

📅 February 4th, 6pm, Imperial College London

We are happy to announce the new date for the first Privacy in ML Meetup @ Imperial, bringing together researchers from across academia and industry.

RSVP: www.imperial.ac.uk/events/18318...

17.12.2024 10:26 | 👍 1    🔁 1    💬 1    📌 1

So easy to be cynical and glib. No-one is under any illusions about the way events often play out in the Middle East, or the character of the groups who defeated Assad. But it is ok to hope. It is ok to celebrate the end of a dictator.

08.12.2024 11:49 | 👍 1354    🔁 183    💬 41    📌 4

Sophon was such overkill for halting Earth's research. All the Trisolarans needed to do was destroy Overleaf - with the exact same outcome.

03.12.2024 15:30 | 👍 2    🔁 0    💬 0    📌 0

Wow, so we've actually got to the point where Anthropic sponsors exhibitions at Tate Modern

20.11.2024 11:06 | 👍 1    🔁 0    💬 0    📌 0

Surprising fact of the day: Gordon Brown is a Scot

18.01.2024 19:34 | 👍 0    🔁 0    💬 0    📌 0

Low-stakes conspiracy of the day: the protestor throwing glitter at Starmer was a personal favour for Starmer himself. Because as a serious politician you don't get to wear glitter in public anymore, and sometimes nothing hits quite like it

13.10.2023 09:28 | 👍 0    🔁 0    💬 0    📌 0

I think we can take the word "tweets" back, because on X they're now called "posts"

22.09.2023 06:34 | 👍 0    🔁 0    💬 1    📌 0

Why does a prediction market for flight delays exist, and why does it have money for ads on Twitter?

app.wingman.wtf/hot-flights

20.09.2023 18:46 | 👍 1    🔁 0    💬 0    📌 0

What, how, what for?

Why did you even write there?

20.09.2023 18:43 | 👍 0    🔁 0    💬 1    📌 0

This thread made me want to start preparing for the winter exam session

20.09.2023 18:20 | 👍 1    🔁 0    💬 0    📌 0
