Thanks for working on that together Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic!
@cvprconference.bsky.social
#AI #GenerativeAI #privacy
@jandubinski.bsky.social
PhD student in Machine Learning @Warsaw University of Technology and @IDEAS NCBR
CDI confidently identifies training data with as few as 70 suspect samples!
Please check out the paper for more:
📜https://arxiv.org/abs/2411.12858
Instead, we propose CDI, a method that empowers data owners to check whether their data was used to train a DM. CDI selectively combines diverse membership signals from multiple samples and applies statistical testing.
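To make "combining membership signals and testing them statistically" more concrete, here is a minimal Python sketch. It is not CDI's implementation: the helpers `combine_signals` and `permutation_p_value`, the z-score averaging, and the one-sided permutation test are illustrative assumptions. It assumes each sample already has several membership signals (higher = more member-like), and that the data owner holds a reference set and a held-out set of samples known not to be in the training data.

```python
import random
from statistics import mean, pstdev


def combine_signals(samples, reference):
    """Hypothetical aggregation (not CDI's): z-score each signal against a
    reference set of known non-members, then average into one score per sample."""
    n_signals = len(reference[0])
    mus = [mean(r[j] for r in reference) for j in range(n_signals)]
    # Fall back to 1.0 if a signal is constant on the reference set.
    sigmas = [pstdev([r[j] for r in reference]) or 1.0 for j in range(n_signals)]
    return [mean((s[j] - mus[j]) / sigmas[j] for j in range(n_signals)) for s in samples]


def permutation_p_value(suspect_scores, heldout_scores, n_permutations=10_000, seed=0):
    """One-sided permutation test of H0: suspect and held-out scores are
    exchangeable, against H1: suspect scores are higher on average."""
    rng = random.Random(seed)
    observed = mean(suspect_scores) - mean(heldout_scores)
    pooled = list(suspect_scores) + list(heldout_scores)
    n_suspect = len(suspect_scores)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:n_suspect]) - mean(pooled[n_suspect:]) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Toy data: two signals per sample (made-up numbers, for illustration only).
reference = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
heldout = [[0.1, 0.9], [0.9, 0.1], [0.4, 0.6], [0.6, 0.4], [0.3, 0.7]]
suspect = [[2.0, 2.0], [2.5, 1.8], [1.9, 2.2], [2.1, 2.4], [2.3, 2.0]]

p = permutation_p_value(combine_signals(suspect, reference),
                        combine_signals(heldout, reference))
print(f"p-value: {p:.4f}")
```

A small p-value would suggest the suspect samples look more member-like than the held-out ones; see the paper for how CDI actually selects and aggregates its signals.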
13.06.2025 19:12
Unfortunately, state-of-the-art Membership Inference Attacks struggle to identify training data in large DMs, often performing close to random guessing (True Positive Rate = 1% at False Positive Rate = 1%), e.g., on DMs trained on ImageNet.
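The TPR-at-fixed-FPR metric quoted above can be sketched in a few lines of Python. This assumes a generic threshold attack in which a higher score means "more likely a training member"; the function `tpr_at_fpr` and its threshold rule are illustrative, not the paper's evaluation code.

```python
import math


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """True Positive Rate of a threshold attack at a fixed False Positive Rate.

    The attack predicts "member" when score > threshold.
    """
    ranked = sorted(nonmember_scores, reverse=True)
    # Number of non-members we may misclassify without exceeding target_fpr.
    allowed_fp = math.floor(target_fpr * len(ranked) + 1e-9)
    # At most `allowed_fp` non-members score strictly above this threshold.
    threshold = ranked[allowed_fp] if allowed_fp < len(ranked) else -math.inf
    true_positives = sum(score > threshold for score in member_scores)
    return true_positives / len(member_scores)


# Identically distributed scores: the attack cannot separate members from non-members.
print(tpr_at_fpr(list(range(100)), list(range(100))))  # 0.01
# Perfectly separated scores: every member scores above every non-member.
print(tpr_at_fpr(list(range(100, 200)), list(range(100))))  # 1.0
```

When member and non-member scores are identically distributed, TPR simply tracks the chosen FPR, which is why 1% TPR at 1% FPR is no better than random guessing.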
13.06.2025 19:12
DMs benefit from large, diverse training datasets, which are often sourced without the data owners' consent.
This raises a key question: was your data used? Membership Inference Attacks aim to find out by determining whether a specific data point was part of a model’s training set.
TL;DR: We show that Membership Inference Attacks (MIAs) struggle to detect training data in SOTA Diffusion Models (DMs) and instead propose the first dataset inference method to achieve this goal.
#AI #MachineLearning #GenerativeAI #Copyright
🚨We’re thrilled to present our paper “CDI: Copyrighted Data Identification in #DiffusionModels” at #CVPR2025 in Nashville! 🎸❗️
"Was this diffusion model trained on my dataset?"
Learn how to find out:
📍 Poster #276
🗓️ Saturday, June 14
🕒 3:00 – 5:00 PM PDT
📜https://arxiv.org/abs/2411.12858
Thanks for working on that together Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic!
@ideas-ncbr.bsky.social
#AI #GenerativeAI #privacy
If you’d like to learn more, check out our full arXiv paper, where we dive deeper into membership inference attacks, dataset inference, and memorization risks in IARs.
👉 Read the full paper, “Privacy Attacks on Image AutoRegressive Models”: arxiv.org/abs/2502.02514
🧵 6/
IARs push image generation forward, but at a cost: higher privacy risks.
🛟 Can we make IARs safer?
✳️ We find that Masked AutoRegressive models (MAR) are inherently more private, likely because they incorporate diffusion-based techniques.
🧵 5/
⚠️ That's not all!
Large IARs memorize and regurgitate data at an alarming rate, making them vulnerable to copyright infringement, privacy violations, and dataset exposure.
🖼️ Our data extraction attack recovered up to 698 training images from the largest VAR model.
🧵 4/
⚠️ How serious is it?
🔍 Our findings are striking: attacks for identifying training samples are orders of magnitude more effective on IARs than on DMs.
🧵 3/
IARs deliver higher quality, faster generation, and better scalability than #DiffusionModels (DMs), using techniques similar to those of Large Language Models like #GPT.
💡 Impressive? Absolutely. Safe? Not so much.
We find that IARs are highly vulnerable to privacy attacks.
🧵 2/
🚨 Image AutoRegressive Models Leak More Training Data Than Diffusion Models 🚨
Image AutoRegressive models (IARs), like the #NeurIPS2024 Best Paper, now lead in AI image generation. But at what risk?
IARs:
🔍 Are more likely than DMs to reveal training data
🖼️ Leak entire training images verbatim
🧵 1/
🙌 I am glad to be a part of this research with Youcef Djenouri, Nassim Belmecheri, Tomasz Michalak, Ahmed Nabil Belbachir, and Anis Yazidi!
20.12.2024 15:31
📜 LGR-AD enables multiple diffusion model agents 🤖 to collaborate through a graph network, significantly enhancing quality and flexibility in text-to-image generation 🖼️.
20.12.2024 15:30
😊 Happy to share!
🎉 Our paper "Learning Graph Representation of Agent Diffusers (LGR-AD)" has been accepted as a full paper at #AAMAS (A*), the International Conference on Autonomous Agents and Multiagent Systems!
#diffusion #graphs #agentsystem
@ideas-ncbr.bsky.social #WarszawUniversityOfTechnology