
Jan Dubiński

@jandubinski.bsky.social

PhD student in Machine Learning @Warsaw University of Technology and @IDEAS NCBR

275 Followers  |  864 Following  |  17 Posts  |  Joined: 22.11.2024

Latest posts by jandubinski.bsky.social on Bluesky

Thanks for working on that together, Antoni Kowalczuk, Franziska Boenisch, and Adam Dziedzic!

@cvprconference.bsky.social

#AI #GenerativeAI #privacy

13.06.2025 19:12 — 👍 1    🔁 1    💬 0    📌 0

CDI confidently identifies training data with as few as 70 suspect samples!

Please check out the paper for more:
📜https://arxiv.org/abs/2411.12858

13.06.2025 19:12 — 👍 1    🔁 1    💬 1    📌 0

Instead, we propose CDI, a method that empowers data owners to check if their data was used to train a DM. CDI relies on selectively combining diverse membership signals from multiple samples and statistical testing.

13.06.2025 19:12 — 👍 1    🔁 1    💬 1    📌 0
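A generic dataset-inference skeleton in the spirit of the post above: aggregate a per-sample membership signal over many samples, then decide with a one-sided statistical test against known non-member controls. This is a hedged sketch, not the exact CDI procedure from the paper; `membership_signal`, `suspect`, and `control` are placeholder names.

```python
# Generic dataset-inference skeleton (NOT the exact CDI procedure):
# aggregate per-sample membership signals and decide with a one-sided test.
# `membership_signal` stands in for whatever per-sample score a data owner
# can extract from the model.
from typing import Callable, Sequence

from scipy.stats import mannwhitneyu


def dataset_was_used(suspect: Sequence, control: Sequence,
                     membership_signal: Callable[[object], float],
                     alpha: float = 0.01) -> bool:
    """True if suspect samples score significantly higher than known
    non-member controls, i.e. the model likely trained on them."""
    s = [membership_signal(x) for x in suspect]   # e.g. as few as ~70 samples
    c = [membership_signal(x) for x in control]   # held-out, definitely unseen
    _, p_value = mannwhitneyu(s, c, alternative="greater")  # one-sided test
    return p_value < alpha
```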

Unfortunately, state-of-the-art Membership Inference Attacks struggle to identify training data in large DMs - often performing close to random guessing (True Positive Rate = 1% at False Positive Rate = 1%), e.g. on DMs trained on ImageNet.

13.06.2025 19:12 — 👍 2    🔁 1    💬 1    📌 0
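A quick illustration (with synthetic scores, not paper data) of how the "TPR at 1% FPR" metric quoted above is computed from per-sample attack scores: when member and non-member scores largely overlap, the value lands near 0.01, i.e. random guessing.

```python
# Synthetic example: computing TPR at 1% FPR from per-sample attack scores.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(1000), np.zeros(1000)])       # 1 = member
scores = np.concatenate([rng.normal(0.05, 1.0, 1000),          # members
                         rng.normal(0.00, 1.0, 1000)])         # non-members

fpr, tpr, _ = roc_curve(y_true, scores)
print(f"TPR @ 1% FPR: {np.interp(0.01, fpr, tpr):.3f}")        # ~0.01
```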

DMs benefit from large and diverse datasets for training - often sourced without the data owners' consent.

This raises a key question: was your data used? Membership Inference Attacks aim to find out by determining whether a specific data point was part of a model’s training set.

13.06.2025 19:12 — 👍 1    🔁 1    💬 1    📌 0
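For readers new to the topic, a minimal sketch of the membership-inference idea described above: score each candidate by how well the model fits it and flag low-loss points as likely members. This is the generic loss-threshold heuristic, not any specific attack from the thread or paper; `denoising_loss` is a hypothetical helper.

```python
# Minimal sketch of the loss-threshold membership-inference heuristic
# (a generic baseline, not a specific attack from the paper).
# `denoising_loss` is a hypothetical callable returning the model's
# per-sample loss.
from typing import Callable, Sequence


def membership_scores(samples: Sequence,
                      denoising_loss: Callable[[object], float]) -> list[float]:
    # Lower loss -> the model fits the sample better -> more likely a member.
    return [-denoising_loss(x) for x in samples]


def predict_members(samples: Sequence,
                    denoising_loss: Callable[[object], float],
                    threshold: float) -> list[bool]:
    return [score > threshold
            for score in membership_scores(samples, denoising_loss)]
```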

TL;DR: We show that Membership Inference Attacks (MIAs) struggle to detect training data in SOTA Diffusion Models (DMs) and instead propose the first dataset inference method to achieve this goal.

#AI #MachineLearning #GenerativeAI #Copyright

13.06.2025 19:12 — 👍 2    🔁 1    💬 1    📌 0

🚨We’re thrilled to present our paper “CDI: Copyrighted Data Identification in #DiffusionModels” at #CVPR2025 in Nashville! 🎸❗️

"Was this diffusion model trained on my dataset?"
Learn how to find out:
📍 Poster #276
🗓️ Saturday, June 14
🕒 3:00 – 5:00 PM PDT
📜https://arxiv.org/abs/2411.12858

13.06.2025 19:12 — 👍 4    🔁 1    💬 1    📌 0

Thanks for working on that together, Antoni Kowalczuk, Franziska Boenisch, and Adam Dziedzic!

@ideas-ncbr.bsky.social
#AI #GenerativeAI #privacy

05.02.2025 18:41 — 👍 3    🔁 0    💬 0    📌 0
Privacy Attacks on Image AutoRegressive Models: Image autoregressive (IAR) models have surpassed diffusion models (DMs) in both image quality (FID: 1.48 vs. 1.58) and generation speed. However, their privacy risks remain largely unexplored. To addr...

If you’d like to learn more, check out our full arXiv paper, where we dive deeper into membership inference attacks, dataset inference, and memorization risks in IARs.

👉 Read the full paper: Privacy Attacks on Image AutoRegressive Models arxiv.org/abs/2502.02514

🧵 6/

05.02.2025 18:41 — 👍 4    🔁 0    💬 1    📌 0

IARs push image generation forward, but at a cost—higher privacy risks.

🛟 Can we make IARs safer?

✳️ We find Masked AutoRegressive models (MAR) inherently more private, likely because they incorporate diffusion-based techniques.

🧵 5/

05.02.2025 18:41 — 👍 1    🔁 0    💬 1    📌 0

⚠️ That's not all!

Large IARs memorize and regurgitate data at an alarming rate, making them vulnerable to copyright infringement, privacy violations, and dataset exposure.

🖼️ Our data extraction attack recovered up to 698 training images from the largest VAR model.

🧵 4/

05.02.2025 18:41 — 👍 1    🔁 0    💬 1    📌 0
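A rough, hypothetical illustration of how "extracted" training images are typically counted: generate many samples, embed them and the training images with some feature extractor, and count near-duplicates above a similarity threshold. This is a generic recipe, not the attack used in the paper; the inputs are placeholders.

```python
# Hypothetical sketch: a generated sample counts as an extraction if it is
# near-identical (by cosine similarity of some embedding) to at least one
# training image.
import numpy as np


def count_extracted(generated_embs: np.ndarray, train_embs: np.ndarray,
                    threshold: float = 0.95) -> int:
    g = generated_embs / np.linalg.norm(generated_embs, axis=1, keepdims=True)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = g @ t.T                       # pairwise cosine similarities
    return int((sims.max(axis=1) > threshold).sum())
```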

⚠️ How serious is it?

🔍 Our findings are striking: attacks for identifying training samples are orders of magnitude more effective on IARs than DMs.

🧵 3/

05.02.2025 18:41 — 👍 1    🔁 0    💬 1    📌 0

IARs deliver higher quality, faster generation, and better scalability than #DiffusionModels (DMs), using techniques similar to those behind Large Language Models like #GPT.

💡 Impressive? Absolutely. Safe? Not so much.

We find that IARs are highly vulnerable to privacy attacks.

🧵 2/

05.02.2025 18:41 — 👍 1    🔁 0    💬 1    📌 0

🚨 Image AutoRegressive Models Leak More Training Data Than Diffusion Models🚨

IARs — like the #NeurIPS2024 Best Paper — now lead in AI image generation. But at what risk?

IARs:
🔍 Are more likely than DMs to reveal training data
🖼️ Leak entire training images verbatim

🧵 1/

05.02.2025 18:41 — 👍 13    🔁 3    💬 1    📌 0

🙌 I am glad to be a part of this research with Youcef Djenouri, Nassim Belmecheri, Tomasz Michalak, Ahmed Nabil Belbachir, and Anis Yazidi!

20.12.2024 15:31 — 👍 1    🔁 0    💬 0    📌 0

📜 LGR-AD enables multiple diffusion model agents 🤖 to collaborate through a graph network, significantly enhancing quality and flexibility in text-to-image generation 🖼️.

20.12.2024 15:30 — 👍 1    🔁 0    💬 1    📌 0

😊 Happy to Share!

🎉 Our paper "Learning Graph Representation of Agent Diffusers (LGR-AD)" has been accepted as a full paper at #AAMAS, the A*-ranked International Conference on Autonomous Agents and Multiagent Systems!

#diffusion #graphs #agentsystem
@ideas-ncbr.bsky.social #WarsawUniversityOfTechnology

20.12.2024 15:30 — 👍 3    🔁 2    💬 1    📌 0
