DP-RFT: Learning to Generate Synthetic Text via Differentially Private Reinforcement Fine-Tuning
Fangyuan Xu, Sihao Chen, Zinan Lin, Taiwei Shi, Sydney Graham, Pei Zhou, Mengting Wan, Alex Stein, Virginia Estellers, Charles Chen, Morris Sharp, Richard Speyer, Tadas Baltrusaitis, Jennifer Neville, Eunsol Choi, Longqi Yang
http://arxiv.org/abs/2602.18633
Differentially private (DP) synthetic data generation plays a pivotal role in developing large language models (LLMs) on private data, where data owners cannot provide eyes-on access to individual examples. Generating DP synthetic data typically involves a difficult trade-off. On one hand, DP finetuning methods train an LLM as a synthetic data generator with formal privacy guarantees, yet they still require the raw content of private examples for model training. On the other hand, methods that avoid direct exposure to private data are bounded by an off-the-shelf, un-finetuned model, whose outputs often lack domain fidelity. Can we train an LLM to generate high-quality synthetic text without eyes-on access to individual private examples? In this work, we introduce Differentially Private Reinforcement Fine-Tuning (DP-RFT), an online reinforcement learning algorithm for synthetic data generation with LLMs. DP-RFT leverages DP-protected nearest-neighbor votes from an eyes-off private corpus as a reward signal for on-policy synthetic samples generated by an LLM. The LLM iteratively learns to generate synthetic data to maximize the expected DP votes through Proximal Policy Optimization (PPO). We evaluate DP-RFT for long-form and domain-specific synthetic data generation, such as news articles, meeting transcripts, and medical article abstracts. Our experiments show that DP-RFT closes the gap between private evolution and DP finetuning methods in terms of the fidelity and downstream utility of the generated synthetic data, while respecting the private data boundary.
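The reward construction described above (DP-protected nearest-neighbor votes from an eyes-off corpus) can be illustrated with a minimal sketch. The embedding arrays, the vote histogram, and the Gaussian noise scale below are illustrative assumptions, not the paper's actual implementation; in DP-RFT the noisy per-sample vote counts would then be fed to PPO as rewards.

```python
import numpy as np

def dp_vote_rewards(synthetic_emb, private_emb, sigma=2.0, seed=0):
    """Sketch of a DP nearest-neighbor vote reward.

    Each private example casts one vote for its nearest synthetic sample
    (so each private record contributes to exactly one histogram bin),
    and Gaussian noise is added to the vote counts before they are used
    as per-sample RL rewards.
    """
    rng = np.random.default_rng(seed)
    # nearest synthetic sample for every private example
    dists = np.linalg.norm(
        private_emb[:, None, :] - synthetic_emb[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)

    # vote histogram over synthetic samples; L2 sensitivity is 1 per record
    votes = np.bincount(nearest, minlength=len(synthetic_emb)).astype(float)

    # Gaussian mechanism on the vote histogram
    noisy_votes = votes + rng.normal(0.0, sigma, size=votes.shape)
    return noisy_votes  # reward signal per synthetic sample

# toy usage with random stand-in "embeddings"
rng = np.random.default_rng(1)
rewards = dp_vote_rewards(rng.normal(size=(8, 16)), rng.normal(size=(100, 16)))
```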
24.02.2026 04:54
Statistical Imaginaries, State Legitimacy: Grappling with the Arrangements Underpinning Quantification in the U.S. Census
Jayshree Sarathy, danah boyd
http://arxiv.org/abs/2602.18636
Over the last century, the adoption of novel scientific methods for conducting the U.S. census has been met with wide-ranging receptions. Some methods were quietly embraced, while others sparked decades-long controversies. What accounts for these differences? We argue that controversies emerge from $\textit{arrangements of statistical imaginaries}$, putting into tension divergent visions of the census. To analyze these dynamics, we compare reactions to two methods designed to improve data accuracy (imputation and adjustment) and two methods designed to protect confidentiality (swapping and differential privacy), offering insight into how each method reconfigures stakeholder orientations and rhetorical claims. These cases allow us to reflect on how technocratic efforts to improve accuracy and confidentiality can strengthen -- or erode -- trust in data. Our analysis shows how the credibility of the Census Bureau and its data stems not just from empirical evaluations of quantification, but also from how statistical imaginaries are contested and stabilized.
24.02.2026 04:54
Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy
Danah Boyd, Jayshree Sarathy
http://arxiv.org/abs/2602.18648
When the U.S. Census Bureau announced its intention to modernize its disclosure avoidance procedures for the 2020 Census, it sparked a controversy that is still underway. The move to differential privacy introduced technical and procedural uncertainties, leaving stakeholders unable to evaluate the quality of the data. More importantly, this transformation exposed the statistical illusions and limitations of census data, weakening stakeholders' trust in the data and in the Census Bureau itself. This essay examines the epistemic currents of this controversy. Drawing on theories from Science and Technology Studies (STS) and ethnographic fieldwork, we analyze the current controversy over differential privacy as a battle over uncertainty, trust, and legitimacy of the Census. We argue that rebuilding trust will require more than technical repairs or improved communication; it will require reconstructing what we identify as a 'statistical imaginary.'
24.02.2026 04:54
SLDP: Semi-Local Differential Privacy for Density-Adaptive Analytics
Alexey Kroshnin, Alexandra Suvorikova
http://arxiv.org/abs/2602.18910
Density-adaptive domain discretization is essential for high-utility privacy-preserving analytics but remains challenging under Local Differential Privacy (LDP) due to the privacy-budget costs associated with iterative refinement. We propose a novel framework, Semi-Local Differential Privacy (SLDP), that assigns a privacy region to each user based on local density and defines adjacency by the potential movement of a point within its privacy region. We present an interactive $(\varepsilon, \delta)$-SLDP protocol, orchestrated by an honest-but-curious server over a public channel, to estimate these regions privately. Crucially, our framework decouples the privacy cost from the number of refinement iterations, allowing for high-resolution grids without additional privacy budget cost. We experimentally demonstrate the framework's effectiveness on estimation tasks across synthetic and real-world datasets.
24.02.2026 04:53
DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models
Jin Liu, Yinbin Miao, Ning Xi, Junkang Liu
http://arxiv.org/abs/2602.19945
Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates training and fine-tuning in large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of the second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplifies AdamW's sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing the second-moment variance, removing DP-induced bias, and aligning local updates with the global descent direction to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state-of-the-art (SOTA) by 5.83\%. The code is available in the Appendix.
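A minimal sketch of a clipped-and-noised AdamW-style client step with a debiased second moment, using the fact that the added Gaussian noise inflates squared gradients by a known variance that can be subtracted. The function name, hyperparameters, and the debiasing rule are assumptions for illustration; the paper's actual estimator and drift-alignment mechanism may differ.

```python
import numpy as np

def dp_adamw_step(param, grads, m, v, clip=1.0, sigma=0.8, lr=1e-3,
                  beta1=0.9, beta2=0.999, wd=0.01, eps=1e-8, rng=None):
    """One clipped-and-noised AdamW-style step with a debiased second moment.

    `grads` holds per-example gradients. The DP noise added to the averaged
    gradient has std sigma*clip/n per coordinate, which biases the
    second-moment estimate upward by its variance; subtracting that variance
    is one simple way to remove the bias (a sketch, not the paper's estimator).
    """
    rng = rng or np.random.default_rng()
    n = len(grads)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12)) for g in grads]
    noise_std = sigma * clip / n
    g_priv = np.mean(clipped, axis=0) + rng.normal(0.0, noise_std, param.shape)

    m = beta1 * m + (1 - beta1) * g_priv
    v = beta2 * v + (1 - beta2) * np.maximum(g_priv**2 - noise_std**2, 0.0)

    update = m / (np.sqrt(v) + eps)
    param = param - lr * (update + wd * param)   # decoupled weight decay
    return param, m, v

rng = np.random.default_rng(0)
p, m, v = rng.normal(size=10), np.zeros(10), np.zeros(10)
p, m, v = dp_adamw_step(p, rng.normal(size=(32, 10)), m, v, rng=rng)
```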
24.02.2026 04:53
JAX-Privacy: A library for differentially private machine learning
Ryan McKenna, Galen Andrew, Borja Balle, Vadym Doroshenko, Arun Ganesh, Weiwei Kong, Alex Kurakin, Brendan McMahan, Mikhail Pravilov
http://arxiv.org/abs/2602.17861
JAX-Privacy is a library designed to simplify the deployment of robust and performant mechanisms for differentially private machine learning. Guided by design principles of usability, flexibility, and efficiency, JAX-Privacy serves both researchers requiring deep customization and practitioners who want a more out-of-the-box experience. The library provides verified, modular primitives for all critical components of mechanism design, including batch selection, gradient clipping, noise addition, accounting, and auditing, and brings together a large body of recent research on differentially private ML.
23.02.2026 04:53
CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras
Rong Fu, Wenxin Zhang, Yibo Meng, Jia Yee Tan, Jiaxuan Lu, Rui Lu, Jiekai Wu, Zhaolu Kang, Simon Fong
http://arxiv.org/abs/2602.18047
City-scale person re-identification across distributed cameras must handle severe appearance changes from viewpoint, occlusion, and domain shift while complying with data protection rules that prevent sharing raw imagery. We introduce CityGuard, a topology-aware transformer for privacy-preserving identity retrieval in decentralized surveillance. The framework integrates three components. A dispersion-adaptive metric learner adjusts instance-level margins according to feature spread, increasing intra-class compactness. Spatially conditioned attention injects coarse geometry, such as GPS or deployment floor plans, into graph-based self-attention to enable projectively consistent cross-view alignment using only coarse geometric priors without requiring survey-grade calibration. Differentially private embedding maps are coupled with compact approximate indexes to support secure and cost-efficient deployment. Together these designs produce descriptors robust to viewpoint variation, occlusion, and domain shifts, and they enable a tunable balance between privacy and utility under rigorous differential-privacy accounting. Experiments on Market-1501 and additional public benchmarks, complemented by database-scale retrieval studies, show consistent gains in retrieval precision and query throughput over strong baselines, confirming the practicality of the framework for privacy-critical urban identity matching.
23.02.2026 04:53
Efficient privacy loss accounting for subsampling and random allocation
Vitaly Feldman, Moshe Shenfeld
http://arxiv.org/abs/2602.17284
We consider the privacy amplification properties of a sampling scheme in which a user's data is used in $k$ steps chosen randomly and uniformly from a sequence (or set) of $t$ steps. This sampling scheme has been recently applied in the context of differentially private optimization (Chua et al., 2024a; Choquette-Choo et al., 2025) and communication-efficient high-dimensional private aggregation (Asi et al., 2025), where it was shown to have utility advantages over the standard Poisson sampling. Theoretical analyses of this sampling scheme (Feldman & Shenfeld, 2025; Dong et al., 2025) lead to bounds that are close to those of Poisson sampling, yet still have two significant shortcomings. First, in many practical settings, the resulting privacy parameters are not tight due to the approximation steps in the analysis. Second, the computed parameters are either the hockey stick or Renyi divergence, both of which introduce overheads when used in privacy loss accounting.
In this work, we demonstrate that the privacy loss distribution (PLD) of random allocation applied to any differentially private algorithm can be computed efficiently. When applied to the Gaussian mechanism, our results demonstrate that the privacy-utility trade-off for random allocation is at least as good as that of Poisson subsampling. In particular, random allocation is better suited for training via DP-SGD. To support these computations, our work develops new tools for general privacy loss accounting based on a notion of PLD realization. This notion allows us to extend accurate privacy loss accounting to subsampling which previously required manual noise-mechanism-specific analysis.
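As a point of reference for the accounting machinery discussed above, the sketch below estimates $\delta(\varepsilon)$ for $k$-fold composition of the Gaussian mechanism by sampling its privacy loss distribution directly. The single-step loss of a sensitivity-1 Gaussian mechanism with noise scale $\sigma$ is $N(\mu, 2\mu)$ with $\mu = 1/(2\sigma^2)$, and losses add under composition. This toy Monte Carlo is only a baseline illustration of PLD accounting; it does not implement the paper's efficient random-allocation analysis.

```python
import numpy as np

def gaussian_delta(eps, sigma, k, n_samples=1_000_000, seed=0):
    """Monte Carlo estimate of delta(eps) for k-fold composition of the
    Gaussian mechanism (sensitivity 1, noise scale sigma).

    The k-fold privacy loss is N(k*mu, 2*k*mu) with mu = 1/(2*sigma**2), and
    delta(eps) = E[(1 - exp(eps - L))_+] over the total loss L.
    """
    rng = np.random.default_rng(seed)
    mu = 1.0 / (2.0 * sigma**2)
    total_loss = rng.normal(k * mu, np.sqrt(2.0 * k * mu), size=n_samples)
    return np.mean(np.maximum(0.0, 1.0 - np.exp(eps - total_loss)))

print(gaussian_delta(eps=1.0, sigma=4.0, k=100))
```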
20.02.2026 04:53
Privacy in Theory, Bugs in Practice: Grey-Box Auditing of Differential Privacy Libraries
Tudor Cebere, David Erb, Damien Desfontaines, Aurélien Bellet, Jack Fitzsimons
http://arxiv.org/abs/2602.17454
Differential privacy (DP) implementations are notoriously prone to errors, with subtle bugs frequently invalidating theoretical guarantees. Existing verification methods are often impractical: formal tools are too restrictive, while black-box statistical auditing is intractable for complex pipelines and fails to pinpoint the source of the bug. This paper introduces Re:cord-play, a gray-box auditing paradigm that inspects the internal state of DP algorithms. By running an instrumented algorithm on neighboring datasets with identical randomness, Re:cord-play directly checks for data-dependent control flow and provides concrete falsification of sensitivity violations by comparing declared sensitivity against the empirically measured distance between internal inputs. We generalize this to Re:cord-play-sample, a full statistical audit that isolates and tests each component, including untrusted ones. We show that our novel testing approach is both effective and necessary by auditing 12 open-source libraries, including SmartNoise SDK, Opacus, and Diffprivlib, and uncovering 13 privacy violations that impact their theoretical guarantees. We release our framework as an open-source Python package, thereby making it easy for DP developers to integrate effective, computationally inexpensive, and seamless privacy testing as part of their software development lifecycle.
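The core record-and-replay idea, running an instrumented mechanism on neighboring datasets with shared randomness and comparing the internal distance against the declared sensitivity, can be sketched as follows. The toy mechanism, its deliberately wrong declared sensitivity, and the function names are illustrative assumptions; the framework in the paper audits real DP libraries component by component.

```python
import numpy as np

def noisy_sum(data, declared_sensitivity, eps, rng):
    """Toy DP mechanism: Laplace noise calibrated to a *declared* sensitivity."""
    internal_value = float(np.sum(np.clip(data, 0, 10)))  # instrumented internal state
    noise = rng.laplace(0.0, declared_sensitivity / eps)
    return internal_value, internal_value + noise

def audit_sensitivity(data, declared_sensitivity, eps, seed=0):
    """Run the mechanism on neighboring datasets with identical randomness and
    flag a violation if the internal values differ by more than declared."""
    neighbor = data[1:]                                   # remove one record
    v1, _ = noisy_sum(data, declared_sensitivity, eps, np.random.default_rng(seed))
    v2, _ = noisy_sum(neighbor, declared_sensitivity, eps, np.random.default_rng(seed))
    measured = abs(v1 - v2)
    return measured <= declared_sensitivity, measured

data = np.array([3.0, 7.0, 10.0, 12.0])                   # last record is clipped to 10
ok, measured = audit_sensitivity(data, declared_sensitivity=1.0, eps=1.0)
print(ok, measured)  # False: measured distance 3.0 falsifies the declared bound of 1.0
```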
20.02.2026 04:53
Computational Hardness of Private Coreset
Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi
http://arxiv.org/abs/2602.17488
We study the problem of differentially private (DP) computation of coreset for the $k$-means objective. For a given input set of points, a coreset is another set of points such that the $k$-means objective for any candidate solution is preserved up to a multiplicative $(1 \pm \alpha)$ factor (and some additive factor).
We prove the first computational lower bounds for this problem. Specifically, assuming the existence of one-way functions, we show that no polynomial-time $(\varepsilon, 1/n^{\omega(1)})$-DP algorithm can compute a coreset for $k$-means in the $\ell_\infty$-metric for some constant $\alpha > 0$ (and some constant additive factor), even for $k=3$. For $k$-means in the Euclidean metric, we show a similar result but only for $\alpha = \Theta\left(1/d^2\right)$, where $d$ is the dimension.
20.02.2026 04:53
Differentially Private Non-convex Distributionally Robust Optimization
Difei Xu, Meng Ding, Zebin Ma, Huanyi Xie, Youming Tao, Aicha Slaitane, Di Wang
http://arxiv.org/abs/2602.16155
Real-world deployments routinely face distribution shifts, group imbalances, and adversarial perturbations, under which the traditional Empirical Risk Minimization (ERM) framework can degrade severely.
Distributionally Robust Optimization (DRO) addresses this issue by optimizing the worst-case expected loss over an uncertainty set of distributions, offering a principled approach to robustness.
Meanwhile, as training data in DRO always involves sensitive information, safeguarding it against leakage under Differential Privacy (DP) is essential.
In contrast to classical DP-ERM, DP-DRO has received much less attention due to its minimax optimization structure with an uncertainty constraint.
To bridge the gap, we provide a comprehensive study of DP-(finite-sum)-DRO with $\phi$-divergence and non-convex loss.
First, we study DRO with general $\phi$-divergence by reformulating it as a minimization problem, and develop a novel $(\varepsilon, \delta)$-DP optimization method, called DP Double-Spider, tailored to this structure.
Under mild assumptions, we show that it achieves a utility bound of $\mathcal{O}(\frac{1}{\sqrt{n}}+ (\frac{\sqrt{d \log (1/\delta)}}{n \varepsilon})^{2/3})$ in terms of the gradient norm, where $n$ denotes the data size and $d$ denotes the model dimension.
We further improve the utility rate for specific divergences.
In particular, for DP-DRO with KL-divergence, by transforming the problem into a compositional finite-sum optimization problem, we develop a DP Recursive-Spider method and show that it achieves a utility bound of $\mathcal{O}((\frac{\sqrt{d \log(1/\delta)}}{n\varepsilon})^{2/3})$, matching the best-known result for non-convex DP-ERM.
Experimentally, we demonstrate that our proposed methods outperform existing approaches for DP minimax optimization.
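For reference, the $\phi$-divergence DRO objective referred to above is usually written as follows; the notation is the standard textbook formulation and is assumed here rather than taken from the paper:

$$ \min_{\theta}\ \sup_{Q}\Big\{ \mathbb{E}_{z\sim Q}\big[\ell(\theta; z)\big] \,:\, D_{\phi}\big(Q \,\|\, \widehat{P}_n\big) \le \rho \Big\}, \qquad D_{\phi}(Q \,\|\, P) = \int \phi\Big(\tfrac{\mathrm{d}Q}{\mathrm{d}P}\Big)\, \mathrm{d}P, $$

where $\widehat{P}_n$ is the empirical distribution and $\rho$ controls the size of the uncertainty set; KL-divergence corresponds to $\phi(t) = t\log t - t + 1$.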
19.02.2026 04:53
Learning with Locally Private Examples by Inverse Weierstrass Private Stochastic Gradient Descent
Jean Dufraiche, Paul Mangold, Michaël Perrot, Marc Tommasi
http://arxiv.org/abs/2602.16436
Releasing data once and for all under noninteractive Local Differential Privacy (LDP) enables complete data reusability, but the resulting noise may create bias in subsequent analyses. In this work, we leverage the Weierstrass transform to characterize this bias in binary classification. We prove that inverting this transform leads to a bias-correction method to compute unbiased estimates of nonlinear functions on examples released under LDP. We then build a novel stochastic gradient descent algorithm called Inverse Weierstrass Private SGD (IWP-SGD). It converges to the true population risk minimizer at a rate of $\mathcal{O}(1/n)$, with $n$ the number of examples. We empirically validate IWP-SGD on binary classification tasks using synthetic and real-world datasets.
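The bias-correction idea can be made concrete with a toy example: under Gaussian LDP noise, evaluating a nonlinear function on the released value computes its Weierstrass transform in expectation, so applying the inverse transform to the function first yields an unbiased estimate. The quadratic function below is only an illustration of the principle, not the paper's IWP-SGD algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
x, sigma = 2.0, 3.0                                    # true value, Gaussian LDP noise scale
released = x + rng.normal(0.0, sigma, size=200_000)    # noninteractive LDP release

f = lambda y: y**2                                     # nonlinear function of the private value
f_debiased = lambda y: y**2 - sigma**2                 # inverse Weierstrass transform of f

print(f(released).mean())                              # ~ x**2 + sigma**2 = 13 (biased)
print(f_debiased(released).mean())                     # ~ x**2 = 4            (unbiased)
```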
19.02.2026 04:53
Certified Per-Instance Unlearning Using Individual Sensitivity Bounds
Hanna Benarroch, Jamal Atif, Olivier Cappé
http://arxiv.org/abs/2602.15602
Certified machine unlearning can be achieved via noise injection leading to differential privacy guarantees, where noise is calibrated to worst-case sensitivity. Such conservative calibration often results in performance degradation, limiting practical applicability. In this work, we investigate an alternative approach based on adaptive per-instance noise calibration tailored to the individual contribution of each data point to the learned solution. This raises the following challenge: how can one establish formal unlearning guarantees when the mechanism depends on the specific point to be removed? To define individual data point sensitivities in noisy gradient dynamics, we consider the use of per-instance differential privacy. For ridge regression trained via Langevin dynamics, we derive high-probability per-instance sensitivity bounds, yielding certified unlearning with substantially less noise injection. We corroborate our theoretical findings through experiments in linear settings and provide further empirical evidence on the relevance of the approach in deep learning settings.
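A minimal sketch of why per-instance calibration helps: for ridge regression one can compute, for each record, the distance between the full solution and the leave-one-out solution, and these distances vary widely across records. Calibrating unlearning noise to a record's own influence rather than the worst case is the idea; the closed-form estimator and function names below are illustrative, whereas the paper's guarantees are stated for Langevin dynamics.

```python
import numpy as np

def ridge(X, y, lam=1.0):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def per_instance_sensitivity(X, y, lam=1.0):
    """Distance between the full ridge solution and each leave-one-out solution.

    Records with small influence would need much less unlearning noise than
    the worst-case record.
    """
    theta_full = ridge(X, y, lam)
    dists = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        dists.append(np.linalg.norm(theta_full - ridge(X[mask], y[mask], lam)))
    return np.array(dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)); y = X @ rng.normal(size=5) + rng.normal(size=200)
sens = per_instance_sensitivity(X, y)
print(sens.max() / sens.mean())   # worst-case influence vs. the typical record
```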
18.02.2026 04:54
Onto-DP: Constructing Neighborhoods for Differential Privacy on Ontological Databases
Yasmine Hayder, Adrien Boiret, Cédric Eichler, Benjamin Nguyen
http://arxiv.org/abs/2602.15614
In this paper, we investigate how attackers can discover sensitive information embedded within databases by exploiting inference rules. We demonstrate that naively applying existing state-of-the-art differential privacy (DP) models is inadequate for safeguarding against such attacks. We introduce ontology-aware differential privacy (Onto-DP), a novel extension of differential privacy paradigms that is built on top of any classical DP model by enriching it with semantic awareness. We show that this extension is a sufficient condition to adequately protect against attackers aware of inference rules.
18.02.2026 04:53
Local Node Differential Privacy
Sofya Raskhodnikova, Adam Smith, Connor Wagaman, Anatoly Zavyalov
http://arxiv.org/abs/2602.15802
We initiate an investigation of node differential privacy for graphs in the local model of private data analysis. In our model, dubbed LNDP, each node sees its own edge list and releases the output of a local randomizer on this input. These outputs are aggregated by an untrusted server to obtain a final output.
We develop a novel algorithmic framework for this setting that allows us to accurately answer arbitrary linear queries on a blurry approximation of the input graph's degree distribution. For some natural problems, the resulting algorithms match the accuracy achievable with node privacy in the central model, where data are held and processed by a trusted server. We also prove lower bounds on the error required by LNDP that imply the optimality of our algorithms for several fundamental graph statistics. We then lift these lower bounds to the interactive LNDP setting, demonstrating the optimality of our algorithms even when constantly many rounds of interaction are permitted. Obtaining our lower bounds requires new approaches, since those developed for the usual local model do not apply to the inherently overlapping inputs that arise from graphs. Finally, we prove structural results that reveal qualitative differences between local node privacy and the standard local model for tabular data.
18.02.2026 04:53
Natural Privacy Filters Are Not Always Free: A Characterization of Free Natural Filters
Matthew Regehr, Bingshan Hu, Ethan Leeman, Pasin Manurangsi, Pierre Tholoniat, Mathias Lécuyer
http://arxiv.org/abs/2602.15815
We study natural privacy filters, which enable the exact composition of differentially private (DP) mechanisms with adaptively chosen privacy characteristics. Earlier privacy filters consider only simple privacy parameters such as Rényi-DP or Gaussian DP parameters. Natural filters account for the entire privacy profile of every query, promising greater utility for a given privacy budget. We show that, contrary to other forms of DP, natural privacy filters are not free in general. Indeed, we show that only families of privacy mechanisms that are well-ordered when composed admit free natural privacy filters.
18.02.2026 04:53
Differentially private graph coloring
Michael Xie, Jiayi Wu, Dung Nguyen, Aravind Srinivasan
http://arxiv.org/abs/2602.13460
Differential Privacy is the gold standard in privacy-preserving data analysis. This paper addresses the challenge of producing a differentially edge-private vertex coloring, and we present two novel algorithms for this problem. Both algorithms initially color each vertex at random from a fixed-size palette, then apply the exponential mechanism to locally resample colors for either all vertices or a chosen subset.
Any non-trivial differentially edge-private coloring of a graph needs to be defective. A coloring of a graph is $k$-defective if every vertex shares its assigned color with at most $k$ of its neighbors; this is the metric by which we measure the utility of our algorithms. Our first algorithm applies to $d$-inductive graphs. Assume we have a $d$-inductive graph with $n$ vertices and maximum degree $\Delta$. We show that our algorithm provides a $3\varepsilon$-differentially private coloring with $O(\frac{\log n}{\varepsilon}+d)$ maximum defectiveness, given a palette of size $\Omega(\frac{\Delta}{\log n}+\frac{1}{\varepsilon})$. Furthermore, we show that this algorithm generalizes to $O(\frac{\Delta}{c\varepsilon}+d)$ defectiveness, where $c$ is the size of the palette and $c=O(\frac{\Delta}{\log n})$. Our second algorithm utilizes noisy thresholding to guarantee $O(\frac{\log n}{\varepsilon})$ maximum defectiveness, given a palette of size $\Omega(\frac{\Delta}{\log n}+\frac{1}{\varepsilon})$, generalizing to all graphs rather than just $d$-inductive ones.
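A minimal sketch of the resampling primitive described above: for a single vertex, the exponential mechanism picks a color with utility equal to minus the number of neighbors currently holding it; adding or removing one edge changes any utility by at most 1, so the sensitivity is 1. Function names and parameters are assumed for illustration, and the full algorithms (including the $d$-inductive ordering and noisy thresholding) are not shown.

```python
import numpy as np

def resample_color(neighbor_colors, palette_size, eps, rng):
    """Exponential mechanism over colors for a single vertex.

    Utility of color c = -(number of neighbors currently colored c);
    one edge flip changes each utility by at most 1 (sensitivity 1),
    so the sampling weight is exp(eps * utility / 2).
    """
    conflicts = np.bincount(neighbor_colors, minlength=palette_size)
    logits = -eps * conflicts / 2.0
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(palette_size, p=probs)

rng = np.random.default_rng(0)
# a vertex whose neighbors mostly use colors 0 and 1 out of a 4-color palette
print(resample_color(np.array([0, 0, 1, 1, 1, 2]), palette_size=4, eps=2.0, rng=rng))
```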
17.02.2026 04:55
Locally Private Parametric Methods for Change-Point Detection
Anuj Kumar Yadav, Cemre Cadir, Yanina Shkel, Michael Gastpar
http://arxiv.org/abs/2602.13619
We study parametric change-point detection, where the goal is to identify distributional changes in time series, under local differential privacy. In the non-private setting, we derive improved finite-sample accuracy guarantees for a change-point detection algorithm based on the generalized log-likelihood ratio test, via martingale methods. In the private setting, we propose two locally differentially private algorithms based on randomized response and binary mechanisms, and analyze their theoretical performance. We derive bounds on detection accuracy and validate our results through empirical evaluation. Our results characterize the statistical cost of local differential privacy in change-point detection and show how privacy degrades performance relative to a non-private benchmark. As part of this analysis, we establish a structural result for strong data processing inequalities (SDPI), proving that SDPI coefficients for Rényi divergences and their symmetric variants (Jeffreys-Rényi divergences) are achieved by binary input distributions. These results on SDPI coefficients are also of independent interest, with applications to statistical estimation, data compression, and Markov chain mixing.
17.02.2026 04:55
Differentially Private Retrieval-Augmented Generation
Tingting Tang, James Flemings, Yongqin Wang, Murali Annavaram
http://arxiv.org/abs/2602.14374
Retrieval-augmented generation (RAG) is a widely used framework for reducing hallucinations in large language models (LLMs) on domain-specific tasks by retrieving relevant documents from a database to support accurate responses. However, when the database contains sensitive corpora, such as medical records or legal documents, RAG poses serious privacy risks by potentially exposing private information through its outputs. Prior work has demonstrated that one can practically craft adversarial prompts that force an LLM to regurgitate the augmented contexts. A promising direction is to integrate differential privacy (DP), a privacy notion that offers strong formal guarantees, into RAG systems. However, naively applying DP mechanisms to existing systems often leads to significant utility degradation. In particular, for RAG systems, DP can reduce the usefulness of the augmented contexts, leading to an increased risk of hallucination from the LLMs. Motivated by these challenges, we present DP-KSA, a novel privacy-preserving RAG algorithm that integrates DP using the propose-test-release paradigm. DP-KSA follows from a key observation that most question-answering (QA) queries can be sufficiently answered with a few keywords. Hence, DP-KSA first obtains an ensemble of relevant contexts, each of which is used to generate a response from an LLM. We utilize these responses to obtain the most frequent keywords in a differentially private manner. Lastly, the keywords are augmented into the prompt for the final output. This approach effectively compresses the semantic space while preserving both utility and privacy. We show that DP-KSA provides formal DP guarantees on the generated output with respect to the RAG database. We evaluate DP-KSA on two QA benchmarks using three instruction-tuned LLMs, and our empirical results d
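A minimal sketch of the keyword-voting step described above: each of the ensemble responses votes for the keywords it contains, noise is added to the counts, and the top-$k$ keywords are kept. The whitespace keyword extractor, the per-response contribution bound, and the Laplace noise scale are illustrative assumptions rather than DP-KSA's exact mechanism.

```python
import numpy as np
from collections import Counter

def dp_top_keywords(responses, k=5, eps=1.0, max_kw_per_resp=10, seed=0):
    """Select the k most frequent keywords across ensemble responses with DP.

    Each response contributes at most `max_kw_per_resp` distinct keywords, so
    the L1 sensitivity of the count vector is bounded by that constant, and
    Laplace noise is added before taking the top-k.
    """
    rng = np.random.default_rng(seed)
    counts = Counter()
    for resp in responses:
        kws = list(dict.fromkeys(resp.lower().split()))[:max_kw_per_resp]
        counts.update(kws)
    words = list(counts)
    noisy = np.array([counts[w] for w in words], dtype=float)
    noisy += rng.laplace(0.0, max_kw_per_resp / eps, size=len(words))
    order = np.argsort(-noisy)[:k]
    return [words[i] for i in order]

responses = ["aspirin reduces fever", "aspirin thins blood", "take aspirin for fever"]
print(dp_top_keywords(responses, k=2))
```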
17.02.2026 04:55
Differentially Private Two-Stage Empirical Risk Minimization and Applications to Individualized Treatment Rule
Joowon Lee, Guanhua Chen
http://arxiv.org/abs/2602.12604
Differential Privacy (DP) provides a rigorous framework for deriving privacy-preserving estimators by injecting calibrated noise to mask individual contributions while preserving population-level insights. Its central challenge lies in the privacy-utility trade-off: calibrating noise levels to ensure robust protection without compromising statistical performance. Standard DP methods struggle with a particular class of two-stage problems prevalent in individualized treatment rules (ITRs) and causal inference. In these settings, data-dependent weights are first computed to satisfy distributional constraints, such as covariate balance, before the final parameter of interest is estimated. Current DP approaches often privatize stages independently, which either degrades weight efficacy, leading to biased and inconsistent estimates, or introduces excessive noise to account for worst-case scenarios.
To address these challenges, we propose the Differentially Private Two-Stage Empirical Risk Minimization (DP-2ERM), a framework that injects a carefully calibrated noise only into the second stage while maintaining privacy for the entire pipeline and preserving the integrity of the first stage weights. Our theoretical contributions include deterministic bounds on weight perturbations across various widely used weighting methods, and probabilistic bounds on sensitivity for the final estimator. Simulations and real-world applications in ITR demonstrate that DP-2ERM significantly enhances utility over existing methods while providing rigorous privacy guarantees.
16.02.2026 04:53
Neighborhood Blending: A Lightweight Inference-Time Defense Against Membership Inference Attacks
Osama Zafar, Shaojie Zhan, Tianxi Ji, Erman Ayday
http://arxiv.org/abs/2602.12943
In recent years, the widespread adoption of Machine Learning as a Service (MLaaS), particularly in sensitive environments, has raised considerable privacy concerns. Of particular importance are membership inference attacks (MIAs), which exploit behavioral discrepancies between training and non-training data to determine whether a specific record was included in the model's training set, thereby presenting significant privacy risks. Although existing defenses, such as adversarial regularization, DP-SGD, and MemGuard, assist in mitigating these threats, they often entail trade-offs such as compromising utility, increased computational requirements, or inconsistent protection against diverse attack vectors.
In this paper, we introduce a novel inference-time defense mechanism called Neighborhood Blending, which mitigates MIAs without retraining the model or incurring significant computational overhead. Our approach operates post-training by smoothing the model's confidence outputs based on the neighborhood of a queried sample. By averaging predictions from similar training samples selected using differentially private sampling, our method establishes a consistent confidence pattern, rendering members and non-members indistinguishable to an adversary while maintaining high utility. Significantly, Neighborhood Blending maintains label integrity (zero label loss) and ensures high utility through an adaptive, "pay-as-you-go" distortion strategy. It is a model-agnostic approach that offers a practical, lightweight solution that enhances privacy without sacrificing model utility. Through extensive experiments across diverse datasets and models, we demonstrate that our defense significantly reduces MIA success rates while preserving model performance, outperforming existing post-hoc defenses like
16.02.2026 04:53
Adaptive Power Iteration Method for Differentially Private PCA
Ta Duy Nguyen, Alina Ene, Huy Le Nguyen
http://arxiv.org/abs/2602.11454
We study $(\varepsilon,\delta)$-differentially private algorithms for the problem of approximately computing the top singular vector of a matrix $A\in\mathbb{R}^{n\times d}$ where each row of $A$ is a datapoint in $\mathbb{R}^{d}$. In our privacy model, neighboring inputs differ by one single row/datapoint. We study the private variant of the power iteration method, which is widely adopted in practice. Our algorithm is based on a filtering technique which adapts to the coherence parameter of the input matrix. This technique provides a utility that goes beyond the worst-case guarantees for matrices with low coherence parameter. Our work departs from and complements the work by Hardt-Roth (STOC 2013) which designed a private power iteration method for the privacy model where neighboring inputs differ in one single entry by at most 1.
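A minimal sketch of noisy power iteration for the top right-singular vector: each iterate multiplies by $A^\top A$, adds Gaussian noise, and renormalizes. The noise scale and row preprocessing below are placeholders, and the coherence-adaptive filtering that is the paper's contribution is not shown.

```python
import numpy as np

def private_power_iteration(A, iters=30, sigma=0.05, seed=0):
    """Noisy power iteration for the top right-singular vector of A.

    Each step computes A.T @ (A @ v), perturbs it with Gaussian noise, and
    renormalizes; in a real DP deployment sigma would be calibrated to the
    desired (eps, delta) after clipping the rows of A.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A.T @ (A @ v) + rng.normal(0.0, sigma, size=d)
        v = w / np.linalg.norm(w)
    return v

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 20)) @ np.diag(np.linspace(3, 0.1, 20)) @ rng.normal(size=(20, 20))
print(private_power_iteration(A)[:5])
```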
13.02.2026 04:54
Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt
Yujie Gu, Richeng Jin, Xiaoyu Ji, Yier Jin, Wenyuan Xu
http://arxiv.org/abs/2602.11513
Large Language Models (LLMs) have achieved remarkable performance and received significant research interest. The enormous computational demands, however, hinder the local deployment on devices with limited resources. The current prevalent LLM inference paradigms require users to send queries to the service providers for processing, which raises critical privacy concerns. Existing approaches propose to allow the users to obfuscate the token embeddings before transmission and utilize local models for denoising. Nonetheless, transmitting the token embeddings and deploying local models may result in excessive communication and computation overhead, preventing practical implementation. In this work, we propose \textbf{DEL}, a framework for \textbf{D}ifferentially private and communication \textbf{E}fficient \textbf{L}LM split inference. More specifically, an embedding projection module and a differentially private stochastic quantization mechanism are proposed to reduce the communication overhead in a privacy-preserving manner. To eliminate the need for local models, we adapt soft prompt at the server side to compensate for the utility degradation caused by privacy. To the best of our knowledge, this is the first work that utilizes soft prompt to improve the trade-off between privacy and utility in LLM inference, and extensive experiments on text generation and natural language understanding benchmarks demonstrate the effectiveness of the proposed method.
13.02.2026 04:54
Differentially Private Perturbed Push-Sum Protocol and Its Application in Non-Convex Optimization
Yiming Zhou, Kaiping Xue, Enhong Chen
http://arxiv.org/abs/2602.11544
In decentralized networks, nodes cannot ensure that their shared information will be securely preserved by their neighbors, making privacy vulnerable to inference by curious nodes. Adding calibrated random noise before communication to satisfy differential privacy offers a proven defense; however, most existing methods are tailored to specific downstream tasks and lack a general, protocol-level privacy-preserving solution. To bridge this gap, we propose Differentially Private Perturbed Push-Sum (DPPS), a lightweight differential privacy protocol for decentralized communication. Since protocol-level differential privacy introduces the unique challenge of obtaining the sensitivity for each communication round, DPPS introduces a novel sensitivity estimation mechanism that requires each node to compute and broadcast only one scalar per round, enabling rigorous differential privacy guarantees. This design allows DPPS to serve as a plug-and-play, low-cost privacy-preserving solution for downstream applications built on it. To provide a concrete instantiation of DPPS and better balance the privacy-utility trade-off, we design PartPSP, a privacy-preserving decentralized algorithm for non-convex optimization that integrates a partial communication mechanism. By partitioning model parameters into local and shared components and applying DPPS only to the shared parameters, PartPSP reduces the dimensionality of consensus data, thereby lowering the magnitude of injected noise and improving optimization performance. We theoretically prove that PartPSP converges under non-convex objectives and, with partial communication, achieves better optimization performance under the same privacy budget. Experimental results validate the effectiveness of DPPS's privacy preservation and demonstrate that PartPSP outperforms exis
13.02.2026 04:53
On the Sensitivity of Firing Rate-Based Federated Spiking Neural Networks to Differential Privacy
Luiz Pereira, Mirko Perkusich, Dalton Valadares, Kyller Gorgônio
http://arxiv.org/abs/2602.12009
Federated Neuromorphic Learning (FNL) enables energy-efficient and privacy-preserving learning on devices without centralizing data. However, real-world deployments require additional privacy mechanisms that can significantly alter training signals. This paper analyzes how Differential Privacy (DP) mechanisms, specifically gradient clipping and noise injection, perturb firing-rate statistics in Spiking Neural Networks (SNNs) and how these perturbations are propagated to rate-based FNL coordination. On a speech recognition task under non-IID settings, ablations across privacy budgets and clipping bounds reveal systematic rate shifts, attenuated aggregation, and ranking instability during client selection. Moreover, we relate these shifts to sparsity and memory indicators. Our findings provide actionable guidance for privacy-preserving FNL, specifically regarding the balance between privacy strength and rate-dependent coordination.
13.02.2026 04:53
Keeping a Secret Requires a Good Memory: Space Lower-Bounds for Private Algorithms
Alessandro Epasto, Xin Lyu, Pasin Manurangsi
http://arxiv.org/abs/2602.12209
We study the computational cost of differential privacy in terms of memory efficiency. While the trade-off between accuracy and differential privacy is well-understood, the inherent cost of privacy regarding memory use remains largely unexplored. This paper establishes for the first time an unconditional space lower bound for user-level differential privacy by introducing a novel proof technique based on a multi-player communication game.
Central to our approach, this game formally links the hardness of low-memory private algorithms to the necessity of ``contribution capping'' -- tracking and limiting the users who disproportionately impact the dataset. We demonstrate that winning this communication game requires transmitting information proportional to the number of over-active users, which translates directly to memory lower bounds.
We apply this framework, as an example, to the fundamental problem of estimating the number of distinct elements in a stream and we prove that any private algorithm requires almost $\widetilde{\Omega}(T^{1/3})$ space to achieve certain error rates in a promise variant of the problem. This resolves an open problem in the literature (by Jain et al. NeurIPS 2023 and Cummings et al. ICML 2025) and establishes the first exponential separation between the space complexity of private algorithms and their non-private $\widetilde{O}(1)$ counterparts for a natural statistical estimation task. Furthermore, we show that this communication-theoretic technique generalizes to broad classes of problems, yielding lower bounds for private medians, quantiles, and max-select.
13.02.2026 04:53
PRISM: Differentially Private Synthetic Data with Structure-Aware Budget Allocation for Prediction
Amir Asiaee, Chao Yan, Zachary B. Abrams, Bradley A. Malin
http://arxiv.org/abs/2602.10228
Differential privacy (DP) provides a mathematical guarantee limiting what an adversary can learn about any individual from released data. However, achieving this protection typically requires adding noise, and noise can accumulate when many statistics are measured. Existing DP synthetic data methods treat all features symmetrically, spreading noise uniformly even when the data will serve a specific prediction task.
We develop a prediction-centric approach operating in three regimes depending on available structural knowledge. In the causal regime, when the causal parents of $Y$ are known and distribution shift is expected, we target the parents for robustness. In the graphical regime, when a Bayesian network structure is available and the distribution is stable, the Markov blanket of $Y$ provides a sufficient feature set for optimal prediction. In the predictive regime, when no structural knowledge exists, we select features via differentially private methods without claiming to recover causal or graphical structure.
We formalize this as PRISM, a mechanism that (i) identifies a predictive feature subset according to the appropriate regime, (ii) constructs targeted summary statistics, (iii) allocates budget to minimize an upper bound on prediction error, and (iv) synthesizes data via graphical-model inference. We prove end-to-end privacy guarantees and risk bounds. Empirically, task-aware allocation improves prediction accuracy compared to generic synthesizers. Under distribution shift, targeting causal parents achieves AUC $\approx 0.73$ while correlation-based selection collapses to chance ($\approx 0.49$).
12.02.2026 04:55
Risk-Equalized Differentially Private Synthetic Data: Protecting Outliers by Controlling Record-Level Influence
Amir Asiaee, Chao Yan, Zachary B. Abrams, Bradley A. Malin
http://arxiv.org/abs/2602.10232
When synthetic data is released, some individuals are harder to protect than others. A patient with a rare disease combination or a transaction with unusual characteristics stands out from the crowd. Differential privacy provides worst-case guarantees, but empirical attacks -- particularly membership inference -- succeed far more often against such outliers, especially under moderate privacy budgets and with auxiliary information.
This paper introduces risk-equalized DP synthesis, a framework that prioritizes protection for high-risk records by reducing their influence on the learned generator. The mechanism operates in two stages: first, a small privacy budget estimates each record's "outlierness"; second, a DP learning procedure weights each record inversely to its risk score. Under Gaussian mechanisms, a record's privacy loss is proportional to its influence on the output -- so deliberately shrinking outliers' contributions yields tighter per-instance privacy bounds for precisely those records that need them most.
We prove end-to-end DP guarantees via composition and derive closed-form per-record bounds for the synthesis stage (the scoring stage adds a uniform per-record term). Experiments on simulated data with controlled outlier injection show that risk-weighting substantially reduces membership inference success against high-outlierness records; ablations confirm that targeting -- not random downweighting -- drives the improvement. On real-world benchmarks (Breast Cancer, Adult, German Credit), gains are dataset-dependent, highlighting the interplay between scorer quality and synthesis pipeline.
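A minimal sketch of the two-stage idea: a small budget buys a noisy outlierness score (here, a kNN distance), and each record's clipping bound in a Gaussian-mechanism release then shrinks with its risk, so high-risk records influence the output less and enjoy tighter per-instance bounds. The scorer, weighting rule, and noise scales are illustrative assumptions, not the paper's mechanism, and the stage-1 noise scale below is not a calibrated sensitivity.

```python
import numpy as np

def risk_weighted_dp_mean(X, eps_score=0.2, sigma=1.0, base_clip=1.0, k=10, seed=0):
    """Two-stage sketch: (1) noisy outlierness score, (2) risk-weighted clipping.

    Records with a larger (noisy) kNN distance get a smaller per-record clip,
    so their contribution to the released mean, and hence their per-instance
    privacy loss under the Gaussian mechanism, is reduced.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    # stage 1: outlierness = distance to the k-th nearest neighbor, plus noise
    # (the Laplace scale here is illustrative, not a derived sensitivity)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, k]
    score = knn + rng.laplace(0.0, knn.mean() / eps_score, size=n)
    # stage 2: per-record clip shrinks with normalized risk
    risk = np.clip(score / (score.max() + 1e-12), 0.0, 1.0)
    clip = base_clip * (1.0 - 0.9 * risk)
    contrib = np.array([x * min(1.0, c / (np.linalg.norm(x) + 1e-12))
                        for x, c in zip(X, clip)])
    noisy_sum = contrib.sum(axis=0) + rng.normal(0.0, sigma * base_clip, X.shape[1])
    return noisy_sum / n

X = np.random.default_rng(2).normal(size=(200, 3))
print(risk_weighted_dp_mean(X))
```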
12.02.2026 04:55
Skirting Additive Error Barriers for Private Turnstile Streams
Anders Aamand, Justin Y. Chen, Sandeep Silwal
http://arxiv.org/abs/2602.10360
We study differentially private continual release of the number of distinct items in a turnstile stream, where items may be both inserted and deleted. A recent work of Jain, Kalemaj, Raskhodnikova, Sivakumar, and Smith (NeurIPS '23) shows that for streams of length $T$, polynomial additive error of $\Omega(T^{1/4})$ is necessary, even without any space restrictions. We show that this additive error lower bound can be circumvented if the algorithm is allowed to output estimates with both additive \emph{and multiplicative} error. We give an algorithm for the continual release of the number of distinct elements with $\text{polylog} (T)$ multiplicative and $\text{polylog}(T)$ additive error. We also show a qualitatively similar phenomenon for estimating the $F_2$ moment of a turnstile stream, where we can obtain $1+o(1)$ multiplicative and $\text{polylog} (T)$ additive error. Both results can be achieved using polylogarithmic space whereas prior approaches use polynomial space. In the sublinear space regime, some multiplicative error is necessary even if privacy is not a consideration. We raise several open questions aimed at better understanding trade-offs between multiplicative and additive error in private continual release.
12.02.2026 04:55
When Gradient Clipping Becomes a Control Mechanism for Differential Privacy in Deep Learning
Mohammad Partohaghighi, Roummel Marcia, Bruce J. West, YangQuan Chen
http://arxiv.org/abs/2602.10584
Privacy-preserving training on sensitive data commonly relies on differentially private stochastic optimization with gradient clipping and Gaussian noise. The clipping threshold is a critical control knob: if set too small, systematic over-clipping induces optimization bias; if too large, injected noise dominates updates and degrades accuracy. Existing adaptive clipping methods often depend on per-example gradient norm statistics, adding computational overhead and introducing sensitivity to datasets and architectures. We propose a control-driven clipping strategy that adapts the threshold using a lightweight, weight-only spectral diagnostic computed from model parameters. At periodic probe steps, the method analyzes a designated weight matrix via spectral decomposition and estimates a heavy-tailed spectral indicator associated with training stability. This indicator is smoothed over time and fed into a bounded feedback controller that updates the clipping threshold multiplicatively in the log domain. Because the controller uses only parameters produced during privacy-preserving training, the resulting threshold updates are post-processing and do not increase privacy loss beyond that of the underlying DP optimizer under standard composition accounting.
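A minimal sketch of the controller loop described above: at probe steps, estimate a heavy-tailed spectral indicator of one weight matrix (here, a simple Hill-style tail-index estimate over its singular values), smooth it with an exponential moving average, and update the clipping threshold multiplicatively in the log domain toward a target. The specific indicator, target value, and gains are placeholders, not the paper's; the key point is that the diagnostic uses only released model weights, so the update is post-processing.

```python
import numpy as np

def tail_index(singular_values, tail_frac=0.3):
    """Hill-style estimate of the heavy-tail exponent of the spectrum."""
    s = np.sort(singular_values)[::-1]
    k = max(2, int(tail_frac * len(s)))
    top = s[:k]
    return 1.0 / np.mean(np.log(top / top[-1] + 1e-12))

def update_clip(clip, W, ema_alpha, alpha_state, target_alpha=3.0, gain=0.1):
    """One probe-step update of the clipping threshold (post-processing only).

    The indicator is computed from weights produced by the DP optimizer, so
    adjusting the threshold from it consumes no extra privacy budget under
    standard composition accounting.
    """
    alpha = tail_index(np.linalg.svd(W, compute_uv=False))
    alpha_state = ema_alpha * alpha_state + (1 - ema_alpha) * alpha
    # heavier tail (smaller alpha) than the target -> shrink the threshold
    log_clip = np.log(clip) + gain * np.tanh(alpha_state - target_alpha)
    return float(np.exp(log_clip)), alpha_state

rng = np.random.default_rng(0)
clip, alpha_state = 1.0, 3.0
for step in range(5):
    W = rng.normal(size=(128, 64))            # stand-in for a probed weight matrix
    clip, alpha_state = update_clip(clip, W, ema_alpha=0.9, alpha_state=alpha_state)
print(clip)
```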
12.02.2026 04:55