This paper enhances the classical Solow model of economic growth by integrating Lévy noise, a type of non-Gaussian stochastic perturbation, to capture the inherent uncertainties in economic systems. The extended model examines the impact of these random fluctuations on capital stock and output, revealing the role of jump-diffusion processes in long-term GDP fluctuations. Both continuous and discrete-time frameworks are analyzed to assess the implications for forecasting economic growth and understanding business cycles. The study compares deterministic and stochastic scenarios, providing insight into the stability of equilibrium points and the dynamics of economies subjected to random disturbances. Numerical simulations demonstrate how stochastic noise contributes to economic volatility, leading to abrupt shifts and bifurcations in growth trajectories. This research offers a comprehensive perspective on the influence of external shocks, presenting a more realistic depiction of economic development in uncertain environments.
arXiv
Stochastic bifurcation in economic growth model driven by Lévy noise
By Abebe, Yuan, Tesfay et al
04.02.2026 03:49
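The abstract above describes jump-diffusion ("Lévy-type") perturbations of the Solow model without reproducing the equations. As rough orientation, here is a minimal Euler-Maruyama sketch of one common specification, dk = (s k^alpha - delta k) dt + sigma k dW + k dJ with a compound-Poisson jump term dJ; the functional form, parameter values, and jump distribution are illustrative assumptions, not the paper's exact model.

import numpy as np

rng = np.random.default_rng(0)
s, alpha, delta = 0.3, 0.33, 0.05   # savings rate, capital share, depreciation
sigma = 0.10                        # Brownian (diffusion) intensity
jump_rate, jump_mean = 0.5, -0.15   # Poisson intensity and mean relative jump
T, n = 100.0, 10_000
dt = T / n

k = np.empty(n + 1)
k[0] = 1.0
for t in range(n):
    drift = (s * k[t] ** alpha - delta * k[t]) * dt
    diffusion = sigma * k[t] * rng.normal(scale=np.sqrt(dt))
    n_jumps = rng.poisson(jump_rate * dt)                  # jumps in [t, t+dt)
    jump = k[t] * rng.normal(jump_mean, 0.05, size=n_jumps).sum()
    k[t + 1] = max(k[t] + drift + diffusion + jump, 1e-8)  # keep capital positive

print(f"deterministic steady state: {(s / delta) ** (1 / (1 - alpha)):.3f}")
print(f"mean simulated capital, second half: {k[n // 2:].mean():.3f}")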
The use of the non-parametric Restricted Mean Survival Time endpoint (RMST)
has grown in popularity as trialists look to analyse time-to-event outcomes
without the restrictions of the proportional hazards assumption. In this paper,
we evaluate the power and type I error rate of the parametric and
non-parametric RMST estimators when the treatment effect is explained by multiple
covariates, including an interaction term. Utilising the RMST estimator in this
way allows the combined treatment effect to be summarised as a one-dimensional
estimator, which is evaluated using a one-sided hypothesis Z-test. The
estimators are either fully specified or misspecified, whether in terms of
unaccounted covariates or misspecified knot points (where trials exhibit
crossing survival curves). A placebo-controlled trial of Gamma interferon is
used as a motivating example to simulate associated survival times. When
correctly specified, the parametric RMST estimator has the greatest power,
regardless of the time of analysis. The misspecified RMST estimator generally
performs similarly when covariates mirror those of the fitted case study
dataset. However, as the magnitude of the unaccounted covariate increases, the
associated power of the estimator decreases. In all cases, the non-parametric
RMST estimator has the lowest power, and power remains very reliant on the time
of analysis (with a later analysis time correlated with greater power).
arXiv
The use of restricted mean survival time to estimate treatment effect under model misspecification, a simulation study
By
03.02.2026 03:50
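For orientation, the non-parametric RMST is simply the area under the Kaplan-Meier survival curve up to a truncation time tau. A minimal sketch on simulated exponential survival and censoring times (the paper's covariate-adjusted parametric estimators and case-study data are not reproduced here):

import numpy as np

def rmst_km(time, event, tau):
    """Area under the Kaplan-Meier survival curve on [0, tau]."""
    order = np.argsort(time)
    time, event = np.asarray(time)[order], np.asarray(event)[order]
    n = len(time)
    surv, last_t, area = 1.0, 0.0, 0.0
    for i in range(n):
        t = min(time[i], tau)
        area += surv * (t - last_t)                # rectangle up to this time
        last_t = t
        if time[i] > tau:
            break
        if event[i] == 1:
            surv *= 1.0 - 1.0 / (n - i)            # KM multiplicative step
    return area + surv * max(tau - last_t, 0.0)    # extend last level to tau

rng = np.random.default_rng(1)
t_true = rng.exponential(10, 200)                  # event times
t_cens = rng.exponential(15, 200)                  # censoring times
obs, ev = np.minimum(t_true, t_cens), (t_true <= t_cens).astype(int)
print(f"non-parametric RMST at tau=12: {rmst_km(obs, ev, 12):.2f}")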
arXiv:2310.09597v2 Announce Type: replace
Abstract: We consider the problem of repeatedly choosing policies to maximize social welfare. Welfare is a weighted sum of private utility and public revenue. Earlier outcomes inform later policies. Utility is not observed, but indirectly inferred. Response functions are learned through experimentation. We derive a lower bound on regret, and a matching adversarial upper bound for a variant of the Exp3 algorithm. Cumulative regret grows at a rate of $T^{2/3}$. This implies that (i) welfare maximization is harder than the multi-armed bandit problem (with a rate of $T^{1/2}$ for finite policy sets), and (ii) our algorithm achieves the optimal rate. For the stochastic setting, if social welfare is concave, we can achieve a rate of $T^{1/2}$ (for continuous policy sets), using a dyadic search algorithm. We analyze an extension to nonlinear income taxation, and sketch an extension to commodity taxation. We compare our setting to monopoly pricing (which is easier), and price setting for bilateral trade (which is harder).
arXiv
Adaptive maximization of social welfare
By
03.02.2026 01:37
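The abstract's upper bound concerns a variant of Exp3. For reference, here is a minimal sketch of the standard Exp3 algorithm on a toy Bernoulli bandit; the paper's variant, which handles indirectly inferred utility, is not reproduced, and the mixing parameter gamma below is an arbitrary illustrative choice.

import numpy as np

def exp3(reward_fn, n_arms, T, gamma, seed=2):
    rng = np.random.default_rng(seed)
    log_w = np.zeros(n_arms)
    total = 0.0
    for _ in range(T):
        w = np.exp(log_w - log_w.max())            # stabilised weights
        probs = (1 - gamma) * w / w.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        r = reward_fn(arm)                         # observed reward in [0, 1]
        total += r
        # Importance-weighted reward estimate keeps the update unbiased.
        log_w[arm] += gamma * (r / probs[arm]) / n_arms
    return total

means = np.array([0.3, 0.5, 0.7])
arm_rng = np.random.default_rng(3)
reward = lambda a: float(arm_rng.binomial(1, means[a]))
print(f"Exp3 cumulative reward over 5000 rounds: {exp3(reward, 3, 5000, 0.1):.0f}")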
We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. We introduce Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning to jointly address truthfulness and cost-optimality. Throughout the sequential game, the mechanism estimates agents' beliefs and iteratively updates a distributionally robust linear program with shrinking ambiguity sets to reduce payments while preserving truthfulness. Our mechanism guarantees truthful reporting with high probability while achieving $\tilde{O}(\sqrt{T})$ cumulative regret, and we establish a matching lower bound showing that no truthful adaptive mechanism can asymptotically do better. The framework generalizes to plug-in estimators, supporting structured priors and delayed feedback. To our knowledge, this is the first adaptive mechanism under general settings that maintains truthfulness and achieves optimal regret when incentive constraints are unknown and must be learned.
arXiv
Multi-agent Adaptive Mechanism Design
By Han, Simchi-Levi, Tan et al
02.02.2026 16:41
The global aid system functions as a complex and evolving ecosystem; yet widespread understanding of its structure remains largely limited to aggregate volume flows. Here we map the network topology of global aid using a dataset of unprecedented scale: over 10 million transaction records connecting 2,456 publishing organisations across 230 countries between 1967 and 2025. We apply bipartite projection and dimensionality reduction to reveal the geometry of the system and unveil hidden patterns. This exposes distinct functional clusters that are otherwise sparsely connected. We find that while governments and multilateral agencies provide the primary resources, a small set of knowledge brokers provide the critical connectivity. Universities and research foundations specifically act as essential bridges between disparate islands of implementers and funders. We identify a core solar system of 25 central actors who drive this connectivity including unanticipated brokers like J-PAL and the Hewlett Foundation. These findings demonstrate that influence in the aid ecosystem flows through structural connectivity as much as financial volume. Our results provide a new framework for donors to identify strategic partners that accelerate coordination and evidence diffusion across the global network.
arXiv
Who Connects Global Aid? The Hidden Geometry of 10 Million Transactions
By McCarthy, Gong, Rizoiu et al
02.02.2026 16:37
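The bipartite projection step mentioned in the abstract has a compact linear-algebra form: with an organisation-by-country incidence matrix B, the organisation-organisation co-activity network is B Bᵀ. A toy sketch (the real data, with 2,456 organisations and 230 countries, is obviously not reproduced):

import numpy as np

rng = np.random.default_rng(3)
n_orgs, n_countries = 6, 4
B = (rng.random((n_orgs, n_countries)) < 0.5).astype(int)  # incidence matrix

P = B @ B.T                    # weighted projection: countries shared per pair
np.fill_diagonal(P, 0)         # drop self-links
degree = (P > 0).sum(axis=1)   # structural connectivity of each organisation
print("projection:\n", P)
print("org degrees:", degree)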
Motivated by the emerging adoption of Large Language Models (LLMs) in economics and management research, this paper investigates whether LLMs can reliably identify corporate greenwashing narratives and, more importantly, whether and how the greenwashing signals extracted from textual disclosures can be used to empirically identify causal effects. To this end, this paper proposes DeepGreen, a dual-stage LLM-driven system for detecting potential corporate greenwashing in annual reports. Applied to 9369 A-share annual reports published between 2021 and 2023, DeepGreen attains high reliability in random-sample validation at both stages. An ablation experiment shows that Retrieval-Augmented Generation (RAG) reduces hallucinations compared with simply lengthening the input window. Empirical tests indicate that the greenwashing signal captured by DeepGreen reveals a positive relationship between greenwashing and environmental penalties, with IV, PSM, and placebo tests strengthening the robustness and causal interpretation of the evidence. Further study suggests that the presence and number of green investors can weaken the positive correlation between greenwashing and penalties. Heterogeneity analysis shows that the positive greenwashing-penalty relationship is less significant in large corporations and in corporations that have accumulated green assets, indicating that these green assets may be exploited as a credibility shield for greenwashing. Our findings demonstrate that LLMs can standardize ESG oversight by providing early warnings and directing regulators' scarce attention toward the subsets of corporations where monitoring is most warranted.
arXiv
DeepGreen: Effective LLM-Driven Greenwashing Monitoring System Designed for Empirical Testing -- Evidence from China
By Xu, Liu, Li et al
02.02.2026 16:35
Immigration has shaped many nations, posing the challenge of integrating immigrants into society. While economists often focus on immigrants' economic outcomes compared to natives (such as education, labor market success, and health), social interactions between immigrants and natives are equally crucial. These interactions, from everyday exchanges to teamwork, often lack enforceable contracts and require cooperation to avoid conflicts and achieve efficient outcomes. However, socioeconomic, ethnic, and cultural differences can hinder cooperation. Thus, evaluating integration should also consider its impact on fostering cooperation across diverse groups. This paper studies how priming different identity dimensions affects cooperation between immigrant and native youth. Immigrant identity includes both ethnic ties to their country of origin and connections to the host country. We test whether cooperation improves by making salient a specific identity: Common identity (shared society), Multicultural identity (ethnic group within society), or Neutral identity. In a lab-in-the-field experiment with over 390 adolescents, participants were randomly assigned to one of these priming conditions and played a Public Good Game. Results show that immigrants are 13 percent more cooperative than natives at baseline. Natives increase cooperation by about 3 percentage points when their multicultural identity is primed, closing the initial gap with immigrant peers.
arXiv
Identity and Cooperation in Multicultural Societies: An Experimental Investigation
By Montinari, Ploner, Rattini
02.02.2026 16:32
Political polarization can be beneficial to competing political parties. I study how electoral competition itself generates incentives to polarize voters, even when parties are ex ante identical and motivated purely by political power, interpreted as office rents or influence. I develop a probabilistic voting model with aggregate popularity shocks in which parties have decreasing marginal utility from political power. Equilibrium policy convergence fails. Platform differentiation provides insurance against electoral volatility by securing loyal voter bases and stabilizing political power. In a unidimensional policy space, parties' equilibrium payoffs rise as voters on opposite sides of the median become more extreme, including when polarization is driven by changes in the opponent's supporters. In a multidimensional setting, parties benefit from ideological coherence, the alignment of disagreements across issues. The results have implications for polarizing political communication, party identity, and electoral institutions.
arXiv
Divide and Diverge
By Bonomi
02.02.2026 16:27
We propose a novel framework for measuring privacy from a Bayesian game-theoretic perspective. This framework enables the creation of new, purpose-driven privacy definitions that are rigorously justified, while also allowing for the assessment of existing privacy guarantees through game theory. We show that pure and probabilistic differential privacy are special cases of our framework, and provide new interpretations of the post-processing inequality in this setting. Further, we demonstrate that privacy guarantees can be established for deterministic algorithms, which are overlooked by current privacy standards.
arXiv
Persuasive Privacy
By Bon, Bailie, Rousseau et al
02.02.2026 16:23
Mobilising private capital is a critical bottleneck of the energy transition, yet recent crisis-driven windfall profits for fossil power firms suggest that market signals may still favour carbon-intensive assets. Here we analyse a panel of 900 European power firms (2001-2023) to resolve whether these profits reflect a durable profitability advantage or a crisis-driven anomaly. Using machine-learning clustering and Bayesian model averaging, we identify a structural divergence: wind and solar portfolios exhibit rising profitability, with return on assets among wind-dominated firms increasing by over 6% between 2014 and 2023. Conversely, higher fossil portfolio shares are increasingly associated with lower profitability, with marginal effects reaching -4% by 2023, while renewable-dominated firms match or outperform their fossil-heavy counterparts across most European regions. These findings suggest that the record profits of fossil incumbents were distinct outliers, masking an ongoing decline in the profitability of carbon-intensive business models.
arXiv
The Widening Profitability Gap between Renewable and Fossil Power Firms in Europe
By Fischer, Pichler
02.02.2026 16:20
A growing share of the existing real estate stock exhibits persistent underperformance that can no longer be explained by cyclical market phases or inadequate maintenance alone. In many cases, technically recoverable assets located in non-marginal contexts fail to generate economic value consistent with the capital immobilized. This condition reflects a structural misalignment between intended use and effective demand rather than episodic market weakness, and calls for a decision framework capable of integrating value, risk, complexity, and irreversibility in strategic use selection. This study proposes a decision-analytic framework for the ex-ante selection of intended use in real estate redevelopment processes. The framework integrates real-options logic on irreversibility and managerial flexibility with a multi-criteria decision-analysis structure, enabling comparative evaluation of expected economic value, market and operational risk, technical and managerial complexity, and time-to-income. By treating redevelopment primarily as a problem of strategic option selection rather than design or financial optimization, the framework operationalizes option value preservation through disciplined ex-ante screening. Illustrative cases demonstrate how this integration of real options reasoning and MCDA reduces over-complexification and misalignment across different asset types and urban contexts.
arXiv
A Real-Options-Aware Multi-Criteria Framework for Ex-Ante Real Estate Redevelopment Use Selection
By Garrone
02.02.2026 16:18
We investigate a seller's revenue-maximizing mechanism in a setting where a desirable good is sold together with an undesirable bad (e.g., advertisements) that generates third-party revenue. The buyer's private information is two-dimensional: valuation for the good and willingness to pay to avoid the bad. Following the duality framework of Daskalakis, Deckelbaum, and Tzamos (2017), whose results extend to our setting, we formulate the seller's problem using a transformed measure $\mu$ that depends on the third-party payment $k$. We provide a near-characterization for optimality of three pricing mechanisms commonly used in practice -- the Good-Only, Ad-Tiered, and Single-Bundle Posted Price -- and introduce a new class of tractable, interpretable two-dimensional orthant conditions on $\mu$ for sufficiency. Economically, $k$ yields a clean comparative static: low $k$ excludes the bad, intermediate $k$ separates ad-tolerant and ad-averse buyers, and high $k$ bundles ads for all types.
arXiv
Screening with Advertisements
By Paramahamsa
02.02.2026 16:17
Medical "Crisis Standards of Care" call for a utilitarian allocation of scarce resources in emergencies, while favoring the worst-off under normal conditions. Inspired by such triage rules, we introduce social welfare functions whose distributive tradeoffs depend on the prevailing level of aggregate welfare. These functions are inherently self-referential: they take the welfare level as an input, even though that level is itself determined by the function. In our formulation, inequality aversion varies with welfare and is therefore self-referential. We provide an axiomatic foundation for a family of social welfare functions that move from Rawlsian to utilitarian criteria as overall welfare falls, thereby formalizing triage guidelines. We also derive the converse case, in which the social objective shifts from Rawlsianism toward utilitarianism as welfare increases.
arXiv
Endogenous Inequality Aversion: Decision criteria for triage and other ethical tradeoffs
By Echenique, Mekonnen, Yenmez
02.02.2026 16:12
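To make the self-referentiality concrete: welfare W is the output of an aggregator whose inequality-aversion parameter itself takes W as input, so the social objective is a fixed point. The sketch below is my own illustration, not the paper's axiomatized family: an Atkinson-type aggregator with a logistic aversion schedule rho(W) increasing in W, so society is near-utilitarian (rho close to 0) in a crisis and strongly inequality-averse (Rawlsian-like) when welfare is high.

import numpy as np

def welfare(u, rho):
    """Atkinson aggregate: utilitarian mean at rho=0, maximin as rho -> inf."""
    if abs(rho) < 1e-9:
        return u.mean()
    if abs(rho - 1.0) < 1e-9:
        return np.exp(np.log(u).mean())            # geometric-mean (log) case
    return np.mean(u ** (1.0 - rho)) ** (1.0 / (1.0 - rho))

def rho_of(W):
    """Illustrative aversion schedule, increasing in the welfare level W."""
    return 8.0 / (1.0 + np.exp(-(W - 1.0)))

u = np.array([0.4, 0.9, 1.3, 2.0])                 # private utilities
W = u.mean()                                       # initial guess
for _ in range(200):                               # fixed-point iteration
    W_new = welfare(u, rho_of(W))
    if abs(W_new - W) < 1e-12:
        break
    W = W_new
print(f"self-consistent welfare W = {W:.4f} with rho(W) = {rho_of(W):.3f}")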
In this paper, we employ spatial econometric methods to analyze panel data
from German NUTS 3 regions. Our goal is to gain a deeper understanding of the
significance and interdependence of industry clusters in shaping the dynamics
of GDP. To achieve a more nuanced spatial differentiation, we introduce
indicator matrices for each industry sector, which allow us to extend the
spatial Durbin model to a new variant. This approach is essential due to
both the economic importance of these sectors and the potential issue of
omitted variables. Failing to account for industry sectors can lead to omitted
variable bias and estimation problems. To assess the effects of the major
industry sectors, we incorporate eight distinct branches of industry into our
analysis. According to prevailing economic theory, these clusters should have a
positive impact on the regions they are associated with. Our findings indeed
reveal highly significant impacts, which can be either positive or negative, of
specific sectors on local GDP growth. Spatially, we observe that direct and
indirect effects can exhibit opposite signs, indicative of heightened
competitiveness within and between industry sectors. Therefore, we recommend
that industry sectors should be taken into consideration when conducting
spatial analysis of GDP. Doing so allows for a more comprehensive understanding
of the economic dynamics at play.
arXiv
How industrial clusters influence the growth of the regional GDP: A spatial approach
By
02.02.2026 03:54
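One plausible reading of the indicator-matrix extension, sketched as a data-generating process (my stylised construction, not the authors' exact specification): each sector s gets a diagonal indicator matrix D_s that switches its spatial spillover term on only in regions hosting that sector, on top of a standard spatial Durbin model y = rho W y + X beta + spillovers + eps.

import numpy as np

rng = np.random.default_rng(4)
n, n_sectors = 50, 2
W = rng.random((n, n)) * (rng.random((n, n)) < 0.1)   # sparse random contiguity
np.fill_diagonal(W, 0)
W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-standardise

X = rng.normal(size=(n, 1))                           # one regional covariate
D = [np.diag(rng.integers(0, 2, n).astype(float)) for _ in range(n_sectors)]
rho, beta, thetas = 0.4, np.array([1.0]), [0.5, -0.3]

spill = sum(th * (D_s @ W @ X).ravel() for th, D_s in zip(thetas, D))
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + spill + rng.normal(0, 0.1, n))
print("simulated regional GDP outcomes:", y[:5].round(2))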
From neuroscience and genomics to systems biology and ecology, researchers rely on clustering similarity data to uncover modular structure. Yet widely used clustering methods, such as hierarchical clustering, k-means, and WGCNA, lack principled model selection, leaving them susceptible to noise. A common workaround sparsifies a correlation matrix representation to remove noise before clustering, but this extra step introduces arbitrary thresholds that can distort the structure and lead to unreliable results. To detect reliable clusters, we capitalize on recent advances in network science to unite sparsification and clustering with principled model selection. We test two Bayesian community detection methods, the Degree-Corrected Stochastic Block Model and the Regularized Map Equation, both grounded in the Minimum Description Length principle for model selection. In synthetic data, they outperform traditional approaches, detecting planted clusters under high-noise conditions and with fewer samples. Compared to WGCNA on gene co-expression data, the Regularized Map Equation identifies more robust and functionally coherent gene modules. Our results establish Bayesian community detection as a principled and noise-resistant framework for uncovering modular structure in high-dimensional data across fields.
arXiv
Reliable data clustering with Bayesian community detection
By Neuman, Smiljanić, Rosvall
02.02.2026 01:37
arXiv:2409.19812v1 Announce Type: new
Abstract: We explicitly define the notion of (exact or approximate) compound e-values which have been implicitly presented and extensively used in the recent multiple testing literature. We show that every FDR controlling procedure can be recovered by instantiating the e-BH procedure with certain compound e-values. Since compound e-values are closed under averaging, this allows for combination and derandomization of any FDR procedure. We then point out and exploit the connections to empirical Bayes. In particular, we use the fundamental theorem of compound decision theory to derive the log-optimal simple separable compound e-value for testing a set of point nulls against point alternatives: it is a ratio of mixture likelihoods. We extend universal inference to the compound setting. As one example, we construct approximate compound e-values for multiple t-tests, where the (nuisance) variances may be different across hypotheses. We provide connections to related notions in the literature stated in terms of p-values.
arXiv
Compound e-values and empirical Bayes
By
01.02.2026 22:06
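For reference, the e-BH procedure instantiated in the abstract is short enough to state in code: with e-values e_1..e_n, reject the k hypotheses with the largest e-values, where k is the largest index such that the k-th largest e-value is at least n/(alpha k). A minimal sketch with made-up e-values:

import numpy as np

def e_bh(e, alpha):
    """e-BH: reject the k largest e-values for the largest valid k."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    order = np.argsort(-e)                         # indices, descending e-value
    ks = np.arange(1, n + 1)
    ok = e[order] >= n / (alpha * ks)              # k-th largest vs n/(alpha*k)
    if not ok.any():
        return np.array([], dtype=int)
    k = ks[ok].max()
    return np.sort(order[:k])

e_values = [120.0, 80.0, 35.0, 9.0, 1.1, 0.5]
print("rejected hypotheses:", e_bh(e_values, alpha=0.1))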
arXiv:2312.00770v2 Announce Type: replace-cross
Abstract: Recurrent events are common in clinical, healthcare, social and behavioral studies. A recent analysis framework for potentially censored recurrent event data is to construct a censored longitudinal data set consisting of times to the first recurrent event in multiple prespecified follow-up windows of length $\tau$. With the staggering number of potential predictors being generated from genetic, -omic, and electronic health records sources, machine learning approaches such as the random forest are growing in popularity, as they can incorporate information from highly correlated predictors with non-standard relationships. In this paper, we bridge this gap by developing a random forest approach for dynamically predicting probabilities of remaining event-free during a subsequent $\tau$-duration follow-up period from a reconstructed censored longitudinal data set. We demonstrate the increased ability of our random forest algorithm for predicting the probability of remaining event-free over a $\tau$-duration follow-up period when compared to the recurrent event modeling framework of Xia et al. (2020) in settings where association between predictors and recurrent event outcomes is complex in nature. The proposed random forest algorithm is demonstrated using recurrent exacerbation data from the Azithromycin for the Prevention of Exacerbations of Chronic Obstructive Pulmonary Disease trial (Albert et al., 2011).
arXiv
Random Forest for Dynamic Risk Prediction of Recurrent Events: A Pseudo-Observation Approach
By
01.02.2026 19:10
Tensor regression has attracted significant attention in statistical research. This study tackles the challenge of handling covariates with smooth varying structures. We introduce a novel framework, termed functional tensor regression, which incorporates both the tensor and functional aspects of the covariate. To address the high dimensionality and functional continuity of the regression coefficient, we employ a low Tucker rank decomposition along with smooth regularization for the functional mode. We develop a functional Riemannian Gauss--Newton algorithm that demonstrates a provable quadratic convergence rate, while the estimation error bound is based on the tensor covariate dimension. Simulations and a neuroimaging analysis illustrate the finite sample performance of the proposed method.
arXiv
Functional Tensor Regression
By Li, Yao, Zhang
01.02.2026 16:08
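As background for the low Tucker rank assumption, a truncated Tucker decomposition can be sketched via the higher-order SVD; the paper instead estimates the decomposition with a functional Riemannian Gauss-Newton algorithm, which is not reproduced here.

import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    U = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
         for m, r in enumerate(ranks)]
    core = T
    for m, u in enumerate(U):
        core = mode_mult(core, u.T, m)             # project onto factor bases
    return core, U

rng = np.random.default_rng(5)
T = rng.normal(size=(8, 9, 10))
core, U = hosvd(T, ranks=(3, 3, 3))
approx = core
for m, u in enumerate(U):
    approx = mode_mult(approx, u, m)               # reconstruct from the core
print(f"rank-(3,3,3) relative error: {np.linalg.norm(T - approx) / np.linalg.norm(T):.3f}")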
The plausibility of the ``parallel trends assumption'' in
Difference-in-Differences estimation is usually assessed by a test of the null
hypothesis that the difference between the average outcomes of both groups is
constant over time before the treatment. However, failure to reject the null
hypothesis does not imply the absence of differences in time trends between
both groups. We provide equivalence tests that allow researchers to find
evidence in favor of the parallel trends assumption and thus increase the
credibility of their treatment effect estimates. While we motivate our tests in
the standard two-way fixed effects model, we discuss simple extensions to
settings in which treatment adoption is staggered over time.
arXiv
Testing for equivalence of pre-trends in Difference-in-Differences estimation
By
01.02.2026 03:59
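A minimal sketch of the equivalence-testing logic: a generic TOST on one pre-trend coefficient, assuming an asymptotically normal estimate (the paper's tests are tailored to the two-way fixed effects setting). Equivalence within a margin delta is concluded only if both one-sided tests reject.

import numpy as np
from scipy.stats import norm

def tost(beta_hat, se, delta, alpha=0.05):
    """Two one-sided tests; both must reject to conclude |beta| < delta."""
    p_lower = 1 - norm.cdf((beta_hat + delta) / se)   # H0: beta <= -delta
    p_upper = norm.cdf((beta_hat - delta) / se)       # H0: beta >= +delta
    p = max(p_lower, p_upper)
    return p, p < alpha

beta_hat, se = 0.02, 0.03      # estimated pre-period trend difference and s.e.
p, equiv = tost(beta_hat, se, delta=0.10)
print(f"TOST p-value {p:.4f}; evidence of parallel pre-trends: {equiv}")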
Randomized experiments are the gold standard for estimating the average treatment effect (ATE). While covariate adjustment can reduce the asymptotic variances of the unbiased Horvitz-Thompson estimators for the ATE, it suffers from finite-sample biases due to data reuse in both prediction and estimation. Traditional sample-splitting and cross-fitting methods can address the problem of data reuse and obtain unbiased estimators. However, they require that the data are independently and identically distributed, which is usually violated under the design-based inference framework for randomized experiments. To address this challenge, we propose a novel conditional cross-fitting method, under the design-based inference framework, where potential outcomes and covariates are fixed and the randomization is the sole source of randomness. We propose sample-splitting algorithms for various randomized experiments, including Bernoulli randomized experiments, completely randomized experiments, and stratified randomized experiments. Based on the proposed algorithms, we construct unbiased covariate-adjusted ATE estimators and propose valid inference procedures. Our methods can accommodate flexible machine-learning-assisted covariate adjustments and allow for model misspecification.
arXiv
Conditional cross-fitting for unbiased machine-learning-assisted covariate adjustment in randomized experiments
By Lu, Shi, Liu et al
01.02.2026 01:36
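A simplified two-fold sketch of the cross-fitting idea for covariate adjustment (my illustration; the paper's conditional scheme instead splits within treatment arms so the guarantees hold under the design-based framework): fit the outcome predictor on one fold, adjust the difference in means on the other, then swap.

import numpy as np

rng = np.random.default_rng(10)
n = 400
x = rng.normal(size=n)
z = np.zeros(n, dtype=int)
z[rng.choice(n, n // 2, replace=False)] = 1            # completely randomized
y = 2.0 + 1.5 * x + 1.0 * z + rng.normal(0, 0.5, n)    # true ATE = 1.0

half = np.arange(n) < n // 2
estimates = []
for fit, ev in ((half, ~half), (~half, half)):
    coef = np.polyfit(x[fit], y[fit], 1)               # predictor fit on one fold
    resid = y[ev] - np.polyval(coef, x[ev])            # adjusted on the other
    estimates.append(resid[z[ev] == 1].mean() - resid[z[ev] == 0].mean())
print(f"cross-fitted adjusted ATE: {np.mean(estimates):.3f} (truth 1.0)")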
arXiv:2109.02726v2 Announce Type: replace-cross
Abstract: Screening traditionally refers to the problem of detecting active inputs in the computer model. In this paper, we develop methodology that applies to screening, but the main focus is on detecting active inputs not in the computer model itself but rather in the discrepancy function that is introduced to account for model inadequacy when linking the computer model with field observations. We contend this is an important problem, as it informs the modeler which inputs are potentially being mishandled in the model, and along which directions it may be less advisable to use the model for prediction. The methodology is Bayesian and is inspired by the continuous spike and slab prior popularized by the literature on Bayesian variable selection. In our approach, and in contrast with previous proposals, a single MCMC sample from the full model allows us to compute the posterior probabilities of all the competing models, resulting in a methodology that is computationally very fast. The approach hinges on the ability to obtain posterior inclusion probabilities of the inputs, which are very intuitive and easy to interpret quantities, as the basis for selecting active inputs. For that reason, we name the methodology PIPS -- posterior inclusion probability screening.
arXiv
Screening the Discrepancy Function of a Computer Model
By
31.01.2026 22:06
We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. We further extend our theory and formulate the requirements for correctly controlling a general loss function, such as the false negative proportion, with noisy labels. Our theory and experiments suggest that conformal prediction and risk-controlling techniques with noisy labels attain conservative risk over the clean ground truth labels whenever the noise is dispersive and increases variability. In other adversarial cases, we can also correct for noise of bounded size in the conformal prediction algorithm in order to ensure achieving the correct risk of the ground truth labels without score or data regularity.
arXiv
Label Noise Robustness of Conformal Prediction
By Einbinder, Feldman, Bates et al
31.01.2026 19:07
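For orientation, the base procedure whose noise-robustness is being studied is split conformal prediction. A minimal regression sketch, in which the calibration labels could in practice be noisy versions of the ground truth:

import numpy as np

rng = np.random.default_rng(6)

def predict(x):                       # stand-in for a fitted regression model
    return 2.0 * x

x_cal = rng.uniform(0, 1, 500)        # calibration set; labels may be noisy
y_cal = 2.0 * x_cal + rng.normal(0, 0.3, 500)

alpha = 0.1
scores = np.abs(y_cal - predict(x_cal))              # absolute-residual scores
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))    # finite-sample correction
qhat = np.sort(scores)[k - 1]

x_new = 0.5
lo, hi = predict(x_new) - qhat, predict(x_new) + qhat
print(f"90% conformal interval at x=0.5: [{lo:.3f}, {hi:.3f}]")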
The importance of Synthetic Data Generation (SDG) has increased significantly in domains where data quality is poor or access is limited due to privacy and regulatory constraints. One such domain is recruitment, where publicly available datasets are scarce due to the sensitive nature of information typically found in curricula vitae, such as gender, disability status, or age.
This lack of accessible, representative data presents a significant obstacle to the development of fair and transparent machine learning models, particularly ranking algorithms that require large volumes of data to effectively learn how to recommend candidates. In the absence of such data, these models are prone to poor generalisation and may fail to perform reliably in real-world scenarios.
Recent advances in Causal Generative Models (CGMs) offer a promising solution. CGMs enable the generation of synthetic datasets that preserve the underlying causal relationships within the data, providing greater control over fairness and interpretability in the data generation process.
In this study, we present a specialised SDG method involving two CGMs: one modelling job offers and the other modelling curricula. Each model is structured according to a causal graph informed by domain expertise. We use these models to generate synthetic datasets and evaluate the fairness of candidate rankings under controlled scenarios that introduce specific biases.
arXiv
Causal Synthetic Data Generation in Recruitment
By Iommi, Mastropietro, Guidotti et al
31.01.2026 16:07
Mediation analysis is a statistical approach that can provide insights regarding the intermediary processes by which an intervention or exposure affects a given outcome. Mediation analyses rose to prominence, particularly in social science research, with the publication of the seminal paper by Baron and Kenny, and are now commonly applied in many research disciplines, including health services research. Despite the growth in popularity, applied researchers may still encounter challenges in terms of conducting mediation analyses in practice. In this paper, we provide an overview of conceptual and methodological challenges that researchers face when conducting mediation analyses. Specifically, we discuss the following key challenges: (1) Conceptually differentiating mediators from other third variables, (2) Extending beyond the single mediator context, (3) Identifying appropriate datasets in which measurement and temporal ordering supports the hypothesized mediation model, (4) Selecting mediation effects that reflect the scientific question of interest, (5) Assessing the validity of underlying assumptions of no omitted confounders, (6) Addressing measurement error regarding the mediator, and (7) Clearly reporting results from mediation analyses. We discuss each challenge and highlight ways in which the applied researcher can approach these challenges.
arXiv
Practical challenges in mediation analysis: A guide for applied researchers
By
31.01.2026 01:36
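As a concrete anchor for the single-mediator context in challenge (2), the classical Baron-and-Kenny-style product-of-coefficients estimate comes from two linear regressions, M = a X + e1 and Y = b M + c' X + e2, with indirect effect a*b. The toy sketch below deliberately ignores the confounding and measurement-error caveats the paper discusses.

import numpy as np

rng = np.random.default_rng(7)
n = 2000
X = rng.normal(size=n)
M = 0.5 * X + rng.normal(size=n)                 # mediator model, true a = 0.5
Y = 0.7 * M + 0.2 * X + rng.normal(size=n)       # outcome model, b = 0.7, c' = 0.2

def ols(y, *cols):
    Z = np.column_stack(cols + (np.ones(len(y)),))
    return np.linalg.lstsq(Z, y, rcond=None)[0]

a = ols(M, X)[0]
b, c_prime = ols(Y, M, X)[:2]
print(f"indirect effect a*b = {a * b:.3f} (truth 0.35); direct c' = {c_prime:.3f}")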
Meta-analysis is the aggregation of data from multiple studies to find patterns across a broad range of studies relating to a particular subject. It is becoming increasingly useful to apply meta-analysis to summarize the studies being done across various fields. In meta-analysis, it is common to use the mean and standard deviation from each study for comparison and analysis. While many studies report the mean and standard deviation as their summary statistics, some report other values, including the minimum, maximum, median, and first and third quartiles. Often, the quartiles and median are reported when the data are skewed and do not follow a normal distribution. In order to correctly summarize the data and draw conclusions from multiple studies, it is necessary to estimate the mean and standard deviation from each study while accounting for variation and skewness within each study. In past literature, methods have been proposed to estimate the mean and standard deviation, but they do not consider negative values. Data that include negative values are common, and accommodating them would increase the accuracy and impact of the meta-analysis. We propose a method that implements a generalized Box-Cox transformation to estimate the mean and standard deviation accounting for such negative values while maintaining similar accuracy.
arXiv
Generalized Box-Cox method to estimate sample mean and standard deviation for Meta-analysis
By
30.01.2026 22:07
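A minimal sketch of the overall idea as I read it (my own construction, not the authors' estimator): shift the reported quantiles so they are positive, apply a Box-Cox transformation toward normality, use standard quantile-based formulas (here the three-quantile versions of Wan et al. 2014) on the transformed scale, and back-transform by simulation.

import numpy as np
from scipy.stats import norm

def boxcox(x, lam):
    return np.log(x) if lam == 0 else (x ** lam - 1) / lam

def inv_boxcox(y, lam):
    if lam == 0:
        return np.exp(y)
    return np.maximum(lam * y + 1, 1e-12) ** (1 / lam)   # clip to stay valid

def estimate_mean_sd(q1, med, q3, n, lam=0.5):
    shift = 1.0 - q1 if q1 <= 0 else 0.0                 # handle negative data
    tq1, tmed, tq3 = (boxcox(q + shift, lam) for q in (q1, med, q3))
    mu = (tq1 + tmed + tq3) / 3.0                        # Wan et al. (2014) mean
    sd = (tq3 - tq1) / (2 * norm.ppf((0.75 * n - 0.125) / (n + 0.25)))
    draws = np.random.default_rng(8).normal(mu, sd, 100_000)
    back = inv_boxcox(draws, lam) - shift                # back to original scale
    return back.mean(), back.std()

print("estimated (mean, sd) = (%.2f, %.2f)" % estimate_mean_sd(-1.2, 0.4, 2.1, n=50))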
We propose causal effect estimators based on empirical Fréchet means and operator-valued kernels, tailored to functional data spaces. These methods address the challenges of high-dimensionality, sequential ordering, and model complexity while preserving robustness to treatment misspecification. Using structural assumptions, we obtain compact representations of potential outcomes, enabling scalable estimation of causal effects over time and across covariates. We provide both theoretical results, regarding the consistency of functional causal effects, and an empirical comparison of a range of proposed causal effect estimators.
Applications to binary treatment settings with functional outcomes illustrate the framework's utility in biomedical monitoring, where outcomes exhibit complex temporal dynamics. Our estimators accommodate scenarios with registered covariates and outcomes, aligning them to the Fréchet means, as well as cases requiring higher-order representations to capture intricate covariate-outcome interactions. These advancements extend causal inference to dynamic and non-linear domains, offering new tools for understanding complex treatment effects in functional data settings.
arXiv
Kernel-based estimators for functional causal effects
By Raykov, Luo, Strait et al
30.01.2026 19:15
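For readers new to the terminology, the empirical Fréchet mean is the point minimising the sum of squared distances to the sample; with the L2 metric on discretised functions it coincides with the pointwise average. The toy check below verifies this by direct numerical minimisation (the paper works in richer metric spaces where no such closed form exists).

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)
grid = np.linspace(0, 1, 20)
curves = np.sin(2 * np.pi * grid) + rng.normal(0, 0.2, (30, 20))  # 30 noisy curves

def frechet_objective(m):
    return ((curves - m) ** 2).sum()      # sum of squared L2 distances

res = minimize(frechet_objective, x0=np.zeros(20), method="L-BFGS-B")
gap = np.abs(res.x - curves.mean(axis=0)).max()
print(f"max |Frechet mean - pointwise mean| = {gap:.2e}")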