
ArXiv Paperboy (Stat.ME+Econ.EM)

@paperposterbot.bsky.social

Posts updates from arXiv RSS feeds for methodology papers in Statistics and Econometrics. Also maintains an arXiv and posts random papers from it. Maintainer: @apoorvalal.com. Source code: https://github.com/apoorvalal/bsky_paperbot

2,149 Followers  |  2 Following  |  20,619 Posts  |  Joined: 30.09.2023

Posts by ArXiv Paperboy (Stat.ME+Econ.EM) (@paperposterbot.bsky.social)

We study the deployment performance of machine-learning-based enforcement systems used in cryptocurrency anti-money laundering (AML). Using forward-looking and rolling evaluations on Bitcoin transaction data, we show that strong static classification metrics substantially overstate real-world regulatory effectiveness. Temporal nonstationarity induces pronounced instability in cost-sensitive enforcement thresholds, generating large and persistent excess regulatory losses relative to dynamically optimal benchmarks. The core failure arises from miscalibration of decision rules rather than from declining predictive accuracy per se. These findings underscore the fragility of fixed AML enforcement policies in evolving digital asset markets and motivate loss-based evaluation frameworks for regulatory oversight.
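
A minimal illustration of the failure mode this abstract describes (not code from the paper): a cost-sensitive alert threshold is tuned once and then carried forward while the base rate of illicit activity drifts, so it keeps paying an excess cost relative to the per-period optimal threshold even though the score's discrimination never changes. Costs, base rates, and the score model are all made up.

```python
# Rolling evaluation of a fixed cost-sensitive threshold under drift: scores
# stay informative, but the base rate falls, so a threshold tuned once keeps
# incurring excess cost relative to the per-period optimal threshold.
import numpy as np

rng = np.random.default_rng(0)
c_fp, c_fn = 1.0, 20.0                       # illustrative costs of false positive / false negative

def expected_cost(thresh, scores, labels):
    flagged = scores >= thresh
    return c_fp * np.mean(flagged & (labels == 0)) + c_fn * np.mean(~flagged & (labels == 1))

def simulate_period(base_rate, n=20000):
    labels = rng.binomial(1, base_rate, n)
    scores = rng.normal(loc=1.5 * labels, scale=1.0)   # constant discrimination over time
    return scores, labels

grid = np.linspace(-2, 4, 121)

# tune the threshold once on an initial period, then roll forward as the base rate drifts
scores0, labels0 = simulate_period(0.05)
fixed = grid[int(np.argmin([expected_cost(t, scores0, labels0) for t in grid]))]

for base_rate in [0.05, 0.02, 0.01, 0.005]:
    s, l = simulate_period(base_rate)
    costs = [expected_cost(t, s, l) for t in grid]
    best = grid[int(np.argmin(costs))]
    excess = expected_cost(fixed, s, l) - min(costs)
    print(f"base rate {base_rate:.3f}: fixed thr {fixed:.2f}, optimal thr {best:.2f}, excess cost {excess:.4f}")
```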

arXiv📈🤖
Algorithmic Compliance and Regulatory Loss in Digital Assets
By Bhatt, Sharma

05.03.2026 17:37 — 👍 0    🔁 0    💬 0    📌 0
Large language models (LLMs) are trained on enormous amounts of data and encode knowledge in their parameters. We propose a pipeline to elicit causal relationships from LLMs. Specifically, (i) we sample many documents from LLMs on a given topic, (ii) we extract an event list from each document, (iii) we group events that appear across documents into canonical events, (iv) we construct a binary indicator vector for each document over canonical events, and (v) we estimate candidate causal graphs using causal discovery methods. Our approach does not guarantee real-world causality. Rather, it provides a framework for presenting the set of causal hypotheses that LLMs can plausibly assume, as an inspectable set of variables and candidate graphs.
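
A toy sketch of steps (iii)-(v) of this pipeline, with hand-written event lists standing in for LLM output and a simple correlation-thresholded graph standing in for a proper causal discovery algorithm (the abstract does not commit to a specific one); all event strings and the 0.3 threshold are invented.

```python
# Canonical events, per-document binary indicators, and a crude dependency
# graph as a stand-in for a causal-discovery step (e.g. PC or GES).
import numpy as np

docs_events = [
    ["rate hike", "currency appreciation", "export slowdown"],
    ["rate hike", "credit tightening", "export slowdown"],
    ["currency appreciation", "export slowdown"],
    ["credit tightening", "investment decline"],
]

# (iii) canonical events = unique event strings across documents
canonical = sorted({e for events in docs_events for e in events})
index = {e: j for j, e in enumerate(canonical)}

# (iv) binary indicator matrix: documents x canonical events
X = np.zeros((len(docs_events), len(canonical)), dtype=int)
for i, events in enumerate(docs_events):
    for e in events:
        X[i, index[e]] = 1

# (v) stand-in "discovery": threshold pairwise correlations to get a
# candidate undirected graph; a real pipeline would run PC, GES, etc.
C = np.corrcoef(X, rowvar=False)
edges = [(canonical[a], canonical[b])
         for a in range(len(canonical)) for b in range(a + 1, len(canonical))
         if abs(C[a, b]) > 0.3]
print(edges)
```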

arXiv📈🤖
Causality Elicitation from Large Language Models
By Kameyama, Kato, Hio et al

05.03.2026 17:36 — 👍 0    🔁 0    💬 0    📌 0
Experimentation is central to modern digital businesses, but many operational decisions cannot be randomized at the user level. In such cases, cluster-level experiments, where clusters are usually geographic, come to the rescue. However, such experiments often suffer from low power due to persistent cluster heterogeneity, strong seasonality, and autocorrelated outcome metrics, as well as common shocks that move many clusters simultaneously. Using the example of airline pricing, where policies are typically applied at the route level and thus the A/B test unit of analysis is a route, we study switchback designs to remedy these problems. In switchback designs, each cluster (route in our case) alternates between treatment and control on a fixed schedule, creating within-route contrasts that mitigate time-invariant heterogeneity and reduce sensitivity to low-frequency noise. We provide a unified Two-Way Fixed Effects interpretation of switchback experiments that makes the identifying variation explicit after partialling out route and time effects, clarifying how switching cadence interacts with temporal dependence to determine precision. Empirically, we evaluate weekly and daily switchback cadences using calibrated synthetic regimes and operational airline data from ancillary pricing. In our evaluations, switchbacks decrease standard errors by up to 67%, with daily switching yielding the largest gains over short horizons and weekly switching offering a strong and simpler-to-operationalize alternative.
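
A minimal simulation of the design described above: routes alternate treatment daily and the effect is recovered with a two-way fixed effects regression, clustering standard errors by route. The number of routes, the cadence, and the effect size are invented, not LATAM parameters.

```python
# Switchback toy example estimated with a route + day fixed effects regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_routes, n_days, tau = 30, 56, 0.5

rows = []
for r in range(n_routes):
    route_fe = rng.normal(0, 2)
    for d in range(n_days):
        day_fe = 1.5 * np.sin(2 * np.pi * d / 7)   # weekly seasonality common to all routes
        treat = (d + r) % 2                         # daily switchback with staggered starting arm
        y = route_fe + day_fe + tau * treat + rng.normal(0, 1)
        rows.append({"route": r, "day": d, "treat": treat, "y": y})
df = pd.DataFrame(rows)

# TWFE: partial out route and day effects, cluster standard errors by route
fit = smf.ols("y ~ treat + C(route) + C(day)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["route"]})
print(fit.params["treat"], fit.bse["treat"])
```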

arXiv📈🤖
Cluster-Level Experiments using Temporal Switchback Designs: Precision Gains in Pricing A/B Tests at LATAM Airlines
By Ferrari-Ortiz, Orellana-Montini, Abbiasov et al

05.03.2026 17:35 — 👍 0    🔁 0    💬 0    📌 0
Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation remains an open question with two commonly proposed approaches being linear pooling (probability averaging) and geometric pooling (logit averaging). In this work, we address this question by studying the normalized generalized mean of order $r \in \mathbb{R} \cup \{-\infty,+\infty\}$ through the lens of log-likelihood, the standard evaluation criterion in machine learning. This provides a unifying aggregation formalism and shows different optimal configurations for different situations. We show that the regime $r \in [0,1]$ is the only range ensuring systematic improvements relative to individual distributions, thereby providing a principled justification for the reliability and widespread practical use of linear ($r=1$) and geometric ($r=0$) pooling. In contrast, we show that aggregation rules with $r \notin [0,1]$ may fail to provide consistent gains with explicit counterexamples. Finally, we corroborate our theoretical findings with empirical evaluations using Deep Ensembles on image and text classification benchmarks.
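
A small sketch of the aggregation rule being studied: the normalized generalized mean of order r over ensemble members' class probabilities, with r=1 giving linear pooling and the r -> 0 limit giving geometric pooling. The ensemble probabilities below are made up.

```python
# Normalized generalized mean of order r for aggregating K predictive
# distributions over C classes; r=1 is linear pooling (probability average),
# r -> 0 is geometric pooling (log-probability average).
import numpy as np

def generalized_mean_pool(P, r):
    """P: (K, C) array of probability vectors. Returns the pooled (C,) vector."""
    P = np.asarray(P, dtype=float)
    if abs(r) < 1e-12:                       # geometric pooling as the r -> 0 limit
        agg = np.exp(np.mean(np.log(P + 1e-300), axis=0))
    else:
        agg = np.mean(P ** r, axis=0) ** (1.0 / r)
    return agg / agg.sum()                   # renormalize to a probability distribution

ensemble = np.array([[0.7, 0.2, 0.1],
                     [0.5, 0.3, 0.2],
                     [0.6, 0.1, 0.3]])
true_class = 0
for r in [0.0, 0.5, 1.0]:
    pooled = generalized_mean_pool(ensemble, r)
    print(r, pooled, np.log(pooled[true_class]))   # log-likelihood of the true class
```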

arXiv📈🤖
Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means
By Razafindralambo, Sun, Precioso et al

05.03.2026 17:30 — 👍 0    🔁 0    💬 0    📌 0
Theoretical and applied research into privacy encompasses an incredibly broad swathe of differing approaches, emphases, and aims. This work introduces a new quantitative notion of privacy that is both contextual and specific. We argue that it provides a more meaningful notion of privacy than the widely utilised framework of differential privacy and a more explicit and rigorous formulation than what is commonly used in statistical disclosure theory. Our definition relies on concepts inherent to standard Bayesian decision theory, while departing from it in several important respects. In particular, the party controlling the release of sensitive information should make disclosure decisions from the prior viewpoint, rather than conditional on the data, even when the data is itself observed. Illuminating toy examples and computational methods are discussed in detail in order to highlight the specificities of the method.

arXiv📈🤖
Bayesian Adversarial Privacy
By Bell, Johnston, Luciano et al

05.03.2026 17:28 — 👍 0    🔁 0    💬 0    📌 0
The Bayesian and Akaike information criteria aim at finding a good balance between under- and over-fitting. They are extensively used every day by practitioners. Yet we contend they suffer from at least two afflictions: their penalty parameters, $\lambda=\log n$ and $\lambda=2$ respectively, are too small, leading to many false discoveries, and their inherent (best subset) discrete optimization is infeasible in high dimension. We alleviate these issues with the pivotal information criterion: PIC is defined as a continuous optimization problem, and the PIC penalty parameter $\lambda$ is selected at the detection boundary (under pure noise). PIC's choice of $\lambda$ is the quantile of a statistic that we prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed. As a result, simulations show a phase transition in the probability of exact support recovery with PIC, a phenomenon studied with no noise in compressed sensing. Applied on real data, for similar predictive performances, PIC selects the least complex model among state-of-the-art learners.
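
The abstract does not spell out the pivotal statistic, so the sketch below only illustrates the general recipe it describes: simulate a scale-free statistic under pure noise (here the classic max |x_j'e| / ||e||, whose distribution does not depend on the noise scale for Gaussian errors) and set the penalty to a high quantile of that null distribution. The paper's actual statistic and transformation may differ.

```python
# Choosing a penalty at the "detection boundary": quantile of a pivotal
# statistic simulated under pure noise, for a fixed normalized design.
import numpy as np

rng = np.random.default_rng(0)
n, p, n_sim, alpha = 200, 50, 2000, 0.05

X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)              # unit-norm columns

stats = np.empty(n_sim)
for s in range(n_sim):
    e = rng.standard_normal(n)              # pure noise; any scale gives the same statistic
    stats[s] = np.max(np.abs(X.T @ e)) / np.linalg.norm(e)

lam = np.quantile(stats, 1 - alpha)
print("penalty selected at the detection boundary:", lam)
```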

arXiv📈🤖
The Pivotal Information Criterion
By Sardy, Cutsem, Geer

05.03.2026 17:27 — 👍 0    🔁 0    💬 0    📌 0
The growing use of unstructured text in business research makes topic modeling a central tool for constructing explanatory variables from reviews, social media, and open-ended survey responses, yet existing approaches function poorly as measurement instruments. Prior work shows that textual content predicts outcomes such as sales, satisfaction, and firm performance, but probabilistic models often generate conceptually diffuse topics, neural topic models are difficult to interpret in theory-driven settings, and large language model approaches lack standardization, stability, and alignment with document-level representations. We introduce LX Topic, a neural topic method that conceptualizes topics as latent linguistic constructs and produces calibrated document-level topic proportions for empirical analysis. LX Topic builds on FASTopic to ensure strong document representativeness and integrates large language model refinement at the topic-word level using alignment and confidence-weighting mechanisms that enhance semantic coherence without distorting document-topic distributions. Evaluations on large-scale Amazon and Yelp review datasets demonstrate that LX Topic achieves the highest overall topic quality relative to leading models while preserving clustering and classification performance. By unifying topic discovery, refinement, and standardized output in a web-based system, LX Topic establishes topic modeling as a reproducible, interpretable, and measurement-oriented instrument for marketing research and practice.

arXiv📈🤖
A Neural Topic Method Using a Large-Language-Model-in-the-Loop for Business Research
By Ludwig, Danaher, Yang

05.03.2026 17:23 — 👍 0    🔁 0    💬 0    📌 0
Western governments have adopted an assortment of counter-hybrid threat measures to defend against hostile actions below the conventional military threshold. The impact of these measures is unclear because of the ambiguity of hybrid threats, their cross-domain nature, and uncertainty about how countermeasures shape adversarial behavior. This paper offers a novel approach to clarifying this impact by unifying previously bifurcating hybrid threat modeling methods through a (multi-agent) influence diagram framework. The model balances the costs of countermeasures, their ability to dissuade the adversary from executing hybrid threats, and their potential to mitigate the impact of hybrid threats. We run 1000 semi-synthetic variants of a real-world-inspired scenario simulating the strategic interaction between attacking agent A and defending agent B over a cyber attack on critical infrastructure to explore the effectiveness of a set of five different counter-hybrid threat measures. Counter-hybrid measures range from strengthening resilience and denial of the adversary's ability to execute a hybrid threat to dissuasion through the threat of punishment. Our analysis primarily evaluates the overarching characteristics of counter-hybrid threat measures. This approach allows us to generalize the effectiveness of these measures and examine parameter impact sensitivity. In addition, we discuss policy relevance and outline future research avenues.

arXiv📈🤖
Multi-Agent Influence Diagrams to Hybrid Threat Modeling
By Vonk, Kononova, Bäck et al

05.03.2026 17:21 — 👍 1    🔁 0    💬 0    📌 0
This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify iteration increments in KGD, deriving an adaptive parameter selection strategy that is implementable. Theoretical verifications are provided within the framework of learning theory. Utilizing the recently developed integral operator approach, we rigorously demonstrate that KGD, equipped with the proposed adaptive parameter selection strategy, achieves the optimal generalization error bound and adapts effectively to different kernels, target functions, and error metrics. Consequently, this strategy showcases significant advantages over existing parameter selection methods for KGD.

arXiv📈🤖
Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents
By Liu, Lei, Chang et al

05.03.2026 17:18 — 👍 1    🔁 0    💬 0    📌 0
To characterize the community structure in network data, researchers have developed various block-type models, including the stochastic block model, the degree-corrected stochastic block model, the mixed membership block model, the degree-corrected mixed membership block model, and others. A critical step in applying these models effectively is determining the number of communities in the network. However, to the best of our knowledge, existing methods for estimating the number of network communities either rely on explicit model fitting or fail to simultaneously accommodate network sparsity and a diverging number of communities. In this paper, we propose a model-free spectral inference method based on eigengap ratios that addresses these challenges. The inference procedure is straightforward to compute, requires no parameter tuning, and can be applied to a wide range of block models without the need to estimate network distribution parameters. Furthermore, it is effective for both dense and sparse networks with a divergent number of communities. Technically, we show that the proposed spectral test statistic converges to a function of the type-I Tracy-Widom distribution via the Airy kernel under the null hypothesis, and that the test is asymptotically powerful under weak alternatives. Simulation studies on both dense and sparse networks demonstrate the efficacy of the proposed method. Three real-world examples are presented to illustrate the usefulness of the proposed test.
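
A rough illustration of the eigengap idea on a simulated stochastic block model; the paper's test statistic and its Tracy-Widom calibration are more refined than this argmax-of-ratios heuristic, and the block probabilities below are invented.

```python
# Eigengap-ratio heuristic: guess the number of communities from the largest
# ratio of consecutive adjacency eigenvalues (sorted by magnitude).
import numpy as np

rng = np.random.default_rng(1)
K_true, n_per, p_in, p_out = 3, 100, 0.30, 0.02
n = K_true * n_per
labels = np.repeat(np.arange(K_true), n_per)

P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T              # symmetric adjacency, no self-loops

eigvals = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
ratios = eigvals[:9] / eigvals[1:10]        # |lambda_k| / |lambda_{k+1}| for k = 1..9
K_hat = int(np.argmax(ratios)) + 1
print("estimated number of communities:", K_hat)   # typically recovers 3 here
```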

arXiv📈🤖
A spectral inference method for determining the number of communities in networks
By Wu, Ding, Zhang et al

05.03.2026 17:13 — 👍 1    🔁 0    💬 0    📌 0
The progression of chronic diseases often follows highly variable trajectories, and the underlying factors remain poorly understood. Standard mixed-effects models typically represent inter-patient differences as random deviations around a common reference, which may obscure meaningful subgroups. We propose a probabilistic mixture extension of a mixed effects model, the Disease Course Mapping model, to identify distinct disease progression subtypes within a population. The mixture structure is introduced at the latent individual parameters, enabling clustering based on both temporal and spatial variability in disease trajectories. We evaluated the model through simulation studies to assess classification performance and parameter recovery. Classification accuracy exceeded 90% in simpler scenarios and remained above 80% in the most complex case, with particularly high recall and precision for fast-progressing clusters. Compared to a post hoc classification approach, the proposed model yielded more accurate parameter estimates, smaller biases, lower root mean squared errors, and reduced uncertainty. It also correctly recovered the true three-cluster structure in 93% of the simulations. Finally, we applied the model to a longitudinal cohort of CADASIL patients, identifying two clinically meaningful clusters, differentiating patients with early versus late onset and fast versus slow progression, with clear spatial patterns across motor and memory scores. Overall, this probabilistic mixture framework offers a robust, interpretable approach for clustering patients based on spatiotemporal disease dynamics.

arXiv📈🤖
A mixture model for subtype identification in the context of disease progression modeling
By Kaisaridi, Ortholand, Tuna et al

05.03.2026 17:09 — 👍 1    🔁 0    💬 0    📌 0
This paper develops a comprehensive Markov-based framework for modelling reservoir behaviour and assessing key performance measures such as reliability and resilience. We first formulate a stochastic model for a finite-capacity dam, analysing its long-term storage dynamics under both independent and identically distributed inflows, following the Moran model, and correlated inflows represented by an ergodic Markov chain in the Lloyd formulation. For this finite case, we establish stationary water balance relations and derive asymptotic results, including a central limit theorem for storage levels. The analysis is then extended to an infinite-capacity reservoir, for which normal limit distributions and analogous long-term properties are obtained. A continuous-state formulation is also introduced to represent reservoirs with continuous inflow processes, generalizing the discrete-state framework. On this basis, we define and evaluate reliability and resilience metrics within the proposed Markovian context. The applicability of the methodology is demonstrated through a real-world case study of the Quiebrajano dam, illustrating how the developed models can support efficient and sustainable reservoir management under hydrological uncertainty.
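
A minimal finite-state Moran-type dam in the spirit of the first model described: i.i.d. discrete inflows, a constant release, a floor at zero, and a cap at capacity; the stationary distribution then gives simple reliability-style summaries. Capacity, inflow law, and release are illustrative values, not the Quiebrajano calibration.

```python
# Finite-capacity dam chain: Z_{t+1} = min(max(Z_t + inflow - release, 0), capacity).
import numpy as np

capacity, release = 10, 2
inflow_support = np.arange(0, 6)                       # possible inflows 0..5
inflow_probs = np.array([0.15, 0.25, 0.25, 0.2, 0.1, 0.05])

states = np.arange(capacity + 1)
P = np.zeros((capacity + 1, capacity + 1))
for z in states:
    for a, pa in zip(inflow_support, inflow_probs):
        z_next = min(max(z + a - release, 0), capacity)
        P[z, z_next] += pa

# stationary distribution: left eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

print("P(empty) =", pi[0], " P(at capacity) =", pi[capacity])
print("reliability (storage >= release):", pi[states >= release].sum())
```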

arXiv📈🤖
Markov-Based Modelling for Reservoir Management: Assessing Reliability and Resilience
By Gámiz, Limnios, Montoro-Cazorla et al

05.03.2026 17:08 — 👍 0    🔁 0    💬 0    📌 0
We introduce inference methods for score decompositions, which partition scoring functions for predictive assessment into three interpretable components: miscalibration, discrimination, and uncertainty. Our estimation and inference relies on a linear recalibration of the forecasts, which is applicable to general multi-step ahead point forecasts such as means and quantiles due to its validity for both smooth and non-smooth scoring functions. This approach ensures desirable finite-sample properties, enables asymptotic inference, and establishes a direct connection to the classical Mincer-Zarnowitz regression. The resulting inference framework facilitates tests for equal forecast calibration or discrimination, which yield three key advantages. They enhance the information content of predictive ability tests by decomposing scores, deliver higher statistical power in certain scenarios, and formally connect scoring-function-based evaluation to traditional calibration tests, such as financial backtests. Applications demonstrate the method's utility. We find that for survey inflation forecasts, discrimination abilities can differ significantly even when overall predictive ability does not. In an application to financial risk models, our tests provide deeper insights into the calibration and information content of volatility and Value-at-Risk forecasts. By disentangling forecast accuracy from backtest performance, the method exposes critical shortcomings in current banking regulation.
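
A small worked example of the decomposition for the squared-error score with a linear (Mincer-Zarnowitz) recalibration, which is the special case easiest to show in a few lines; the paper handles general scores, quantiles, and multi-step forecasts, plus the inference on top. Simulated mean forecasts only.

```python
# Squared-error score decomposition with linear recalibration:
#   mean score = miscalibration (MCB) - discrimination (DSC) + uncertainty (UNC).
import numpy as np

rng = np.random.default_rng(0)
n = 5000
signal = rng.normal(size=n)
y = signal + rng.normal(scale=1.0, size=n)           # outcome
x = 0.5 + 0.8 * signal                               # a biased but informative mean forecast

score = lambda f: np.mean((y - f) ** 2)

# linear recalibration: regress y on the forecast (Mincer-Zarnowitz)
b, a = np.polyfit(x, y, 1)                           # slope, intercept
x_recal = a + b * x

UNC = score(np.full(n, y.mean()))                    # score of the climatological forecast
MCB = score(x) - score(x_recal)                      # what recalibration removes
DSC = UNC - score(x_recal)                           # what the forecast adds over climatology

print("mean score:", score(x))
print("MCB - DSC + UNC:", MCB - DSC + UNC)           # the identity holds exactly
```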

arXiv📈🤖
Statistical Inference for Score Decompositions
By Dimitriadis, Puke

05.03.2026 17:05 — 👍 0    🔁 0    💬 0    📌 0
Producing reliable estimates of health and demographic indicators at fine areal scales is crucial for examining heterogeneity and supporting localized health policy. However, many surveys release outcomes only at coarser administrative levels, thereby limiting their relevance for decision-making. We propose a fully Bayesian, single-stage spatial modeling framework for area-level disaggregation that generates fine-scale estimates of indicators directly from coarsely aggregated survey data. By defining a latent spatial process at the target resolution and linking it to observed outcomes through an aggregation step, the framework adopts small-area estimation techniques while incorporating covariates and delivering coherent uncertainty quantification. The proposed methods are implemented with inlabru to achieve computational efficiency. We evaluate performance through a simulation study of general fertility rates in Kenya to demonstrate the models' ability to recover fine-scale variation across diverse data-generating scenarios. We further apply the framework to two national surveys to produce district-level fertility estimates from the 2022 Kenya Demographic and Health Survey and, more importantly, district-level indicators for unpaid care and domestic work and mass media usage from the 2021 Kenya Time Use Survey.

arXiv📈🤖
Areal Disaggregation: A Small Area Estimation Perspective
By Wu, Lindgren, Hanson

05.03.2026 17:00 — 👍 0    🔁 0    💬 0    📌 0
In causal analysis, understanding the causal mechanisms through which an intervention or treatment affects an outcome is often of central interest. We propose a test to evaluate (i) whether the causal effect of a treatment that is randomly assigned conditional on covariates is fully mediated by, or operates exclusively through, observed intermediate outcomes (referred to as mediators or surrogate outcomes), and (ii) whether the various causal mechanisms operating through different mediators are identifiable conditional on covariates. We demonstrate that if both full mediation and identification of causal mechanisms hold, then the conditionally random treatment is conditionally independent of the outcome given the mediators and covariates. Furthermore, we extend our framework to settings with non-randomly assigned treatments. We show that, in this case, full mediation remains testable, while identification of causal mechanisms is no longer guaranteed. We propose a double machine learning framework for implementing the test that can incorporate high-dimensional covariates and is root-n consistent and asymptotically normal under specific regularity conditions. We also present a simulation study demonstrating good finite-sample performance of our method, along with two empirical applications revisiting randomized experiments on maternal mental health and social norms.
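
A rough sketch of the testable implication: under full mediation, the treatment should carry no remaining association with the outcome once the mediators and covariates are flexibly adjusted for. Below this is checked with cross-fitted random-forest nuisances and a residual-on-residual coefficient; the paper's double machine learning test is more general than this partialling-out version, and the data-generating process is invented.

```python
# Partialling-out check of D _||_ Y | (M, X) in a simulated fully mediated design.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))            # treatment, random given X
M = 1.0 * D + X[:, 1] + rng.normal(size=n)                  # mediator
Y = 2.0 * M + X[:, 2] + rng.normal(size=n)                  # effect of D runs only through M

Z = np.column_stack([M, X])                                  # adjust for (M, X)
rf = lambda: RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=0)
D_res = D - cross_val_predict(rf(), Z, D, cv=5)              # cross-fitted nuisance predictions
Y_res = Y - cross_val_predict(rf(), Z, Y, cv=5)

theta = np.sum(D_res * Y_res) / np.sum(D_res ** 2)           # partialled-out coefficient on D
se = np.sqrt(np.mean((Y_res - theta * D_res) ** 2) / np.sum(D_res ** 2))
print(f"coefficient on D given (M, X): {theta:.3f} (se {se:.3f})")  # ~0 under full mediation
```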

arXiv📈🤖
Testing Full Mediation of Treatment Effects and the Identifiability of Causal Mechanisms
By Huber, Kloiber, Lafférs

05.03.2026 16:57 — 👍 0    🔁 0    💬 0    📌 0
The difference-in-differences (DiD) design is a quasi-experimental method for estimating treatment effects. In staggered DiD with multiple treatment groups and periods, estimation based on the two-way fixed effects model yields negative weights when averaging heterogeneous group-period treatment effects into an overall effect. To address this issue, we first define group-period average treatment effects on the treated (ATT), and then define groupwise, periodwise, dynamic, and overall ATTs nonparametrically, so that the estimands are model-free. We propose doubly robust estimators for these types of ATTs in the form of augmented inverse variance weighting (AIVW). The proposed framework allows time-varying covariates that partially explain the time trends in outcomes. Even if part of the working models is misspecified, the proposed estimators still consistently estimate the parameter of interest. The asymptotic variance can be explicitly computed from influence functions. Under a homoskedastic working model, the AIVW estimator is simplified to an augmented inverse probability weighting (AIPW) estimator. We demonstrate the desirable properties of the proposed estimators through simulation and an application that compares the effects of a parallel admission mechanism with immediate admission on the China National College Entrance Examination.
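
For orientation, a generic doubly robust (AIPW-type) ATT for a single two-period, two-group comparison with covariate-driven trends; the paper's augmented inverse variance weighting and its staggered-adoption aggregation differ, so treat this only as the familiar building block with made-up data.

```python
# Doubly robust ATT for a 2x2 DiD: outcome regression on the control change
# plus propensity-score reweighting of controls.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))     # treatment propensity
D = rng.binomial(1, p)
trend = 1.0 + 0.5 * X[:, 0]                                 # covariate-driven time trend
att_true = 2.0
dY = trend + att_true * D + rng.normal(size=n)              # Y_post - Y_pre

e_hat = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]
m0_hat = LinearRegression().fit(X[D == 0], dY[D == 0]).predict(X)

w_treat = D / D.mean()
odds = e_hat * (1 - D) / (1 - e_hat)
w_ctrl = odds / odds.mean()
att_dr = np.mean((w_treat - w_ctrl) * (dY - m0_hat))
print("doubly robust ATT:", att_dr)                         # close to 2.0
```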

arXiv📈🤖
Doubly Robust Estimation of Treatment Effects in Staggered Difference-in-Differences with Time-Varying Covariates
By Deng, Kang

05.03.2026 16:52 — 👍 0    🔁 0    💬 0    📌 0
Dynamic structural equation models (DSEMs) combine time-series modeling of within-person processes with hierarchical modeling of between-person differences and differences between timepoints, and have become very popular for the analysis of intensive longitudinal data in the social sciences. An important computational bottleneck has, however, still not been resolved: whenever the underlying process is assumed to be latent and measured by one or more indicators per timepoint, currently published algorithms rely on inefficient brute-force Markov chain Monte Carlo sampling which scales poorly as the number of timepoints and participants increases and results in highly correlated samples. The main result of this paper shows that the within-level part of any DSEM can be reformulated as a linear Gaussian state space model. Consequently, the latent states can be analytically marginalized using a Kalman filter, allowing for highly efficient estimation via Hamiltonian Monte Carlo. This makes estimation of DSEMs computationally tractable for much larger datasets -- both in terms of timepoints and participants -- than what has been previously possible. We demonstrate the proposed algorithm in several simulation experiments, showing it can be orders of magnitude more efficient than standard Metropolis-within-Gibbs approaches.
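
The marginalization idea in miniature: for a scalar latent AR(1) measured with noise (the smallest linear Gaussian state space model), the latent path integrates out analytically via a Kalman filter, so the likelihood of the parameters is available without sampling the states. Parameters are made up; the full DSEM and its HMC estimation are of course much richer.

```python
# Kalman-filter log-likelihood of a latent AR(1) observed with noise.
import numpy as np

rng = np.random.default_rng(0)
phi, q, r, T = 0.8, 1.0, 0.5, 300        # AR coefficient, state var, obs var, length

x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(q))
y = x + rng.normal(scale=np.sqrt(r), size=T)

def kalman_loglik(y, phi, q, r):
    m, P = 0.0, q / (1 - phi ** 2)        # stationary prior for the initial state
    ll = 0.0
    for obs in y:
        S = P + r                         # predictive variance of the observation
        ll += -0.5 * (np.log(2 * np.pi * S) + (obs - m) ** 2 / S)
        K = P / S                         # Kalman gain
        m, P = m + K * (obs - m), (1 - K) * P
        m, P = phi * m, phi ** 2 * P + q  # predict the next state
    return ll

print(kalman_loglik(y, 0.8, 1.0, 0.5), kalman_loglik(y, 0.2, 1.0, 0.5))
```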

arXiv📈🤖
Efficient Bayesian Estimation of Dynamic Structural Equation Models via State Space Marginalization
By Sørensen

05.03.2026 16:51 — 👍 0    🔁 0    💬 0    📌 0
Spatial autocorrelation in regression models can lead to downward biased standard errors and thus incorrect inference. The most common correction in applied economics is the spatial heteroskedasticity and autocorrelation consistent (HAC) standard error estimator introduced by Conley (1999). A critical input is the kernel bandwidth: the distance within which residuals are allowed to be correlated. However, this is still an unresolved problem and there is no formal guidance in the literature. In this paper, I first document that the relationship between the bandwidth and the magnitude of spatial HAC standard errors is inverse-U shaped. This implies that both too narrow and too wide bandwidths lead to underestimated standard errors, contradicting the conventional wisdom that wider bandwidths yield more conservative inference. I then propose a simple, non-parametric, data-driven bandwidth selector based on the empirical covariogram of regression residuals. In extensive Monte Carlo experiments calibrated to empirically relevant spatial correlation structures across the contiguous United States, I show that the proposed method controls the false positive rate at or near the nominal 5% level across a wide range of spatial correlation intensities and sample configurations. I compare six kernel functions and find that the Bartlett and Epanechnikov kernels deliver the best size control. An empirical application using U.S. county-level data illustrates the practical relevance of the method. The R package SpatialInference implements the proposed bandwidth selection method.
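
A self-contained sketch of a covariogram-based bandwidth choice followed by a Bartlett-kernel spatial HAC variance. The selection rule below (first distance bin with a non-positive residual covariogram) is one simple implementation of the idea in the abstract and may not match the paper's or the SpatialInference package's exact rule; coordinates, error range, and bins are invented.

```python
# Empirical residual covariogram -> bandwidth -> Conley-style Bartlett HAC SEs.
import numpy as np

rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

# spatially correlated errors with a range of ~1.5 (unknown to the researcher)
Sigma = np.exp(-(dist / 1.5) ** 2) + 1e-6 * np.eye(n)
u_true = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
x = rng.standard_normal(n)
y = 1.0 + 0.5 * x + u_true

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta                                     # OLS residuals

# empirical covariogram: average residual product within distance bins
edges = np.arange(0.0, 6.5, 0.5)
i, j = np.triu_indices(n, 1)
prod, d = u[i] * u[j], dist[i, j]
covariogram = np.array([prod[(d >= lo) & (d < hi)].mean()
                        for lo, hi in zip(edges[:-1], edges[1:])])

# bandwidth: upper edge of the first bin with a non-positive covariogram
nonpos = np.flatnonzero(covariogram <= 0)
bandwidth = edges[1:][nonpos[0]] if nonpos.size else edges[-1]
print("selected bandwidth:", bandwidth)

# spatial HAC variance with a Bartlett kernel at that bandwidth
K = np.clip(1.0 - dist / bandwidth, 0.0, None)
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (K * np.outer(u, u)) @ X
se = np.sqrt(np.diag(bread @ meat @ bread))
print("spatial HAC standard errors (const, slope):", se)
```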

arXiv📈🤖
Bandwidth Selection for Spatial HAC Standard Errors
By Lehner

05.03.2026 16:48 — 👍 0    🔁 0    💬 0    📌 0
Bounded continuous data on the unit interval frequently arise in applied fields and often exhibit a non-negligible proportion of observations at the boundaries. Inflated regression models address this feature by combining a continuous distribution on the unit interval with a discrete component to account for zero- and/or one-inflation. In this paper, we propose a class of Bayesian structured additive quantile regression models for inflated bounded continuous data that accommodates zero- and/or one-inflation. The proposed approach enables direct modeling of both the conditional quantiles of the continuous component and the probabilities of observing zeros and/or ones, with structured additive predictors incorporated in both parts, including nonlinear effects, spatial effects, random effects, and varying-coefficient terms. Posterior inference is carried out using Markov chain Monte Carlo algorithms implemented through the software Liesel, a probabilistic programming framework for semiparametric regression. The practical performance of the proposed models is illustrated through simulation studies and two real-data applications: one analyzing the proportion of traffic-related fatalities across Brazilian municipal districts, and another evaluating speech intelligibility in cochlear implant recipients under different experimental conditions.

arXiv📈🤖
Bayesian structured additive quantile regression for inflated bounded data
By Queiroz, Brachem, Wiemann et al

05.03.2026 16:46 — 👍 0    🔁 0    💬 0    📌 0
Multiple seasonalities have been widely studied in continuous time series using models such as TBATS, for instance in electricity demand forecasting. However, their treatment in categorical time series, such as air quality index (AQI) data, remains limited. Categorical AQI often exhibits distinct seasonal patterns at multiple frequencies, which are not captured by standard models. In this paper, we propose a framework that models multiple seasonalities using Fourier series and indicator functions, inspired by the TBATS methodology. The approach accommodates the ordinal nature of AQI categories while explicitly capturing daily, weekly and yearly seasonal cycles. Simulation studies demonstrate the empirical consistency of parameter estimates under the proposed model. We further illustrate its applicability using real categorical AQI data from Kolkata and compare forecasting performance with Markov models and machine learning methods. Results indicate that our approach effectively captures complex seasonal dynamics and provides improved predictive accuracy. The proposed methodology offers a flexible and interpretable framework for analyzing categorical time series exhibiting multiple seasonal patterns, with potential applications in air quality monitoring, energy consumption and other environmental domains.
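
A small sketch of the feature construction: Fourier sine and cosine terms for daily and weekly cycles feeding a categorical response model. The plain multinomial logit below is a stand-in that ignores the ordinal structure the paper exploits; all periods, thresholds, and parameters are illustrative.

```python
# Fourier seasonal features for two cycles, fed to a multinomial logit stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fourier_terms(t, period, K):
    """sin/cos pairs for harmonics 1..K of a seasonal cycle with the given period."""
    cols = []
    for k in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = np.arange(24 * 7 * 20)                               # hourly index, 20 weeks
X = np.hstack([fourier_terms(t, 24, 2),                  # daily cycle
               fourier_terms(t, 24 * 7, 2)])             # weekly cycle

# latent seasonal signal mapped to 3 ordered categories via fixed thresholds
latent = (1.2 * np.sin(2 * np.pi * t / 24)
          + 0.6 * np.sin(2 * np.pi * t / (24 * 7))
          + rng.normal(scale=0.5, size=t.size))
y = np.digitize(latent, [-0.5, 0.8])                     # categories 0, 1, 2

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("in-sample accuracy:", clf.score(X, y))
```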

arXiv📈🤖
Forecasting of Multiple Seasonal Categorical Time Series Using Fourier Series with Application to AQI Data of Kolkata
By Ghosh, Maiti

05.03.2026 16:45 — 👍 0    🔁 0    💬 0    📌 0
Regression discontinuity designs (RDD) are widely used for causal inference. In many empirical applications, treatment effects vary substantially with covariates, and ignoring such heterogeneity can lead to misleading conclusions, which motivates flexible modeling of heterogeneous treatment effects in RDD. To this end, we propose a Bayesian nonparametric approach to estimating heterogeneous treatment effects based on Bayesian Additive Regression Trees (BART). The key feature of our method lies in adopting a general Bayesian framework using a pseudo-model defined through a loss function for fitting local linear models around the cutoff, which gives direct modeling of heterogeneous treatment effects by BART. Optimal selection of the bandwidth parameter for the local model is implemented using the Hyvärinen score. Through numerical experiments, we demonstrate that the proposed approach flexibly captures complicated structures of heterogeneous treatment effects as a function of covariates.
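
For reference, the local linear building block around the cutoff that such approaches start from: a triangular-kernel local linear fit on each side of the cutoff, differenced at the cutoff. The BART-based heterogeneity modeling and the Hyvärinen-score bandwidth selection are not sketched; bandwidth and data below are invented.

```python
# Plain local linear RDD estimate at the cutoff with a triangular kernel.
import numpy as np

rng = np.random.default_rng(0)
n, cutoff, h = 2000, 0.0, 0.3
z = rng.uniform(-1, 1, n)                        # running variable
tau = 1.0
y = 0.5 * z + tau * (z >= cutoff) + rng.normal(scale=0.3, size=n)

def local_linear_at_cutoff(z_side, y_side):
    w = np.clip(1 - np.abs(z_side - cutoff) / h, 0, None)   # triangular kernel weights
    X = np.column_stack([np.ones_like(z_side), z_side - cutoff])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y_side))
    return beta[0]                                # intercept = fitted value at the cutoff

right = local_linear_at_cutoff(z[z >= cutoff], y[z >= cutoff])
left = local_linear_at_cutoff(z[z < cutoff], y[z < cutoff])
print("RD effect at cutoff:", right - left)       # close to 1.0
```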

arXivπŸ“ˆπŸ€–
Direct Bayesian Additive Regression Trees for Conditional Average Treatment Effects in Regression Discontinuity Designs
By Kondo, Sugasawa

05.03.2026 16:40 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
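To fix ideas from the Kondo & Sugasawa abstract above, the sketch below shows the local-linear pseudo-model around the cutoff with a triangular kernel and a covariate-interacted treatment effect, fitted by weighted least squares. This is a simplified stand-in for the paper's BART-based approach: the bandwidth, data-generating process and interaction structure are assumptions, and the Hyvärinen-score bandwidth selection is not implemented.

```python
# Hedged sketch: weighted local-linear RDD with a covariate-interacted effect,
# a simplified stand-in for the paper's BART pseudo-model.
import numpy as np

rng = np.random.default_rng(0)
n, cutoff, h = 2000, 0.0, 0.5            # bandwidth h is an assumption, not the paper's choice
x = rng.uniform(-1, 1, n)                 # running variable
z = rng.normal(size=n)                    # covariate driving effect heterogeneity
d = (x >= cutoff).astype(float)           # treatment indicator
y = 1.0 + 0.8 * x + (0.5 + 0.7 * z) * d + rng.normal(0, 0.3, n)

w = np.clip(1 - np.abs(x - cutoff) / h, 0, None)    # triangular kernel weights
keep = w > 0
X = np.column_stack([np.ones(n), x - cutoff, d, d * (x - cutoff), d * z, z])[keep]
W = np.diag(w[keep])
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[keep])

# conditional effect at the cutoff: tau(z) = beta_d + beta_{d*z} * z
print("tau(z=0) ~", beta[2], "  d tau / d z ~", beta[4])
```
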
This paper is motivated by a cutting-edge application in neuroscience: the analysis of electroencephalogram (EEG) signals recorded under flash stimulation. Under commonly used signal-processing assumptions, only the phase angle of the EEG is required for the analysis of such applications. We demonstrate that these assumptions imply that the phase has a projected isotropic normal distribution. We revisit this distribution and derive several new properties, including closed-form expressions for its trigonometric moments. We then examine the distribution of the mean resultant and its square -- statistics of central importance in phase-based EEG studies. The distribution of the resultant is analytically intricate; to make it practically useful, we develop two approximations based on the well-known resultant distribution for the von Mises distribution. We then study inference problems for this projected isotropic normal distribution. The method is illustrated with an application to EEG data from flash-stimulation experiments.

arXivπŸ“ˆπŸ€–
The projected isotropic normal distribution with applications in neuroscience
By Mardia, Sa'

05.03.2026 16:37 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
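A small sketch connected to the Mardia et al. abstract above: extract instantaneous phase from toy EEG-like trials and compute the mean resultant and its square, the statistics the paper studies. The sampling rate, stimulation frequency and trial structure are illustrative assumptions; the projected isotropic normal machinery itself is not reproduced.

```python
# Hedged sketch: phase angles via the analytic signal, then the mean resultant R and R^2.
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(1)
fs, f0, n_trials, n_samp = 250.0, 10.0, 50, 500     # assumed sampling / flash rates
t = np.arange(n_samp) / fs

# toy trials: a 10 Hz response with a common phase plus noise
signals = np.cos(2 * np.pi * f0 * t + 0.8) + rng.normal(0, 1.0, (n_trials, n_samp))

phase = np.angle(hilbert(signals, axis=1))           # instantaneous phase per trial
theta = phase[:, n_samp // 2]                        # phase at one time point across trials

resultant = np.abs(np.mean(np.exp(1j * theta)))      # mean resultant length R
print("R =", resultant, "  R^2 =", resultant ** 2)
```
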
Many learning tasks represent responses as multivariate probability measures, requiring repeated computation of weighted barycenters in Wasserstein space. In multivariate settings, transport barycenters are often computationally demanding and, more importantly, are generally not well posed under the affine weight schemes inherent to global and local Fréchet regression, where weights sum to one but may be negative. We propose HiMAP, a Hilbert mass-aligned parameterization that endows multivariate measures with a distribution-invariant notion of quantile level. The construction recursively refines the domain through equiprobable conditional-median splits and follows a Hilbert curve ordering, so that a single scalar index consistently tracks cumulative probability mass across distributions. This yields an embedding into a Hilbert function space and induces a tractable discrepancy for distribution comparison and averaging. Crucially, the representation is closed under affine averaging, leading to a closed-form, well-posed barycenter and an explicit distribution-valued Fréchet regression estimator obtained by averaging HiMAP quantile maps. We establish consistency and a dimension-dependent polynomial convergence rate for HiMAP estimators under mild conditions, matching the classical rates for empirical convergence in multivariate Wasserstein geometry. Numerical experiments and a multivariate climate-indicator study demonstrate that HiMAP delivers barycenters and regression fits comparable to standard optimal-transport surrogates while achieving substantial speedups in schemes dominated by repeated barycenter evaluations.

arXivπŸ“ˆπŸ€–
HiMAP: Hilbert Mass-Aligned Parameterization for Multivariate Barycenters and Fréchet Regression
By Wang, Huang, Zhu et al

05.03.2026 16:35 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
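The key closure property in the Wang et al. abstract above -- averaging quantile maps gives a closed-form barycenter under affine weights -- is easy to see in one dimension. The sketch below shows only that univariate special case; HiMAP's Hilbert-curve mass alignment, which extends the idea to multivariate measures, is not reproduced, and the toy measures and weights are assumptions.

```python
# Hedged sketch: in 1D, the Wasserstein barycenter under affine weights is the
# weighted average of quantile functions -- the property HiMAP generalizes.
import numpy as np

rng = np.random.default_rng(2)
samples = [rng.normal(mu, 1.0, 500) for mu in (-2.0, 0.0, 3.0)]   # three toy measures
weights = np.array([0.2, 0.3, 0.5])                                # affine weights summing to one

levels = np.linspace(0.005, 0.995, 200)                            # common quantile grid
quantile_maps = np.stack([np.quantile(s, levels) for s in samples])

barycenter_quantiles = weights @ quantile_maps                      # closed-form barycenter
print(barycenter_quantiles[:5])
```
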
Method validation and study design in causal inference rely on synthetic data with known counterfactuals. Existing simulators trade off distributional realism (the ability to capture mixed-type and multimodal tabular data) against causal controllability (explicit control over overlap, unmeasured confounding, and treatment effect heterogeneity). We introduce CausalMix, a variational generative framework that closes this gap by coupling a mixture of Gaussian latent priors with data-type-specific decoders for continuous, binary, and categorical variables. The model incorporates explicit causal controls: an overlap regularizer shaping propensity-score distributions, alongside direct parameterizations of confounding strength and effect heterogeneity. This unified objective preserves fidelity to the observed data while enabling factorial manipulation of causal mechanisms, allowing overlap, confounding strength, and treatment effect heterogeneity to be varied independently at design time. Across benchmarks, CausalMix achieves state-of-the-art distributional metrics on mixed-type tables while providing stable, fine-grained causal control. We demonstrate practical utility in a comparative safety study of metastatic castration-resistant prostate cancer treatments, using CausalMix to compare estimators under calibrated data-generating processes, tune hyperparameters, and conduct simulation-based power analyses under targeted treatment effect heterogeneity scenarios.

arXivπŸ“ˆπŸ€–
Controllable Generative Sandbox for Causal Inference
By Zhang, Parikh, Naimi et al

05.03.2026 16:30 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
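To illustrate the kind of design-time controls the Zhang et al. abstract above describes, the sketch below is a toy data-generating process with explicit knobs for overlap, confounding strength, and effect heterogeneity. It does not implement CausalMix's variational model or its overlap regularizer; every parameter name and functional form here is an assumption.

```python
# Hedged sketch: a factorial toy DGP where overlap, confounding and heterogeneity
# are explicit knobs (illustrative only; not the CausalMix generator).
import numpy as np

def simulate(n=5000, separation=1.0, confounding=1.0, heterogeneity=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    u = rng.normal(size=n)                                  # unmeasured confounder
    score = x[:, 0] + confounding * u
    p = 1.0 / (1.0 + np.exp(-separation * score))           # larger separation -> poorer overlap
    d = rng.binomial(1, p)
    tau = 1.0 + heterogeneity * x[:, 1]                     # heterogeneous treatment effect
    y = x @ np.array([0.5, -0.3]) + confounding * u + tau * d + rng.normal(0, 1, n)
    return x, d, y, tau

x, d, y, tau = simulate(separation=2.0, confounding=0.5, heterogeneity=1.5)
print("share treated:", d.mean(), "  oracle ATE:", tau.mean())
```
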
Fine stratification surveys are useful in many applications because the point estimator is unbiased, but the variance estimator under the design cannot be easily obtained, particularly when the sample size per stratum is as small as one unit. One common practice to overcome this difficulty is to collapse strata in pairs to create pseudo-strata and then estimate the variance. The variance estimator obtained this way is not design-unbiased, and its positive bias increases as the population means of the paired pseudo-strata become more disparate. The resulting confidence intervals can be unnecessarily large. In this paper, we propose a new Bayesian estimator for variance which does not rely on collapsing strata, unlike the previous methods given in the literature. We employ the penalized spline method for smoothing the mean and variance together in a nonparametric way. Furthermore, we make comparisons with the earlier work of Breidt et al. (2016). Through multiple simulation studies and an illustration using data from the National Survey of Family Growth (NSFG), we demonstrate the favorable performance of our methodology.

arXivπŸ“ˆπŸ€–
Bayesian Estimation of Variance under Fine Stratification via Mean-Variance Smoothing
By Mosaferi, Sugasawa

05.03.2026 16:26 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
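For context on the Mosaferi & Sugasawa abstract above, the sketch below implements the classical collapsed-strata variance estimator for a one-unit-per-stratum design -- the baseline whose upward bias the paper's Bayesian mean-variance smoothing is meant to avoid. The data and stratum structure are toy assumptions, and the paper's own estimator is not reproduced.

```python
# Hedged sketch: collapsed-strata variance estimator, sum of squared within-pair
# differences of estimated stratum totals (upwardly biased when pair means differ).
import numpy as np

rng = np.random.default_rng(3)
H = 100                                       # number of strata, one sampled unit each
N_h = rng.integers(50, 200, H)                # stratum sizes
mu_h = np.linspace(0, 5, H)                   # slowly varying stratum means
y_h = rng.normal(mu_h, 1.0)                   # the single observation per stratum

Y_hat_h = N_h * y_h                           # estimated stratum totals
Y_hat = Y_hat_h.sum()                         # unbiased estimator of the population total

# collapse neighbouring strata into pairs; variance estimate is the sum of squared
# within-pair differences of the estimated totals
pairs = Y_hat_h.reshape(-1, 2)
v_collapsed = np.sum((pairs[:, 0] - pairs[:, 1]) ** 2)
print("total:", Y_hat, "  collapsed-strata variance:", v_collapsed)
```
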
Explanations of the replication crisis often emphasize misconduct, questionable research practices, or incentive misalignment, implying that behavioral reform is sufficient. This paper argues that a substantial component is architectural: within binary significance-based publication systems, even perfectly diligent researchers face structural limits on the reliability they can deliver. The posterior log-odds of a finding equal prior log-odds plus log(Lambda), where Lambda = (1-beta)/alpha is the experimental leverage. Interpreted architecturally, this implies a hard constraint: once evidence is coarsened to a binary significance decision, the decision rule contributes exactly log(Lambda) to posterior log-odds. A target reliability tau is feasible iff pi >= pi_crit, and under fixed alpha this generally cannot be rescued by sample size alone. Two mechanisms can drive effective leverage to 1 without bad faith: persistent unmeasured confounding in observational studies and unbounded specification search under publication pressure. These results concern binary significance-based decision architectures and do not bound inference based on full likelihoods or richer continuous evidence summaries. Two collapse results formalize these mechanisms, while the Replication Pipeline Theorem and Minimum Pipeline Depth Corollary identify a quantitative evidentiary standard for escape. Using independently documented parameters for pre-reform psychology (pi about 0.10, power about 0.35), the framework implies a replication rate of 36%, consistent with the Open Science Collaboration. The framework also provides quantitative bridges to Popper, Kuhn, and Lakatos. In low-prior settings below the single-study feasibility threshold, the natural unit of evidence is the replication pipeline rather than the individual experiment.

arXivπŸ“ˆπŸ€–
The Certainty Bound: Structural Limits on Scientific Reliability
By Pollanen

05.03.2026 16:21 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
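The central identity in the Pollanen abstract above is easy to check numerically: posterior log-odds equal prior log-odds plus log(Lambda), with Lambda = (1 - beta)/alpha. The snippet below evaluates it at the pre-reform psychology parameters the abstract cites, assuming alpha = 0.05 (the abstract does not state alpha); the paper's full replication-rate calculation is not reproduced here.

```python
# Hedged sketch: posterior log-odds = prior log-odds + log(Lambda), Lambda = (1-beta)/alpha.
import math

alpha = 0.05          # assumed significance level (not stated in the abstract)
beta = 1 - 0.35       # 1 - power, with power ~ 0.35 as cited
pi = 0.10             # prior probability that the hypothesis is true

Lambda = (1 - beta) / alpha                        # experimental leverage
prior_log_odds = math.log(pi / (1 - pi))
posterior_log_odds = prior_log_odds + math.log(Lambda)
ppv = 1 / (1 + math.exp(-posterior_log_odds))      # P(true | significant)
print(f"Lambda = {Lambda:.1f}, posterior P(true | significant) = {ppv:.3f}")
```
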
The Galleri® (GRAIL) multi-cancer early detection test measures circulating tumour DNA (ctDNA) to predict the presence of more than 50 different cancers from a blood test. If the sensitivity of the test to detect early-stage cancers is high, using it as part of a screening programme may lead to better cancer outcomes, but available evidence indicates there is heterogeneity in sensitivity between cancer types and stages. We describe a framework for sharing evidence on test sensitivity between cancer types and/or stages, examining whether models with different sharing assumptions are supported by the evidence and considering how further data could be used to strengthen inference. Bayesian hierarchical models were fitted, and the impact of information sharing in increasing the precision of the estimates of test sensitivity for different cancer types and stages was examined. Assumptions on sharing were informed by evidence from a review of the literature on the determinants of ctDNA shedding and its detection in a blood test. Support was strongest for the assumption that sensitivity can be shared only across stage 4 for all cancer types. There was also support for the assumption that sensitivities can be shared across cancer types for each stage, provided cancer types expected to have low sensitivity are excluded; this increased the precision of early-stage cancer sensitivity estimates and was considered the most appropriate model. High heterogeneity limited improvements in precision. For future research, elicitation of expert opinion could inform more realistic sharing assumptions.

arXivπŸ“ˆπŸ€–
A Bayesian approach to sharing information on sensitivity of a Multi-Cancer Early Detection test across and within tumour types and stages
By Dias, Liu, Palmer et al

05.03.2026 03:49 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
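As a rough illustration of the information-sharing idea in the Dias et al. abstract above, the sketch below fits a generic partial-pooling binomial model for sensitivity across cancer types within one stage, written in PyMC. The priors, variable names and toy counts are all assumptions; the paper's specific sharing structures across stages and types are not reproduced.

```python
# Hedged sketch: partial pooling of test sensitivity across cancer types (one stage),
# a generic hierarchical stand-in for the paper's sharing models.
import numpy as np
import pymc as pm

# toy data: detected cases / total cases per cancer type
detected = np.array([12, 30, 5, 22, 9])
total    = np.array([40, 60, 25, 50, 30])

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.5)                  # shared logit-sensitivity mean
    sigma = pm.HalfNormal("sigma", 1.0)             # between-type heterogeneity
    logit_sens = pm.Normal("logit_sens", mu, sigma, shape=len(total))
    sens = pm.Deterministic("sens", pm.math.sigmoid(logit_sens))
    pm.Binomial("obs", n=total, p=sens, observed=detected)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(idata.posterior["sens"].mean(dim=("chain", "draw")).values)
```
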
The Ethereum blockchain network enables transaction processing and smart-contract execution through levies of transaction fees, commonly known as gas fees. This framework mediates economic participation via a market-based mechanism for gas fees, permitting users to offer higher gas fees to expedite processing. Historically, the ensuing gas fee volatility led to critical disequilibria between supply and demand for block space, presenting stakeholder challenges. This study examines the dynamic causal interplay between transaction fees and economic subsystems leveraging the network. By utilizing data on unique active wallets and transaction volume of each subsystem and applying time-varying Granger causality analysis, we reveal temporal heterogeneity in causal relationships between economic activity and transaction fees across all subsystems. This includes (a) a bidirectional causal feedback loop between cross-blockchain bridge user activity and transaction fees, which diminishes over time, potentially signaling user migration; (b) a bidirectional relationship between centralized cryptocurrency exchange deposit and withdrawal transaction volume and fees, indicative of increased competition for block space; (c) decentralized exchange volumes causally influence fees, while fees causally influence user activity, although this relationship is weakening, potentially due to the diminished significance of decentralized finance; (d) intermittent causal relationships with maximal extractable value bots; (e) fees causally influence non-fungible token transaction volumes; and (f) a highly significant and growing causal influence of transaction fees on stablecoin activity and transaction volumes, highlighting their prominence.

arXivπŸ“ˆπŸ€–
Time-Varying Bidirectional Causal Relationships Between Transaction Fees and Economic Activity of Subsystems Utilizing the Ethereum Blockchain Network
By Ante, Saggu

05.03.2026 01:37 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
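A minimal sketch of the rolling-window Granger-causality pattern behind the Ante & Saggu abstract above, using statsmodels on two toy series (fees and activity). The data-generating process, window length and lag order are illustrative assumptions and not the authors' procedure or data.

```python
# Hedged sketch: rolling-window Granger causality (activity -> fees) on toy series.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(4)
n = 600
activity = rng.normal(size=n).cumsum()
fees = np.zeros(n)
for t in range(2, n):
    lead = 0.6 if t < n // 2 else 0.0        # activity leads fees only in the first half
    fees[t] = 0.5 * fees[t - 1] + lead * (activity[t - 1] - activity[t - 2]) + rng.normal()

# difference both series, then test whether activity Granger-causes fees in each window
df = pd.DataFrame({"fees": np.diff(fees), "activity": np.diff(activity)})

window, lag = 150, 2
pvals = []
for start in range(0, len(df) - window, 25):
    chunk = df.iloc[start:start + window]
    res = grangercausalitytests(chunk, maxlag=lag, verbose=False)
    pvals.append(res[lag][0]["ssr_ftest"][1])   # p-value of the SSR F-test at the chosen lag
print(np.round(pvals, 3))
```
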
Publication bias (PB) is one of the serious issues in meta-analysis. Many existing methods dealing with PB are based on the normal-normal (NN) random-effects model, which assumes normal models at both the within-study and the between-study levels. For rare-event meta-analysis, where the data contain rare occurrences of events, the standard NN random-effects model may perform poorly. Instead, the generalized linear mixed effects model (GLMM) using the exact within-study model is recommended. However, no method has been proposed for dealing with PB in rare-event meta-analysis using the GLMM. In this paper, we propose sensitivity analysis methods for evaluating the impact of PB on the GLMM based on the well-known Copas-Heckman-type selection model. The proposed methods can be easily implemented with standard software for fitting nonlinear mixed-effects models. We use a real-world example to show the usefulness of the proposed methods in evaluating the potential impact of PB in meta-analysis of the log-transformed odds ratio based on the GLMM using the non-central hypergeometric or binomial distribution as the within-study model. An extension of the proposed method is also introduced for evaluating PB in meta-analysis of proportions based on the GLMM with the binomial within-study model.

arXivπŸ“ˆπŸ€–
Copas-Heckman-type sensitivity analysis for publication bias in rare-event meta-analysis under the framework of the generalized linear mixed model
By

04.03.2026 22:10 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
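As a small companion to the abstract above, the sketch below evaluates a Copas-type selection function, P(select | s_i) = Phi(gamma0 + gamma1 / s_i), over a grid of sensitivity parameters, which is how the severity of publication bias is typically indexed in such analyses. The within-study standard errors and the grid values are assumptions; the paper's GLMM-based likelihood correction is not reproduced here.

```python
# Hedged sketch: Copas-type selection probabilities over a sensitivity grid.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
se = rng.uniform(0.1, 0.8, 30)                      # toy within-study standard errors

for g0 in (-2.0, -1.0, 0.0):
    for g1 in (0.0, 0.1, 0.3):
        p_select = norm.cdf(g0 + g1 / se)           # selection probability per study
        print(f"gamma0={g0:+.1f} gamma1={g1:.1f}  mean P(select)={p_select.mean():.2f}")
```
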
In this article, we present a novel inference framework for estimating the parameters of Continuous-State Branching Processes (CSBPs). We do so by leveraging their subordinator representation. Our method reformulates the estimation problem by shifting the stochastic dynamics to the associated subordinator, enabling a parametric estimation procedure without requiring additional assumptions. This reformulation allows for efficient numerical recovery of the likelihood function via Laplace transform inversion, even in models where closed-form transition densities are unavailable. In addition to offering a flexible approach to parameter estimation, we propose a dynamic simulation framework that generates discrete-time trajectories of CSBPs using the same subordinator-based structure.

arXivπŸ“ˆπŸ€–
Parameter Estimation for Partially Observed Stable Continuous-State Branching Processes
By Gutiérrez-Peña, Pérez-Mendoza, Palacio et al

04.03.2026 19:18 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
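The abstract above relies on recovering a likelihood by numerically inverting a Laplace transform. The sketch below shows that pattern on a toy subordinator with Gamma increments, whose density is known in closed form so the inversion can be checked; the CSBP-specific subordinator representation and estimation procedure from the paper are not reproduced, and all parameter values are assumptions.

```python
# Hedged sketch: likelihood evaluation via numerical Laplace inversion (mpmath),
# checked against the closed-form Gamma density of a toy subordinator increment.
import numpy as np
import mpmath as mp
from scipy.stats import gamma

a, theta, dt = 3.0, 1.5, 0.5            # toy subordinator parameters and observation step
shape = a * dt                           # Gamma shape of one increment

def increment_lt(s):
    # Laplace transform of a Gamma(shape, rate=theta) increment
    return (theta / (theta + s)) ** shape

def increment_density(y):
    # recover the density at y by numerical Laplace inversion
    return float(mp.invertlaplace(increment_lt, float(y), method="talbot"))

increments = gamma.rvs(shape, scale=1 / theta, size=20, random_state=0)
loglik = sum(np.log(increment_density(y)) for y in increments)
exact = gamma.logpdf(increments, shape, scale=1 / theta).sum()
print("inverted log-likelihood:", loglik, "  closed form:", exact)
```
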