Statistical Methodology
Absolute anonymization, conceived as an irreversible transformation that prevents re-identification and sensitive value disclosure, has proven to be a broken promise. Consequently, modern data protection must shift toward a privacy-utility…
This paper investigates variance change-points in panel data models with $\alpha$-mixing time series. Based on the cumulative sum (CUSUM) method and the individual differences, we construct a CUSUM test for panel data models to detect…
We investigate asymptotic inference in a linear regression model where both response and regressors are functions, using an estimator based on functional principal components analysis. Although this approach is widely used in functional…
A common impediment in conducting inference for Bayesian nonparametric models is the need for complex MCMC algorithms, long computational run-time on large datasets, or both. We propose solutions here for Enriched Dirichlet process mixtures…
Understanding covariate-varying interdependencies among features is of great interest in various applications. Motivated by microbiome studies where microbial abundances and interactions vary with environmental factors, we develop a…
Offline reinforcement learning (RL) aims to learn decision policies from a fixed batch of logged transitions, without additional environment interaction. Despite remarkable empirical progress, offline RL remains fragile under distribution…
Existing sequential generalized estimating equation methodology for longitudinal and group-correlated data focuses on narrow hypotheses concerning treatment efficacy and often makes modeling assumptions that impede the desirable robustness…
Conformal novelty detection is a classical machine learning task for which uncertainty quantification is essential for providing reliable results. Recent work has shown that the BH procedure applied to conformal p-values controls the false…
We develop a statistical framework for empirical Bayes learning from selectively reported confidence intervals, and apply it to provide context for interpreting results published in MEDLINE abstracts. We use a collection of 326,060 z-scores…
Some applied researchers hesitate to use nonparametric methods, worrying that they will lose power in small samples or overfit the data when simpler models are sufficient. We argue that at least some of these concerns are unfounded when…
The ordered allocation sampler is a Gibbs sampler designed to explore the posterior distribution in nonparametric mixture models. It encompasses both infinite mixtures and finite mixtures with a random number of components, and it has be…
Our model for the lifespan of an enterprise is the geometric distribution. We do not formulate a model for enterprise foundation, but assume that foundations and lifespans are independent. We aim to fit the model to information about…
Networks arise naturally in many scientific fields as a representation of pairwise connections. Statistical network analysis has most often considered a single large network, but it is common in a number of applications to observe multiple…
Inference of the reproduction number through time is of vital importance during an epidemic outbreak. Typically, epidemiologists tackle this using observed prevalence or incidence data. However, prevalence and incidence data alone are often…
Estimating the probability of failure for expensive simulations is a central task in reliability analysis for structural design, power grid design, and safety certification, among other areas. This work derives credible intervals on the…
Questionnaires in the behavioral sciences tend to be lengthy. However, the literature suggests that survey length is a contributing factor to careless responding, with longer questionnaires yielding a higher probability that participants start…
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing, with a focus on functionals that arise in causal inference. We study the case where probability distributions are…
In this work, a novel approach to Bayesian model calibration routines is developed which reinterprets the traditional definition of model discrepancy as defined by Kennedy and O'Hagan (KOH). The novelty lies in the integration of…
We propose Distributionally Balanced Designs (DBD), a new class of probability sampling designs that target representativeness at the level of the full auxiliary distribution rather than selected moments. In disciplines such as ecology,…
Influence maximization in networks is a central problem in machine learning and causal inference, where an intervention on a subset of individuals triggers a diffusion process through the network. Existing approaches typically optimize…