Statistical Methodology
The Mann-Whitney effect is an effect measure for the order of two sample-specific outcome variables. It has the interpretation of a probability and also a connection to the area under the ROC curve. In the literature it has been considered…
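As an illustration of the probability interpretation mentioned in this abstract, the sketch below estimates the Mann-Whitney effect in its common form, P(X < Y) + 0.5 P(X = Y), by comparing all pairs of observations. The function name `mann_whitney_effect` and the tie-handling convention are assumptions made for this example; they are not taken from the paper. When `x` holds classifier scores of negatives and `y` those of positives, the same quantity is the empirical area under the ROC curve.

```python
from itertools import product


def mann_whitney_effect(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two samples.

    Hypothetical helper for illustration: every (x_i, y_j) pair
    contributes 1 if x_i < y_j, 0.5 if tied, and 0 otherwise.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    for xi, yj in product(x, y):
        if xi < yj:
            total += 1.0
        elif xi == yj:
            total += 0.5
    return total / (len(x) * len(y))


if __name__ == "__main__":
    # Toy data chosen only for illustration.
    print(mann_whitney_effect([1, 2, 3], [2, 3, 4]))
```

A value of 0.5 indicates no tendency for either sample to produce larger values; values near 1 indicate that observations from `y` tend to exceed those from `x`.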
Sequential analysis encompasses simulation theories and methods where the sample size is determined dynamically based on accumulating data. Since its conceptual inception, numerous sequential stopping rules have been introduced, and many…
Structural equation modeling (SEM) is a prevalent approach for studying constructs. Traditionally, these constructs are modeled as reflectively measured latent variables: common factors that account for the variance-covariance structure of…
Flow cytometry is a valuable technique that measures the optical properties of particles at a single-cell resolution. When deployed in the ocean, flow cytometry allows oceanographers to study different types of photosynthetic microbes…
Conformal prediction is a popular method to construct prediction intervals with marginal coverage guarantees from black-box machine learning models. In applications with potentially high-impact events, such as flooding or financial crises,…
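For readers unfamiliar with how a marginal coverage guarantee is obtained from a black-box model, the following is a minimal sketch of standard split conformal prediction with absolute-residual scores. It is not the method of this paper, whose details are elided above. The function name `split_conformal_interval` and the toy calibration data are assumptions made for this example.

```python
import math


def split_conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Return a (lower, upper) prediction interval around test_pred.

    Hypothetical helper: uses absolute residuals on a held-out
    calibration set and the ceil((n + 1) * (1 - alpha))-th smallest
    residual as the interval half-width.
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    if n == 0:
        raise ValueError("calibration set must be non-empty")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested level.
        return (-math.inf, math.inf)
    half_width = scores[k - 1]
    return (test_pred - half_width, test_pred + half_width)


if __name__ == "__main__":
    # Toy calibration data: predictions of 0 with residuals 1..9.
    preds = [0.0] * 9
    targets = [float(i) for i in range(1, 10)]
    print(split_conformal_interval(preds, targets, test_pred=10.0, alpha=0.5))
```

The guarantee is marginal: it holds on average over calibration and test points that are exchangeable, and it makes no claim about coverage conditional on a particular event.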
Postpartum hemorrhage (PPH) remains a leading cause of maternal morbidity and mortality worldwide. Oxytocin, though widely recognized for facilitating labor, is also the primary pharmacological intervention for PPH prevention. However,…
Gaussian variational approximations are widely used for summarizing posterior distributions in Bayesian models, especially in high-dimensional settings. However, a drawback of such approximations is the inability to capture skewness or more…
Calibration ensures that predicted uncertainties align with observed uncertainties. While there is an extensive literature on recalibration methods for univariate probabilistic forecasts, work on calibration for multivariate forecasts is…
Dynamic linear regression models forecast the values of a time series based on a linear combination of a set of exogenous time series while incorporating a time series process for the error term. This error process is often assumed to…
Symbolic data analysis (SDA) aggregates large individual-level datasets into a small number of distributional summaries, such as random rectangles or random histograms. The inference is carried out using these summaries in place of the…
Weighted conformal prediction (WCP) has been commonly used to quantify prediction uncertainty under covariate shift. However, the effectiveness of WCP relies heavily on the degree of overlap between the training and test covariate…
Scientists often want to explain why an outcome is different in two groups. For instance, differences in patient mortality rates across two hospitals could be due to differences in the patients themselves (covariates) or differences in…
We consider testing zero pricing errors in high-dimensional linear factor pricing models. Existing methods are mainly based on either an $L_2$ statistic, which is effective under dense alternatives, or an $L_\infty$ statistic, which is…
Deep learning is widely deployed for time series learning tasks such as classification and forecasting. Despite the empirical successes, little theory has been developed so far in the time series context. In this work, we prove that if…
Background: In clinical research, the Bland-Altman analysis is commonly used to assess agreement of metric measurements made by two or more techniques, devices or methods. The approach can also deal with repeated measurements per subject or…
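The basic Bland-Altman computation referred to here can be sketched as follows: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias plus or minus 1.96 standard deviations of those differences. This sketch assumes one measurement per subject per method and ignores the repeated-measurement extensions the abstract mentions. The function name `bland_altman` and the toy readings are assumptions made for this example.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Hypothetical helper: assumes one pair per subject and roughly
    normally distributed differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Toy readings from two devices on four subjects.
    device_a = [10.0, 12.0, 14.0, 16.0]
    device_b = [9.0, 12.0, 13.0, 16.0]
    print(bland_altman(device_a, device_b))
```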
This paper is the second part of our study on the non-parametric estimation of MS-NAR processes started with [L. Fermin et al. 2017]. We consider the Nadaraya-Watson type regression function estimator for non-linear autoregressive Markov…
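For orientation, the sketch below shows the plain Nadaraya-Watson estimator for a non-linear autoregression, where the value at lag one serves as the regressor. It deliberately ignores the Markov-switching regimes that the paper studies, so it is a simplified stand-in rather than the authors' estimator. The Gaussian kernel, the function name `nadaraya_watson`, and the toy series are assumptions made for this example.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys at the point x0.

    Hypothetical helper using a Gaussian kernel: each weight is
    exp(-0.5 * ((x0 - x_i) / bandwidth) ** 2).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    # Toy series: regress each value on its predecessor (lag one).
    series = [0.1, 0.4, 0.3, 0.7, 0.5, 0.9, 0.6]
    lagged, current = series[:-1], series[1:]
    print(nadaraya_watson(0.5, lagged, current, bandwidth=0.2))
```

The bandwidth controls the bias-variance trade-off: small values follow the data closely, while large values approach the global mean of `ys`.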
Bayesian inference for models with intractable likelihoods, such as Markov random fields, poses a fundamental computational challenge due to the tradeoff between inferential accuracy and computational cost. Various MCMC methods have been…
Statistical inference in parametric models (e.g., the Bradley-Terry model and its variants) for paired-comparison data has been explored in the high-dimensional regime, in which the number of items involved in paired comparisons diverges.…
Despite its extensive development for multivariate data, semi-supervised learning remains underdeveloped for functional data. To address this challenge, we extend the Fermat distance, a density-sensitive metric aligning with the…
We introduce BLOC (Black-box Optimization over Correlation matrices), a general framework for sparse covariance estimation with non-convex penalties. BLOC operates on the manifold of correlation matrices and reparameterizes it via an…