统计方法学
The analysis of competing risks data is often complicated by misclassification of the cause of failure. This issue can lead to seriously biased estimates and invalid conclusions. One way to deal with such misclassification is to use a…
Suppose that a sequence of data points follows a distribution of a certain parametric form, but that one or more of the underlying parameters may change over time. This paper addresses various natural questions in such a framework. We…
Isotonic regression provides a flexible, tuning-free approach to estimating monotonic functions without imposing global curvature constraints, yet the estimated regression function is inherently a step function. This paper addresses a key…
Kunchenko's method of polynomial maximization provides a semiparametric apparatus for parameter estimation under non-Gaussian errors, but its classical power basis relies on finite higher-order integer moments. This paper introduces the…
Normative modeling enables individualized characterization of structural brain deviations by evaluating subjects against a reference population rather than a group average. Most existing implementations treat brain regions independently and…
This paper develops power and sample size formulas for causal inference with time-to-event outcomes. The target estimand is the marginal hazard ratio: the coefficient of a marginal structural Cox proportional hazard model with treatment as…
A high-quality experimental dataset is often much smaller than a corresponding observational dataset. When this holds with possibly biased measurements of the outcome of interest in the latter, we propose an estimation and inference…
The Learn-As-you-Go (LAGO) design is an adaptive clinical trial design that allows modifications to multicomponent intervention packages across stages. Centers participate in more than one stage, as is common in large-scale implementation…
Identifying the structural drivers of poverty in regional datasets is frequently hindered by small sample sizes and high multidimensional collinearity, which can result in unstable and misleading policy advice. This paper evaluates the…
This pedagogical review examines the use of machine learning methods in finite-population inference for survey sampling, with an emphasis on design-based validity and statistical inference. While flexible prediction tools offer substantial…
Spectral gaps, Kramers escape rates, and position-dependent relaxation timescales are dynamical invariants encoded in the infinitesimal generator $\Lop$ of a stochastic flow. We show that weak projection of the governing It\^{o} SDE onto…
We identify a fundamental pathology in the likelihood for time delay inference which challenges standard inference methods. By analysing the likelihood for time delay inference with Gaussian process light curve models, we show that it…
We study maximum likelihood estimation for spatial generalized linear mixed models with Gaussian process approximations using a stochastic Newton-Raphson algorithm. We consider two Gaussian Process approximations in this context: spectral…
We introduce a nonparametric model for inferring time-evolving, unobserved probability distributions from discrete-time data consisting of unlabelled partitions. The latent process is a two-parameter Poisson-Dirichlet diffusion, and…
Modeling multiple sampling densities within a hierarchical framework enables borrowing of information across samples. These density random effects can act as kernels in latent variable models to represent exchangeable subgroups or clusters.…
Clustering methods are often used in physics education research (PER) to identify subgroups of individuals within a population who share similar response patterns or characteristics. K-means (or k-modes, for categorical data) is one of the…
Machine learning (ML) models often exhibit bias that can exacerbate inequities in biomedical applications. Fairness auditing, the process of evaluating a model's performance across subpopulations, is critical for identifying and mitigating…
In this work, we propose new matrix- and tensor-based methodologies for estimating multivariate intensity functions of inhomogeneous point processes. By viewing multivariate intensity functions as infinite-dimensional matrices or tensors…
This paper investigates the theoretical foundation and develops analytical formulas for sample size and power calculations for causal inference with observational data. By analyzing the variance of an inverse probability weighting estimator…
The closure principle is a standard tool for achieving strong family-wise error rate (FWER) control in multiple testing problems. We develop an e-value-based closed testing framework that inherits nice properties of e-values, which are…