Statistics Methodology
We present an updated version of lookout -- an algorithm for detecting anomalies using kernel density estimates with bandwidth based on Rips death diameters -- with theoretical guarantees. The kernel density estimator for updated lookout is…
Quantile regression is useful for characterizing the conditional distribution of a response variable and understanding heterogeneity in the covariate effects at different quantiles. The rise of high-dimensional physiological data in…
Spline quantile regression (SQR) is a method introduced recently by Li and Megiddo (2026) for linear quantile regression where the regression coefficients are treated as smooth functions of the quantile level. With the coefficients…
The usual parametric models for survival data are of the following form. Some parametrically specified hazard rate $\alpha(s,\theta)$ is assumed for possibly censored random life times $X_1^0,\ldots,X_n^0$; one observes only…
The purpose of this paper is to develop and illustrate certain classes of graphical plots that can be used for model verification in quite general survival data and life history data models. By suitably comparing nonparametric and…
Unidimensional factor models justify some of the most consequential summaries in science -- single scores, single ranks, and single leaderboards -- yet unidimensionality is usually assessed indirectly by fitting and evaluating models on…
Detecting brief changes in time-series data remains a major challenge in fields where short-lived states carry meaning. In single-molecule localisation microscopy, this problem is particularly acute as fluorescent molecules used to tag…
To minimize the mean squared error (MSE) in global average treatment effect (GATE) estimation under network interference, a popular approach is to use a cluster-randomized design. However, in the presence of homophily, which is common in…
Interpreting RNA-sequencing data requires identifying coordinated gene expression patterns that correspond to biological pathways. Standard factor models provide useful dimension reduction but typically ignore existing pathway knowledge or…
Precision matrix estimation is a fundamental topic in multivariate statistics and modern machine learning. This paper proposes an adversarially perturbed precision matrix estimation framework, motivated by recent developments in adversarial…
Network data has attracted growing interest across scientific domains, prompting the development of various network models. Existing network analysis methods mainly focus on unsigned networks, whereas signed networks, consisting of both…
Causal mediation analysis in cluster-randomized trials (CRTs) is essential for explaining how cluster-level interventions affect individual outcomes, yet it is complicated by interference, post-treatment confounding, and hierarchical…
Modern regression analyses are often undermined by covariate measurement error, misspecification of the regression model, and misspecification of the measurement error distribution. We present, to the best of our knowledge, the first…
Forecasting El Niño is one of the greatest challenges of science. We show how intensive, large and accurate time series allow us to see through time. Our Discrete Chi-square Method (DCM) can detect arbitrary trend and signal(-s)…
Practical employment of Bayesian trial designs is still rare. Even if accepted in principle, the regulators have commonly required that such designs be calibrated according to an upper bound for the frequentist type I error rate. This…
Classification and probability estimation are fundamental tasks with broad applications across modern machine learning and data science, spanning fields such as biology, medicine, engineering, and computer science. Recent development of…
Joint modeling of multiview graphs with a common set of nodes between views and auxiliary predictors is an essential, yet less explored, area in statistical methodology. Traditional approaches often treat graphs in different views as…
While change point detection in time series data has been extensively studied, little attention has been given to its generalisation to data observed on spheres or other manifolds, where changes may occur within spatially complex regions…
Causal effect estimation often succeeds cost-constrained sequential data collection. This work considers multivariate linear front-door models with arbitrary unobserved confounding on treatment and response. We optimize the experimental…
Stochastic epidemic models can estimate infection and removal rates, and derived quantities such as the basic reproductive number ($R_0$), when both infection and removal times are observed. In practice, however, removal times are often…