Statistics
Prediction from sequential panel data is central to digital-twin modeling, where new panels arrive over time and the predictive system is updated sequentially. Existing methods often rely on temporal proximity, which can fail when similar…
Accurate modeling of wind turbine power curves is crucial for optimal wind farm operation. Nearly all existing power curve models focus on temporal variables such as wind speed and temperature while overlooking the influence of terrain…
Insurance payments may depend on latent micro states although only macro states and realized payments are observed. We study a sojourn-payment model for such aggregated multi-state systems under left-truncation and right-censoring. Starting…
We develop a Bayesian area-level small area estimation framework that jointly models binomial and Gaussian survey responses through shared spatial random effects. This work is motivated by the American Community Survey (ACS), which provides…
Argo profiling floats measure seawater temperature and salinity in the upper 2000 meters of the ocean. These floats are uniquely capable of measuring the global Ocean Heat Content (OHC), a quantity that is of central importance for…
Identifying patients who are likely to benefit from a treatment is central to precision medicine and can guide follow-up trials, enrichment designs, and individualized decisions. Although randomized controlled trials (RCTs) provide evidence…
While conformal prediction provides a general framework for uncertainty quantification in predictive inference, its application is often limited by computational cost. Recent methods, including Jackknife+ and Jackknife-minmax, achieve…
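For orientation, the Jackknife+ interval mentioned above can be sketched as follows. This is a minimal illustration of the standard Jackknife+ construction with an ordinary least-squares base model, not the method proposed in the abstract; the function name `jackknife_plus` and the choice of base model are assumptions made for the example. The loop over `n` leave-one-out refits is the computational cost the abstract refers to.

```python
import math
import numpy as np

def jackknife_plus(X, y, x_new, alpha=0.1):
    """Jackknife+ prediction interval at x_new with an OLS base model (illustrative)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])            # design matrix with intercept
    a_new = np.concatenate([[1.0], np.atleast_1d(x_new)])
    lo_vals = np.empty(n)
    hi_vals = np.empty(n)
    for i in range(n):                              # one refit per left-out point
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        r = abs(y[i] - A[i] @ coef)                 # leave-one-out residual
        pred = a_new @ coef                         # leave-one-out prediction at x_new
        lo_vals[i] = pred - r
        hi_vals[i] = pred + r
    k_lo = math.floor(alpha * (n + 1))              # order statistic for the lower end
    k_hi = math.ceil((1 - alpha) * (n + 1))         # order statistic for the upper end
    lo = np.sort(lo_vals)[k_lo - 1] if k_lo >= 1 else -np.inf
    hi = np.sort(hi_vals)[k_hi - 1] if k_hi <= n else np.inf
    return lo, hi
```

A call such as `jackknife_plus(x, y, 0.5, alpha=0.1)` returns the lower and upper endpoints of the interval at the new point.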
Protein-protein interaction (PPI) networks, estimated from high-throughput omics data, foster biomarker discovery and precision medicine. Gaussian graphical models (GGMs) offer a principled reconstruction framework. Yet, existing…
Nitrogen fertilizer management plays a central role in balancing agricultural productivity and environmental sustainability, yet identifying optimal application strategies remains difficult because treatment responses vary substantially…
The Gaussian Kernel Robust Regression method (GKRReg) is a robust regression estimator that iteratively re-weights observations via a Gaussian kernel so that outliers and leverage points receive near-zero weight, with convergence of the…
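The idea of Gaussian-kernel reweighting can be illustrated with a generic iteratively reweighted least-squares loop. This is a sketch under stated assumptions, not the GKRReg algorithm itself: the weights here depend only on the residuals, the bandwidth `h` is fixed by hand, and the function name `gaussian_kernel_irls` is hypothetical. Handling leverage points, as the abstract describes, would need more than this residual-based scheme.

```python
import numpy as np

def gaussian_kernel_irls(X, y, h=1.0, n_iter=50, tol=1e-10):
    """Residual-based Gaussian-kernel IRLS (illustrative; h is an assumed fixed bandwidth)."""
    A = np.column_stack([np.ones(len(y)), X])       # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # start from ordinary least squares
    w = np.ones(len(y))
    for _ in range(n_iter):
        r = y - A @ beta
        w = np.exp(-0.5 * (r / h) ** 2)             # large residuals get weights near zero
        sw = np.sqrt(w)
        new, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)  # weighted LS step
        if np.max(np.abs(new - beta)) < tol:
            beta = new
            break
        beta = new
    return beta, w
```

On data that follow a line except for a few grossly shifted responses, the returned weights for the shifted points collapse toward zero and the fit tracks the remaining points.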
Functional principal component analysis (FPCA) is a central tool for dimension reduction and covariance analysis in functional data analysis. We study FPCA for discretely observed scalar-valued functional data indexed by a compact…
Demographic corrections are routinely performed in many disciplines, including psychology. Yet, there are ongoing debates about whether these corrections are appropriate and improve classification accuracy. Here, we focus on cognitive…
The increasing availability of diverse data sources has motivated great interest in data integration for improving regression efficiency. Existing data integration methods primarily focus on integrating nonprobability samples and typically…
Functional data analysis (FDA) provides statistical methods for analyzing samples of time-continuous stochastic processes. Measurements often arise in the form of sensor data for a key scientific variable. The practical problem of irregular…
The classical $k$-means clustering, based on distances computed from all data features, cannot be directly applied to incomplete data with missing values. A natural extension of $k$-means to missing data is to involve only the observed…
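One common way to run $k$-means on incomplete data is to compute distances and centroid updates from the observed entries only. The sketch below assumes that reading of the truncated sentence above; it is not the method the abstract goes on to propose. Missing values are encoded as `NaN`, initial centers are supplied by the caller, and the function name `kmeans_observed` is hypothetical.

```python
import numpy as np

def kmeans_observed(X, centers, n_iter=50):
    """k-means using only observed (non-NaN) coordinates (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    mask = ~np.isnan(X)                             # True where a feature is observed
    X0 = np.where(mask, X, 0.0)                     # zero-fill so sums skip missing entries
    for _ in range(n_iter):
        # Squared distance to each center, summed over observed coordinates only.
        diff = (X0[:, None, :] - centers[None, :, :]) * mask[:, None, :]
        labels = (diff ** 2).sum(axis=2).argmin(axis=1)
        new = centers.copy()
        for k in range(len(centers)):
            members = labels == k
            cnt = mask[members].sum(axis=0)         # observed count per feature
            tot = X0[members].sum(axis=0)           # sum of observed values per feature
            upd = cnt > 0                           # keep old value if nothing observed
            new[k, upd] = tot[upd] / cnt[upd]
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Each centroid coordinate is the mean of the observed values in that cluster, so a point with a missing feature still contributes to the coordinates it does have.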
Modern experiments, including evaluations of AI-enabled services and platform interventions, often depart from independent and identically distributed (i.i.d.) sampling because assignments may be adaptive, balanced across covariates, or…
The broken adaptive ridge (BAR) penalty approximates $L_0$ regularization through iterative reweighting of $L_2$ penalties. This penalty enjoys both the oracle property and the grouping effect for highly correlated covariates, making it…
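The iterative reweighting can be sketched for plain linear regression. Each step solves a ridge problem whose penalty on coefficient $j$ is scaled by the inverse square of its previous value, so small coefficients are pushed toward zero. This is a minimal sketch under assumptions: squared-error loss, a ridge initial estimate, and the names `bar_fit`, `lam`, and `ridge_init` are chosen for illustration. The setting studied in the abstract may differ.

```python
import numpy as np

def bar_fit(X, y, lam=1.0, ridge_init=1.0, n_iter=100, tol=1e-10):
    """Broken adaptive ridge for least squares via reweighted ridge steps (illustrative)."""
    p = X.shape[1]
    XtX = X.T @ X
    Xty = X.T @ y
    beta = np.linalg.solve(XtX + ridge_init * np.eye(p), Xty)   # ridge starting point
    for _ in range(n_iter):
        G = np.diag(beta)
        # Equivalent to solving (X'X + lam * diag(1 / beta_j^2)) b = X'y,
        # rewritten so that coefficients at or near zero cause no division by zero.
        new = G @ np.linalg.solve(G @ XtX @ G + lam * np.eye(p), G @ Xty)
        if np.max(np.abs(new - beta)) < tol:
            beta = new
            break
        beta = new
    return beta
```

In this formulation a coefficient that reaches zero stays at zero in later iterations, which is how the sequence of $L_2$ problems produces a sparse fit.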
Gaussian process inference is often limited by cubic computational costs, a challenge that becomes more pronounced in spatio-temporal settings where posterior inference is required over dense grids. While state-space SPDE formulations…
Ranked choice voting (RCV) is a popular alternative voting method in which voters are asked to list their favored candidates in preference order, rather than vote for a single candidate. When these ballots are tabulated, candidates are…
In biological data from allometry studies, the largest eigenvalue is typically dominant, and the gaps between minor eigenvalues are often narrow. Such proximity among small minor eigenvalues can lead to instability in statistics based on…