Statistics
We demonstrate that learning procedures that rely on aggregated labels, e.g., label information distilled from noisy responses, enjoy robustness properties impossible without data cleaning. This robustness appears in several ways. In the…
Ecological Momentary Assessment (EMA) studies enable the collection of high-frequency self-reports of suicidal thoughts and behaviors (STBs) via smartphones. Latent stochastic differential equations (SDEs) are a promising model class for…
Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are effective in the small-noise regime are suboptimal…
Long Short-Term Memory (LSTM) neural network models have become the cornerstone for sequential data modeling in numerous applications, ranging from natural language processing to time series forecasting. Despite their success, the problem…
Factor modeling for high-dimensional time series is a powerful tool for discovering latent common components for dimension reduction and information extraction. Most available estimation methods can be divided into two categories: the…
In many image analysis problems, the contours of objects carry important statistical information about shape. Such contours are typically affected by deformation variables including scaling, translation, rotation, and reparametrization.…
Traditional geostatistical methods assume independence between observation locations and the spatial process of interest. Violations of this independence assumption are referred to as preferential sampling (PS). Standard methods to address…
Time series in the natural sciences, such as hydrology and climatology, and in other environmental applications often consist of continuous observations constrained to the unit interval (0,1). Traditional Gaussian-based models fail to capture…
Characterising cause-effect relationships in complex systems is fundamental to understanding their underlying mechanisms. Granger causality (GC) remains a widely used computational tool for identifying causal relationships in time series…
In clinical studies, the risk of the primary (terminal) event may be modified by intermediate events, resulting in semicompeting risks. To study the treatment effect on the terminal event mediated by the intermediate event, researchers wish…
We introduce a novel class of Bayesian mixtures for normal linear regression models which incorporates a further Gaussian random component for the distribution of the predictor variables. The proposed cluster-weighted model aims to…
Adaptive experiments use preliminary analyses of the data to inform the further course of action and are commonly used in many disciplines, including the medical and social sciences. Because the null hypothesis and experimental design are…
Data assimilation combines dynamical models with observations to improve state estimates. Ensemble filters sequentially assimilate observations by updating a set of samples over time, alternating between a forecast and an analysis step.…
In randomized controlled trials (RCTs) that focus on time-to-event outcomes, intercurrent events can arise in two ways: as semi-competing events, which modify the hazard of the primary outcome events, or as competing events, which make the…
NASA's Interstellar Boundary Explorer (IBEX) satellite collects data on energetic neutral atoms (ENAs) that can provide insight into the heliosphere boundary between our solar system and interstellar space. Using these data, scientists can…
Transportation agencies have an opportunity to leverage increasingly available trajectory datasets to improve their analyses and decision-making processes. However, this data is typically purchased from vendors, which means agencies must…
Gradient-flow sampling interprets a Gibbs distribution as the minimizer of an energy functional over probability measures and generates dynamics converging to this target. Under spherical Hellinger-Kantorovich (SHK) geometry, the flow…
We develop a gradient flow on the space of probability measures defined on matrix-valued parameters induced by regularized Muon, an analytically smoothed version of the idealized Muon optimizer. The key observation is that the regularized…
The accelerating shift toward low and ultra-low fertility has intensified the debate over whether countries now undergoing rapid decline are approaching stabilization or entering a more persistent low-fertility regime. Existing projection…
The reuse of medico-administrative and synthetic spatial data may overcome some limitations of population-based registries, provided rigorous validation is performed. However, no tool exists to spatially validate a candidate-for-reuse…