Statistical Methodology
Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is…
Transformations of covariates are widely used in applied statistics to improve interpretability and to satisfy assumptions required for valid inference. More broadly, feature engineering encompasses a wider set of practices aimed at…
Finite-sample bias is a pervasive challenge in the estimation of structural equation models (SEMs), especially when sample sizes are small or measurement reliability is low. A range of methods have been proposed to improve finite-sample…
Randomized experiments are increasingly employed in two-sided markets, such as buyer-seller platforms, to evaluate the effects of marketplace interventions. These experiments must reflect the underlying two-sided market structure in their…
In the era of big data, the increasing availability of diverse data sources has driven interest in analytical approaches that integrate information across sources to enhance statistical accuracy, efficiency, and scientific insights. Many…
Real-world spatio-temporal datasets, and the phenomena related to them, are often challenging to visualise or to survey as a whole. To summarise the information contained in such data, we combine two well-known statistical…
This paper discusses estimation and limited information goodness-of-fit test statistics in factor models for binary data using pairwise likelihood estimation and sampling weights. The paper extends the applicability of pairwise likelihood…
We introduce a generalized Bayesian method for multiple changepoint analysis with a loss function inspired by multinomial logistic regression. The method does not require a specification of the data-generating process and avoids restrictive…
In this applied paper, we address the difficult open problem of when to discharge patients from the Intensive Care Unit. This can be conceived as an optimal stopping scenario with three added challenges: 1) the evaluation of a stopping…
Causal representation learning seeks to uncover causal relationships among high-level latent variables from low-level, entangled, and noisy observations. Existing approaches often either rely on deep neural networks, which lack…
We propose a Bayesian propensity score-augmented latent factor model for causal inference with time-series cross-sectional data. The framework explicitly models the treatment assignment mechanism by incorporating latent factor loadings,…
We propose a unified framework to draw inferences for regression coefficients in a generalized linear model (GLM) following Lasso-based variable selection. We adapt to non-Gaussian GLMs a recently developed parametric programming strategy…
In applications, quantities of interest are often modelled in equilibrium or an equilibrium solution is sought. The presence of confounding makes causal inference in this setting challenging. We provide interpretable graphical models for…
We introduce a flexible framework for high-dimensional matrix estimation that incorporates side information for both rows and columns. Existing approaches, such as inductive matrix completion, often impose restrictive structure, for example, an…
This paper introduces robust twoblock (RTB) simultaneous dimension reduction, the first statistically robust method to perform simultaneous dimension reduction in two blocks of variables, and allows fine-tuning of the model complexity…
In many scientific applications, hypotheses are generated and tested continuously in a stream. We develop a framework for improving online multiple testing procedures with false discovery rate (FDR) control under arbitrary dependence. Our…
Causal discovery aims to infer causal relationships among variables from observational data, typically represented by a directed acyclic graph (DAG). Most existing methods assume independent and identically distributed observations, an…
We consider the statistical problem of estimating constituent curves from observations of their aggregated curves, referred to as aggregated functional data, in models with strictly positive random errors following a Gamma…
In deploying artificial intelligence (AI) models, selective prediction offers the option to abstain from making a prediction when uncertain about model quality. To fulfill its promise, it is crucial to enforce strict and precise error…
Suppose a parametric model is fitted to data, but the true model happens to be one with an additional parameter. When a parameter is to be estimated, one can use likelihood estimation in the wider model or in the narrow model.…