Statistical Methodology
In Bayesian inference for the Cox proportional hazards model, modeling the baseline hazard function is challenging. Recently, direct Bayesian inference using the partial likelihood has been considered in the framework of general Bayesian…
Unnormalized probability distributions are frequently used in machine learning for modeling complex data generating processes. Though Markov chain Monte Carlo (MCMC) algorithms can approximately sample from unnormalized distributions,…
We propose using a discounted version of a convex combination of the log-likelihood with the corresponding expected log-likelihood such that when they are maximized they yield a filter, predictor and smoother for time series. This paper…
The least absolute shrinkage and selection operator (Lasso) is a popular method for high-dimensional statistics. However, it is known that the Lasso often has estimation bias and prediction error. To address such disadvantages, many…
A discrete Bayesian network (DBN) is a directed acyclic graph (DAG) consisting of categorical variables. Two popular approaches for DBN modeling include classification and nonparametric methods. However, both methods often require a large number…
Unmeasured confounding can severely bias causal effect estimates from spatiotemporal observational data, especially when the confounders do not vary smoothly in time and space. In this work, we develop a method for addressing unmeasured…
Group testing techniques are widely used in resource-constrained settings, such as infectious-disease screening, blood safety, DNA library screening, and industrial inspection, where the efficient use of limited testing resources depends…
We study the data-driven selection of causal graphical models using constraint-based algorithms, which determine the existence or non-existence of edges (causal connections) in a graph based on testing a series of conditional independence…
We consider the problem of joint simultaneous confidence band (JSCB) construction for regression coefficient functions of time series scalar-on-function linear regression when the regression model is estimated by roughness penalization…
Functional factor analysis is an important dimension reduction method for functional and longitudinal data. Factor loadings give insight into patterns of variability of the observations, while latent factors provide a low-dimensional…
Multi-armed bandits are widely used for sequential experimentation in clinical trials, recommendation systems, and online platforms. While regret minimization and valid inference from adaptively collected data have each been studied…
Digital travel platforms often operate multiple marketing journeys simultaneously, resulting in overlapping user exposures that bias standard A/B lift estimation. Because traditional lift experiments assume treatment isolation, the…
A novel data-driven methodology is presented for the joint selection of prior parameters for both fixed and random effects in Linear Mixed Models (LMMs). This approach facilitates the estimation of complex random-effects structures, as well…
Immune checkpoint inhibitor–based therapies often produce heterogeneous survival responses, including early risk, delayed treatment benefit, and durable long-term survival in a subset of patients. In these settings, conventional summary…
Effectively controlling the false discovery rate (FDR) in high-dimensional variable selection is a fundamental statistical problem that has garnered significant research interest. In this paper, we propose a novel, user-friendly, and…
We propose a framework, the Neyman Jackknife, for conservative variance estimation in finite-population causal inference under interference. Our approach provides a general, flexible blueprint that enables conservative variance estimation…
This paper considers multiple extended object tracking based on Poisson multi-Bernoulli mixture (PMBM) filtering, which gives the closed-form Bayesian solution for standard multiple extended object models with Poisson birth. To efficiently…
When the number of data points in a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution is to select only some of the data points (subdata) for further…
Cell–cell communication (CCC) is commonly inferred from ligand–receptor co-expression, an associational paradigm that cannot distinguish causal signaling from shared regulation or confounding. We propose MR-CCC, a Bayesian Mendelian…
Background: Missing data poses an acute threat to sequential multiple assignment randomized trial (SMART) analyses because of the sequential treatment structure and response-dependent re-randomization. Objectives: This study aimed to (1)…