Statistical Methodology
Semicontinuous outcomes occur frequently in health services, insurance, and cost studies. Standard nonparametric density estimators are not well suited to such data because they do not naturally accommodate the mixed structure, the…
Estimating the mean counterfactual outcome under a treatment rule is a central problem in causal inference and policy evaluation. Standard estimators, including inverse probability weighting (IPW), augmented IPW (AIPW), and targeted maximum…
This study evaluates the performance of 36 historical CMIP6 GCM trajectories (1979-2005) in reproducing atmospheric circulation over the Iberian Peninsula in the summer months (June-September) using the Lamb Weather Type (WT) classification…
We propose a novel Phase I intra-patient dose-escalation design tailored for multi-cycle immunotherapy settings, in which toxicity at a fixed dose level is clinically expected to decrease over successive treatment cycles. This design was…
RNA-seq count data are often affected by read-to-gene alignment ambiguity, especially in high-dimensional transcriptomics. This type of ambiguity can be conveniently expressed through granular counts, namely fuzzy-valued observations of…
Background: A core aspect of epidemiology is determining the impacts of potential public health interventions over time. With long follow-up periods, epidemiologists may need to consider semi-competing events, in which a terminal event,…
There is growing interest in a hybrid control design for treatment evaluation, where a randomized controlled trial is augmented with external control data from a previous trial or a real world data source. The hybrid control design has the…
Environmental processes often exhibit complex, non-linear patterns and discontinuities across space and time, posing significant challenges for traditional geostatistical modeling approaches. In this paper, we propose a hybrid…
We consider the problem of sampling from a probability distribution $\pi$ which admits a density w.r.t. a dominating measure. It is well known that this can be written as an optimisation problem over the space of probability distributions…
Identifying relationships among stochastic processes is a core objective in many fields, such as economics. While the standard toolkit for multivariate time series analysis has many advantages, it can be difficult to capture nonlinear…
We propose a new class of metrics, called the survival independence divergence (SID), to test dependence between a right-censored outcome and covariates. A key technique for deriving the SIDs is to use a counting process strategy, which…
On entering a French university, students have their foreign language level assessed through a placement test. In this work, we model the placement test results using binary latent block models, which make it possible to simultaneously form homogeneous…
Penalized empirical risk minimization with a surrogate loss function is often used to learn a high-dimensional linear decision rule in classification problems. Although much of the literature focuses on the generalization error, there is a…
This paper focuses on block likelihood estimation for geostatistical data, a method that balances statistical accuracy and computational efficiency. Central to this approach is the choice of block size, which can significantly impact…
Regional survey estimates and their significance levels are simultaneously displayed in maps that show all 3,141 U.S. counties and equivalents. An analyst can focus their attention on significant differences (or those with a different,…
When analyzing data, researchers make some decisions that are either arbitrary, based on subjective beliefs about the data generating process, or for which equally justifiable alternative choices could have been made. This wide range of…
Rare disease trials face unique statistical challenges due to limited patient populations and heterogeneous clinical manifestations among patients. Multiple endpoints are often necessary to comprehensively capture treatment benefits. A…
This chapter introduces the Bayesian reflex -- an analogy with the autonomic nervous system -- as a unifying framework for online learning in AI. Bayesian online algorithms automatically maintain equilibrium in dynamic environments via…
We describe the R package EstemPMM, which implements the Polynomial Maximization Method (PMM) for parameter estimation under non-Gaussian errors. PMM exploits higher-order cumulants of the error distribution -- specifically the third…
The Hawkes process is used to model point process data where events occur in clusters and bursts. In a standard multivariate Hawkes process, every event that occurs in a dimension has an equal impact on the process intensity. However, this…