Statistics
This paper presents a Bayesian framework for inferring the posterior of the augmented state of a target, incorporating its underlying goal or intent, such as any intermediate waypoints and/or the final destination. Thus, it is for joint…
We introduce a Hamiltonian Monte Carlo (HMC) methodology based on a randomized selection of integration times, referred to as eHMC, where "e" stands for empirical. The approach relies on an offline calibration phase that leverages…
Nowcasting and forecasting of infectious diseases have become increasingly important since the SARS-CoV-2 pandemic. In particular, methods for modeling the composition of circulating variants at a given time have seen more use in part due…
A popular class of priors for symmetric positive-definite matrices assumes independent entries and adds a truncation to ensure positive-definiteness. While conceptually simple and often computationally convenient, unless done carefully this…
We introduce a new class of conditional autoregressive models for spatially dependent functional data, formulated through conditional means given neighboring functional observations and characterized by a covariance operator and a spatial…
The Hilbert-Schmidt Independence Criterion (HSIC) and its joint-independence extension $d\mathrm{HSIC}$ are degenerate $V$-statistics whose data-dependent weighted-$\chi^2$ null limits force a permutation calibration that multiplies the…
Shilling is the use of artificial bids to make competition appear stronger and push prices upward. We study repeated first-price auctions in which shilling affects feedback but not allocation: the learner wins or loses against the real…
This paper reconstructs the half-century evolution of the scientific school founded by Yuriy P. Kunchenko (1939--2006) as the development of a semiparametric methodology for non-Gaussian estimation. Starting with Kunchenko's 1972/1973…
Two of the most widely used methods for analysing graph data, Adjacency Spectral Embedding and Laplacian Spectral Embedding, often produce different results when applied to the same network. Yet the structural reasons behind this…
Specifying a full Bayesian model that integrates multiple data sources can be challenging. One natural approach is to specify each individual model separately and join them afterwards. This is the approach adopted in Markov melding.…
The moveEZ (pronounced move easy) R package provides tools for constructing animated PCA biplots that reveal how multivariate structure evolves across the ordered levels of a categorical variable. Built as an extension to the biplotEZ…
This extended preface [to the book `Bayesian Nonparametrics', Cambridge University Press, 2010, by NL Hjort, CC Holmes, P Mueller, SG Walker] is meant to explain why you are right to be curious about Bayesian nonparametrics -- why you may…
The growing use of high-throughput sequencing (HTS) has enabled the large-scale production of compositional count data, driving progress in microbiome research. However, such count data are often high-dimensional, over-dispersed, and…
This is a verbatim copy of a technical report I wrote in 2017-2018 to obtain the law of the iterated logarithm using the guarantee on the wealth of an online betting strategy.
We propose a computationally simple framework for clustering functional data based on Gaussian-process-generated random projections. In this approach, each curve is first projected onto a large collection of independent Gaussian process…
Epilepsy is a neurological disorder characterized by recurrent seizures that affects more than 70 million people worldwide. Often, an individual with epilepsy is more likely to experience subsequent seizures following an initial seizure, a…
This note provides a lightweight tutorial on using Eigen, a C++ template library for linear algebra, to implement statistical and machine learning algorithms. The emphasis is practical rather than methodological: we show how common matrix…
We develop a Hilbert--Schmidt independence criterion (HSIC)-based framework for testing serial independence in strictly stationary time series. The proposed auto Hilbert--Schmidt independence criterion (AutoHSIC) measures dependence between…
We consider one-hidden-layer neural networks trained in the feature-learning regime using gradient descent, and relate the output of the finite-width network $f_{\hat{\rho}_t^m}$ to its infinite-width counterpart $f_{\rho_t^{MF}}$, which…
Conformal methods provide prediction sets for outcomes with confidence guarantees. We study their use in a selective inference setting, where inference is performed only when the prediction set is informative. The analyst may consider as…