Related papers: Targeted Undersmoothing
We suggest general methods to construct asymptotically uniformly valid confidence intervals post-model-selection. The constructions are based on principles recently proposed by Berk et al. (2013). In particular, the candidate models used can…
In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks that need careful consideration. Typically, model selection…
Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly expensive to evaluate and the…
Pre-smoothing is a technique aimed at increasing the signal-to-noise ratio in data to improve subsequent estimation and model selection in regression problems. However, pre-smoothing has thus far been limited to the univariate response…
This paper studies high-dimensional regression models with lasso when data is sampled under multi-way clustering. First, we establish convergence rates for the lasso and post-lasso estimators. Second, we propose a novel inference method…
We propose robust methods for inference on the effect of a treatment variable on a scalar outcome in the presence of very many controls. Our setting is a partially linear model with possibly non-Gaussian and heteroscedastic disturbances.…
The focus of modern biomedical studies has gradually shifted to explanation and estimation of joint effects of high dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into…
Simultaneous inference after model selection is of critical importance to address scientific hypotheses involving a set of parameters. In this paper, we consider a high-dimensional linear regression model in which a regularization procedure…
While foundation models demonstrate impressive performance across various tasks, they remain vulnerable to adversarial inputs. Current research explores various approaches to enhance model robustness, with Diffusion Denoised Smoothing…
We propose a general method to carry out a valid Bayesian analysis of a finite-dimensional 'targeted' parameter in the presence of a finite-dimensional nuisance parameter. We apply our methods to causal inference based on estimating…
In this paper, we develop an epsilon admissible subsets (EAS) model selection approach for performing group variable selection in the high-dimensional multivariate regression setting. This EAS strategy is designed to estimate a…
Latent variable models serve as powerful tools to infer underlying dynamics from observed neural activity. Ideally, the inferred dynamics should align with true ones. However, due to the absence of ground truth data, prediction benchmarks…
In biomedical research, leveraging information from multiple similar source studies has proved useful for obtaining more accurate prediction results in a target study. However, in many biomedical applications based on real-world data,…
Speech foundation models, such as HuBERT and its variants, are pre-trained on large amounts of unlabeled speech data and then used for a range of downstream tasks. These models use a masked prediction objective, where the model learns to…
This work proposes new inference methods for a regression coefficient of interest in a (heterogeneous) quantile regression model. We consider a high-dimensional model where the number of regressors potentially exceeds the sample size but a…
The data functions that are studied in the course of functional data analysis are assembled from discrete data, and the level of smoothing that is used is generally that which is appropriate for accurate approximation of the conceptually…
In this paper we develop a data-driven smoothing technique for high-dimensional and non-linear panel data models. We allow for individual specific (non-linear) functions and estimation with econometric or machine learning methods by using…
Transductive inference is an effective means of tackling the data deficiency problem in few-shot learning settings. A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class…
In statistical inference, it is rarely realistic that the hypothesized statistical model is well-specified, and consequently it is important to understand the effects of misspecification on inferential procedures. When the hypothesized…
Before entering the neural network, a token is generally converted to the corresponding one-hot representation, which is a discrete distribution over the vocabulary. Smoothed representation is the probability of candidate tokens obtained from…