Related papers: Targeted Undersmoothing

Uniformly valid confidence intervals post-model-selection

We suggest general methods to construct asymptotically uniformly valid confidence intervals post-model-selection. The constructions are based on principles recently proposed by Berk et al. (2013). In particular the candidate models used can…

Statistics Theory · Mathematics 2017-11-15 François Bachoc, David Preinerstorfer, Lukas Steinberger

Post-Selection Confidence Bounds for Prediction Performance

In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks that need careful consideration. Typically, model selection…

Machine Learning · Statistics 2023-02-06 Pascal Rink, Werner Brannath

Model-specific Data Subsampling with Influence Functions

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the…

Machine Learning · Computer Science 2020-10-21 Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Large multi-response linear regression estimation based on low-rank pre-smoothing

Pre-smoothing is a technique aimed at increasing the signal-to-noise ratio in data to improve subsequent estimation and model selection in regression problems. However, pre-smoothing has thus far been limited to the univariate response…

Methodology · Statistics 2026-04-23 Xinle Tian, Alex Gibberd, Matthew Nunes, Sandipan Roy

Lasso under Multi-way Clustering: Estimation and Post-selection Inference

This paper studies high-dimensional regression models with lasso when data is sampled under multi-way clustering. First, we establish convergence rates for the lasso and post-lasso estimators. Second, we propose a novel inference method…

Econometrics · Economics 2019-08-22 Harold D. Chiang, Yuya Sasaki

Inference on Treatment Effects After Selection Amongst High-Dimensional Controls

We propose robust methods for inference on the effect of a treatment variable on a scalar outcome in the presence of very many controls. Our setting is a partially linear model with possibly non-Gaussian and heteroscedastic disturbances.…

Methodology · Statistics 2017-10-05 Alexandre Belloni, Victor Chernozhukov, Christian Hansen

Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach

The focus of modern biomedical studies has gradually shifted to explanation and estimation of joint effects of high dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into…

Methodology · Statistics 2021-03-09 Zhe Fei, Yi Li

Method of Contraction-Expansion (MOCE) for Simultaneous Inference in Linear Models

Simultaneous inference after model selection is of critical importance to address scientific hypotheses involving a set of parameters. In this paper, we consider a high-dimensional linear regression model in which a regularization procedure…

Machine Learning · Statistics 2019-08-06 Fei Wang, Ling Zhou, Lu Tang, Peter X.-K. Song

Beyond Classification: Evaluating Diffusion Denoised Smoothing for Security-Utility Trade off

While foundation models demonstrate impressive performance across various tasks, they remain vulnerable to adversarial inputs. Current research explores various approaches to enhance model robustness, with Diffusion Denoised Smoothing…

Machine Learning · Computer Science 2025-05-22 Yury Belousov, Brian Pulfer, Vitaliy Kinakh, Slava Voloshynovskiy

Posterior Uncertainty for Targeted Parameters in Bayesian Bootstrap Procedures

We propose a general method to carry out a valid Bayesian analysis of a finite-dimensional 'targeted' parameter in the presence of a finite-dimensional nuisance parameter. We apply our methods to causal inference based on estimating…

Methodology · Statistics 2026-02-03 Magid Sabbagh, David A. Stephens

The EAS approach to variable selection for multivariate response data in high-dimensional settings

In this paper, we develop an epsilon admissible subsets (EAS) model selection approach for performing group variable selection in the high-dimensional multivariate regression setting. This EAS strategy is designed to estimate a…

Methodology · Statistics 2024-01-17 Salil Koner, Jonathan P Williams

When predict can also explain: few-shot prediction to select better neural latents

Latent variable models serve as powerful tools to infer underlying dynamics from observed neural activity. Ideally, the inferred dynamics should align with true ones. However, due to the absence of ground truth data, prediction benchmarks…

Machine Learning · Computer Science 2025-08-26 Kabir Dabholkar, Omri Barak

Targeted learning via probabilistic subpopulation matching

In biomedical research, to obtain more accurate prediction results from a target study, leveraging information from multiple similar source studies is proved to be useful. However, in many biomedical applications based on real-world data,…

Methodology · Statistics 2025-12-29 Xiaokang Liu, Jie Hu, Naimin Jing, Yang Ning, Cheng Yong Tang, Runze Li, Yong Chen

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models

Speech foundation models, such as HuBERT and its variants, are pre-trained on large amounts of unlabeled speech data and then used for a range of downstream tasks. These models use a masked prediction objective, where the model learns to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-22 Li-Wei Chen, Takuya Higuchi, He Bai, Ahmed Hussen Abdelaziz, Alexander Rudnicky, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald, Zakaria Aldeneh

Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models

This work proposes new inference methods for a regression coefficient of interest in a (heterogeneous) quantile regression model. We consider a high-dimensional model where the number of regressors potentially exceeds the sample size but a…

Statistics Theory · Mathematics 2017-10-05 Alexandre Belloni, Victor Chernozhukov, Kengo Kato

Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier

The data functions that are studied in the course of functional data analysis are assembled from discrete data, and the level of smoothing that is used is generally that which is appropriate for accurate approximation of the conceptually…

Statistics Theory · Mathematics 2013-12-19 Raymond J. Carroll, Aurore Delaigle, Peter Hall

Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data

In this paper we develop a data-driven smoothing technique for high-dimensional and non-linear panel data models. We allow for individual specific (non-linear) functions and estimation with econometric or machine learning methods by using…

Methodology · Statistics 2020-01-06 Xi Chen, Ye Luo, Martin Spindler

Meta-Learned Confidence for Few-shot Learning

Transductive inference is an effective means of tackling the data deficiency problem in few-shot learning settings. A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class…

Machine Learning · Computer Science 2020-06-25 Seong Min Kye, Hae Beom Lee, Hoirin Kim, Sung Ju Hwang

Robust Universal Inference For Misspecified Models

In statistical inference, it is rarely realistic that the hypothesized statistical model is well-specified, and consequently it is important to understand the effects of misspecification on inferential procedures. When the hypothesized…

Methodology · Statistics 2025-09-01 Beomjo Park, Sivaraman Balakrishnan, Larry Wasserman

Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

Before entering the neural network, a token is generally converted to the corresponding one-hot representation, which is a discrete distribution of the vocabulary. Smoothed representation is the probability of candidate tokens obtained from…

Computation and Language · Computer Science 2022-03-01 Xing Wu, Chaochen Gao, Meng Lin, Liangjun Zang, Zhongyuan Wang, Songlin Hu