Related papers: Analysis of Testing-Based Forward Model Selection
Forward regression is a crucial methodology for automatically identifying important predictors from a large pool of potential covariates. In contexts with moderate predictor correlation, forward selection techniques can achieve screening…
Forward regression is a statistical model selection and estimation procedure which inductively selects covariates that add predictive power into a working statistical regression model. Once a model is selected, unknown regression parameters…
The multivariate normal linear model is one of the most widely employed models for statistical inference in applied research. Special cases include (multivariate) t testing, (M)AN(C)OVA, (multivariate) multiple regression, and repeated…
This paper introduces Targeted Function Balancing (TFB), a covariate balancing weights framework for estimating the average treatment effect of a binary intervention. TFB first regresses an outcome on covariates, and then selects weights…
Time series foundation models (TSFMs) are a class of potentially powerful, general-purpose tools for time series forecasting and related temporal tasks, but their behavior is strongly shaped by subtle inductive biases in their design.…
Most scientific publications follow the familiar recipe of (i) obtain data, (ii) fit a model, and (iii) comment on the scientific relevance of the effects of particular covariates in that model. This approach, however, ignores the fact that…
As the frontiers of applied statistics progress through increasingly complex experiments we must exploit increasingly sophisticated inferential models to analyze the observations we make. In order to avoid misleading or outright erroneous…
Selective inference aims at providing valid inference after a data-driven selection of models or hypotheses. It is essential to avoid overconfident results and replicability issues. While significant advances have been made in this area for…
We introduce a classification method based on in-context learning using time-series foundation models (TSFMs). We demonstrate how data not included in the TSFM training can be classified without fine-tuning the foundation model or training…
Regression testing is an essential activity to assure that software code changes do not adversely affect existing functionalities. With the wide adoption of Continuous Integration (CI) in software projects, which increases the frequency of…
Feedforward neural networks (FNNs) can be viewed as non-linear regression models, where covariates enter the model through a combination of weighted summations and non-linear functions. Although these models have some similarities to the…
We propose a new approach to safe variable preselection in high-dimensional penalized regression, such as the lasso. Preselection - to start with a manageable set of covariates - has often been implemented without clear appreciation of its…
In this paper, we investigate the hypothesis testing problem that checks whether part of covariates / confounders significantly affect the heterogeneous treatment effect given all covariates. This model checking is particularly useful in…
Hypothesis testing in the linear regression model is a fundamental statistical problem. We consider linear regression in the high-dimensional regime where the number of parameters exceeds the number of samples ($p> n$). In order to make…
Process Model Forecasting (PMF) aims to predict how the control-flow structure of a process evolves over time by modeling the temporal dynamics of directly-follows (DF) relations, complementing predictive process monitoring that focuses on…
In large-scale Bayesian inverse problems, it is often necessary to apply approximate forward models to reduce the cost of forward model evaluations, while controlling approximation quality. In the context of Bayesian inverse problems with…
Forward regression is a classical and effective tool for variable screening in ultra-high dimensional linear models, but its standard projection-based implementation can be computationally costly and numerically unstable when predictors are…
Varying coefficient models have numerous applications in a wide scope of scientific areas. While enjoying nice interpretability, they also allow flexibility in modeling dynamic impacts of the covariates. But, in the new era of big data, it…
The linear regression model is widely used in empirical work in Economics, Statistics, and many other disciplines. Researchers often include many covariates in their linear model specification in an attempt to control for confounders. We…
In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks that need careful consideration. Typically, model selection…