Related papers: Model Averaging and Double Machine Learning
We introduce the package ddml for Double/Debiased Machine Learning (DDML) in Stata. Estimators of causal parameters for five different econometric models are supported, allowing for flexible estimation of causal effects of endogenous…
The double machine learning (DML) method combines the predictive power of machine learning with statistical estimation to conduct inference about the structural parameter of interest. This paper presents the R package `xtdml`, which…
This paper investigates double/debiased machine learning (DML) under multiway clustered sampling environments. We propose a novel multiway cross fitting algorithm and a multiway DML estimator based on this algorithm. We also develop a…
Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal…
Debiased machine learning estimators for smooth functionals in nonparametric models can exhibit substantial variability and instability, often leading practitioners to instead rely on parametric or semiparametric working models. Such…
The widely recommended procedure of Bayesian model averaging is flawed in the M-open setting in which the true data-generating process is not one of the candidate models being fit. We take the idea of stacking from the point estimation…
Stacking, a potent ensemble learning method, leverages a meta-model to harness the strengths of multiple base models, thereby enhancing prediction accuracy. Traditional stacking techniques typically utilize established learning models, such…
Estimating causal effects using machine learning (ML) algorithms can help to relax functional form assumptions if used within appropriate frameworks. However, most of these frameworks assume settings with cross-sectional data, whereas…
This paper studies double/debiased machine learning (DML) methods applied to weakly dependent data. We allow observations to be situated in a general metric space that accommodates spatial and network data. Existing work implements…
This paper provides an introduction to Double/Debiased Machine Learning (DML). DML is a general approach to performing inference about a target parameter in the presence of nuisance functions: objects that are needed to identify the target…
In this paper, we propose a model averaging approach for addressing model uncertainty in the context of partial linear functional additive models. These models are designed to describe the relation between a response and mixed types of…
In the last decade, machine learning techniques have gained popularity for estimating causal effects. One machine learning approach that can be used for estimating an average treatment effect is double/debiased machine learning (DML)…
Recent advances in causal inference have seen the development of methods which make use of the predictive power of machine learning algorithms. In this paper, we develop novel double machine learning (DML) procedures for panel data in which…
Support vector machine (SVM) is a well-known statistical technique for classification problems in machine learning and other fields. An important question for SVM is the selection of covariates (or features) for the model. Many studies have…
Stacking is a widely used model averaging technique that asymptotically yields optimal predictions among linear averages. We show that stacking is most effective when model predictive performance is heterogeneous in inputs, and we can…
Large language models are becoming the go-to solution for an ever-growing number of tasks. However, with growing capacity, models are prone to rely on spurious correlations stemming from biases and stereotypes present in the training data.…
Double/debiased machine learning (DML) provides a general framework for inference with high-dimensional or otherwise complex nuisance parameters by combining Neyman-orthogonal scores with cross-fitting, thereby circumventing classical…
Ensembling is a powerful technique for improving the accuracy of machine learning models, with methods like stacking achieving strong results in tabular tasks. In time series forecasting, however, ensemble methods remain underutilized, with…
In M-open problems where no true model can be conceptualized, it is common to back off from modeling and merely seek good prediction. Even in M-complete problems, taking a predictive approach can be very useful. Stacking is a model…
The gradient boosting machine is a powerful tool for solving regression problems. To cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the…