Related papers: PopulAtion Parameter Averaging (PAPA)
The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of…
In machine learning, ensembles are important tools for improving the model performance. In natural language processing specifically, ensembles boost the performance of a method due to multiple large models available in open source. However,…
Ensembling is now recognized as an effective approach for increasing the predictive performance and calibration of deep networks. We introduce a new approach, Parameter Ensembling by Perturbation (PEP), that constructs an ensemble of…
We propose a Policy Averaging Approach (PAA) that synthesizes the strengths of existing approaches to create more reliable, flexible and justifiable policies for stochastic optimization problems. An important component of the PAA is risk…
Generalizing causal estimates in randomized experiments to a broader target population is essential for guiding decisions by policymakers and practitioners in the social and biomedical sciences. While recent papers developed various…
Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions. This approach is commonly reserved for the largest…
Ensemble models often improve generalization performances in challenging tasks. Yet, traditional techniques based on prediction averaging incur three well-known disadvantages: the computational overhead of training multiple models,…
Ensemble methods are widely employed to improve generalization in machine learning. This has also prompted the adoption of ensemble learning for the knowledge graph embedding (KGE) models in performing link prediction. Typical approaches to…
Aggregating multiple learners through an ensemble of models aim to make better predictions by capturing the underlying distribution of the data more accurately. Different ensembling methods, such as bagging, boosting, and stacking/blending,…
Ensembles of artificial neural networks show improved generalization capabilities that outperform those of single networks. However, for aggregation to be effective, the individual networks must be as accurate and diverse as possible. An…
Population annealing is a powerful sequential Monte Carlo algorithm designed to study the equilibrium behavior of general systems in statistical physics through massive parallelism. In addition to the remarkable scaling capabilities of the…
It is common practice to combine deep neural networks into ensembles. These deep ensembles can benefit from the cancellation of errors effect: Errors by ensemble members may average out, leading to better generalization performance than…
Population annealing (PA) is a population-based algorithm that is designed for equilibrium simulations of thermodynamic systems with a rough free energy landscape. It is known to be more efficient in doing so than standard Markov chain…
Weight averaging is a widely used technique for accelerating training and improving the generalization of deep neural networks (DNNs). While existing approaches like stochastic weight averaging (SWA) rely on pre-set weighting schemes, they…
Unlabeled data are increasingly prevalent in contemporary economic studies, yet their effective use for improving prediction remains challenging because the outcomes are often costly or even infeasible to observe. Machine learning methods…
An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more…
We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple…
Despite empirical risk minimization (ERM) is widely applied in the machine learning community, its performance is limited on data with spurious correlation or subpopulation that is introduced by hidden attributes. Existing literature…
We propose a new simple procedure called Population-Mean-Based Aggregation (PMBA) that enables a principal to "aggregate" information about an unknown state of the world from agents without understanding the information structure among…
Clinical study populations often differ meaningfully from the broader populations to which results are intended to generalize. Weighting methods such as inverse probability of sampling weights (IPSW) reweight study participants to resemble…