Related papers: Machine Learning on sWeighted Data
Data analysis in high energy physics often deals with data samples consisting of a mixture of signal and background events. The sPlot technique is a common method to subtract the contribution of the background by assigning weights to…
The use of machine learning approaches continues to have many benefits in experimental nuclear and particle physics. One common issue is generating training data which is sufficiently realistic to give reliable results. Here we advocate…
Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations…
Machine Unlearning is an emerging paradigm for selectively removing the impact of training datapoints from a network. Unlike existing methods that target a limited subset or a single class, our framework unlearns all classes in a single…
Machine learning models, trained on data from ab initio quantum simulations, are yielding molecular dynamics potentials with unprecedented accuracy. One limiting factor is the quantity of available training data, which can be…
Imbalanced data pose challenges for deep learning based classification models. One of the most widely-used approaches for tackling imbalanced data is re-weighting, where training samples are associated with different weights in the loss…
Power load forecasting with machine learning is a fairly mature application of artificial intelligence and it is indispensable in operation, control and planning. Data selection techniques have hardly been used in this application. However,…
Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper we study…
The paper advocates the use of a statistical tool dedicated to the exploration of data samples populated by several sources of events. This new technique, called sPlot, is able to unfold the contributions of the different sources to the…
Real-world machine learning applications often have complex test metrics, and may have training and test data that are not identically distributed. Motivated by known connections between complex test metrics and cost-weighted learning, we…
A distribution shift between the training and test data can severely harm performance of machine learning models. Importance weighting addresses this issue by assigning different weights to data points during training. We argue that…
Training machine learning interatomic potentials often requires optimizing a loss function composed of three variables: potential energies, forces, and stress. The contribution of each variable to the total loss is typically weighted using…
Deploying machine learning models to new tasks is a major challenge despite the large size of the modern training datasets. However, it is conceivable that the training data can be reweighted to be more representative of the new (target)…
Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. A sample re-weighting strategy is commonly used to alleviate this issue by designing a weighting function mapping from…
Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software…
Machine learning models (mainly neural networks) are used more and more in real life. Users feed their data to the model for training. But these processes are often one-way. Once trained, the model remembers the data. Even when data is…
Weighting strategies prevail in machine learning. For example, a common approach in robust machine learning is to assign lower weights to samples that are likely to be noisy or quite hard. This study reveals another undiscovered strategy,…
Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be…
In the context of supervised learning of a function by a neural network, we claim and empirically verify that the neural network yields better results when the distribution of the data set focuses on regions where the function to learn is…
In scientific machine learning, models are routinely deployed with parameter values or boundary conditions far from those used in training. This paper studies the learning-where-to-learn problem of designing a training data distribution…