Related papers: LAVA: Data Valuation without Pre-Specified Learnin…
Selecting data for training machine learning models is crucial since large, web-scraped, real datasets contain noisy artifacts that affect the quality and relevance of individual data points. These noisy artifacts will impact model…
Data valuation quantifies data importance, but existing methods cannot ensure validity in a single training process. The neural dynamic data valuation (NDDV) method [3] addresses this limitation. Based on NDDV, we are the first to explore…
We present a method to compute the derivative of a learning task with respect to a dataset. A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN). The…
Federated Learning (FL) enables collaborative model training while preserving the privacy of raw data. A challenge in this framework is the fair and efficient valuation of data, which is crucial for incentivizing clients to contribute…
Finding valuable training data points for deep neural networks has been a core research challenge with many applications. In recent years, various techniques for calculating the "value" of individual training datapoints have been proposed…
With the ongoing investment in data collection and communication technology in power systems, data-driven optimization has been established as a powerful tool for system operators to handle stochastic system states caused by weather- and…
Data valuation is a ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how…
We propose a risk-averse statistical learning framework wherein the performance of a learning algorithm is evaluated by the conditional value-at-risk (CVaR) of losses rather than the expected loss. We devise algorithms based on stochastic…
Data valuation is concerned with determining a fair valuation of data from data sources to compensate them or to identify training examples that are the most or least useful for predictions. With the rising interest in personal data…
Much like the classical Fisher linear discriminant analysis (LDA), the recently proposed Wasserstein discriminant analysis (WDA) is a linear dimensionality reduction method that seeks a projection matrix to maximize the dispersion of…
Optimizing the learning rate remains a critical challenge in machine learning, essential for achieving model stability and efficient convergence. The Vector Auxiliary Variable (VAV) algorithm introduces a novel energy-based self-adjustable…
Data valuation methods quantify how individual training examples contribute to a model's behavior, and are increasingly used for dataset curation, auditing, and emerging data markets. As these techniques become operational, they raise…
We consider the differentiation of the value function for parametric optimization problems. Such problems are ubiquitous in Machine Learning applications such as structured support vector machines, matrix factorization and min-min or…
Quantifying the value of data is a fundamental problem in machine learning. Data valuation has multiple important use cases: (1) building insights about the learning task, (2) domain adaptation, (3) corrupted sample discovery, and (4)…
In this work, we introduce a novel framework for privately optimizing objectives that rely on Wasserstein distances between data-dependent empirical measures. Our main theoretical contribution is, based on an explicit formulation of the…
The use of well-disentangled representations offers many advantages for downstream tasks, e.g. an increased sample efficiency, or better interpretability. However, the quality of disentangled interpretations is often highly dependent on the…
Data valuation has become a cornerstone of the modern data economy, where datasets function as tradable intellectual assets that drive decision-making, model training, and market transactions. Despite substantial progress, existing…
Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by…
In response to the challenges of data mining, discriminant analysis continues to evolve as a vital branch of statistics. Our recently introduced method of vertex discriminant analysis (VDA) is ideally suited to handle multiple categories…
In real-world applications, domain data often contains identifiable or sensitive attributes, is subject to strict regulations (e.g., HIPAA, GDPR), and requires explicit data feature engineering for interpretability and transparency.…