Related papers: LAVA: Data Valuation without Pre-Specified Learnin…

SAVA: Scalable Learning-Agnostic Data Valuation

Selecting data for training machine learning models is crucial since large, web-scraped, real datasets contain noisy artifacts that affect the quality and relevance of individual data points. These noisy artifacts will impact model…

Machine Learning · Computer Science 2025-03-20 Samuel Kessler, Tam Le, Vu Nguyen

Error Estimate and Convergence Analysis for Data Valuation

Data valuation quantifies data importance, but existing methods cannot ensure validity in a single training process. The neural dynamic data valuation (NDDV) method [3] addresses this limitation. Based on NDDV, we are the first to explore…

Machine Learning · Computer Science 2025-12-19 Zhangyong Liang, Huanhuan Gao, Ji Zhang

DIVA: Dataset Derivative of a Learning Task

We present a method to compute the derivative of a learning task with respect to a dataset. A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN). The…

Machine Learning · Computer Science 2021-11-19 Yonatan Dukler, Alessandro Achille, Giovanni Paolini, Avinash Ravichandran, Marzia Polito, Stefano Soatto

Data Valuation and Detections in Federated Learning

Federated Learning (FL) enables collaborative model training while preserving the privacy of raw data. A challenge in this framework is the fair and efficient valuation of data, which is crucial for incentivizing clients to contribute…

Machine Learning · Computer Science 2024-05-10 Wenqian Li, Shuran Fu, Fengrui Zhang, Yan Pang

Finding High-Value Training Data Subset through Differentiable Convex Programming

Finding valuable training data points for deep neural networks has been a core research challenge with many applications. In recent years, various techniques for calculating the "value" of individual training datapoints have been proposed…

Machine Learning · Computer Science 2021-04-29 Soumi Das, Arshdeep Singh, Saptarshi Chatterjee, Suparna Bhattacharya, Sourangshu Bhattacharya

Data Valuation from Data-Driven Optimization

With the ongoing investment in data collection and communication technology in power systems, data-driven optimization has been established as a powerful tool for system operators to handle stochastic system states caused by weather- and…

Optimization and Control · Mathematics 2023-12-18 Robert Mieth, Juan M. Morales, H. Vincent Poor

Fairness-Aware Data Valuation for Supervised Learning

Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how…

Machine Learning · Computer Science 2023-03-31 José Pombal, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

Statistical Learning with Conditional Value at Risk

We propose a risk-averse statistical learning framework wherein the performance of a learning algorithm is evaluated by the conditional value-at-risk (CVaR) of losses rather than the expected loss. We devise algorithms based on stochastic…

Machine Learning · Computer Science 2020-02-17 Tasuku Soma, Yuichi Yoshida

DeRDaVa: Deletion-Robust Data Valuation for Machine Learning

Data valuation is concerned with determining a fair valuation of data from data sources to compensate them or to identify training examples that are the most or least useful for predictions. With the rising interest in personal data…

Machine Learning · Computer Science 2024-01-23 Xiao Tian, Rachael Hwee Ling Sim, Jue Fan, Bryan Kian Hsiang Low

A Bi-level Nonlinear Eigenvector Algorithm for Wasserstein Discriminant Analysis

Much like the classical Fisher linear discriminant analysis (LDA), the recently proposed Wasserstein discriminant analysis (WDA) is a linear dimensionality reduction method that seeks a projection matrix to maximize the dispersion of…

Machine Learning · Statistics 2023-07-31 Dong Min Roh, Zhaojun Bai, Ren-Cang Li

An Energy-Based Self-Adaptive Learning Rate for Stochastic Gradient Descent: Enhancing Unconstrained Optimization with VAV method

Optimizing the learning rate remains a critical challenge in machine learning, essential for achieving model stability and efficient convergence. The Vector Auxiliary Variable (VAV) algorithm introduces a novel energy-based self-adjustable…

Machine Learning · Computer Science 2024-11-12 Jiahao Zhang, Christian Moya, Guang Lin

Challenges in Enabling Private Data Valuation

Data valuation methods quantify how individual training examples contribute to a model's behavior, and are increasingly used for dataset curation, auditing, and emerging data markets. As these techniques become operational, they raise…

Cryptography and Security · Computer Science 2026-03-03 Yiwei Fu, Tianhao Wang, Varun Chandrasekaran

Differentiating the Value Function by using Convex Duality

We consider the differentiation of the value function for parametric optimization problems. Such problems are ubiquitous in Machine Learning applications such as structured support vector machines, matrix factorization and min-min or…

Optimization and Control · Mathematics 2020-12-29 Sheheryar Mehmood, Peter Ochs

Data Valuation using Reinforcement Learning

Quantifying the value of data is a fundamental problem in machine learning. Data valuation has multiple important use cases: (1) building insights about the learning task, (2) domain adaptation, (3) corrupted sample discovery, and (4)…

Machine Learning · Computer Science 2019-09-27 Jinsung Yoon, Sercan O. Arik, Tomas Pfister

Learning with Differentially Private (Sliced) Wasserstein Gradients

In this work, we introduce a novel framework for privately optimizing objectives that rely on Wasserstein distances between data-dependent empirical measures. Our main theoretical contribution is, based on an explicit formulation of the…

Machine Learning · Computer Science 2025-05-22 David Rodríguez-Vítores, Clément Lalanne, Jean-Michel Loubes

DAVA: Disentangling Adversarial Variational Autoencoder

The use of well-disentangled representations offers many advantages for downstream tasks, e.g. an increased sample efficiency, or better interpretability. However, the quality of disentangled interpretations is often highly dependent on the…

Machine Learning · Computer Science 2023-03-03 Benjamin Estermann, Roger Wattenhofer

Neural Dynamic Data Valuation: A Stochastic Optimal Control Approach

Data valuation has become a cornerstone of the modern data economy, where datasets function as tradable intellectual assets that drive decision-making, model training, and market transactions. Despite substantial progress, existing…

Machine Learning · Statistics 2025-12-25 Zhangyong Liang, Ji Zhang, Xin Wang, Pengfei Zhang, Zhao Li

Meta-Value Learning: a General Framework for Learning with Learning Awareness

Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by…

Machine Learning · Computer Science 2023-12-12 Tim Cooijmans, Milad Aghajohari, Aaron Courville

Multicategory vertex discriminant analysis for high-dimensional data

In response to the challenges of data mining, discriminant analysis continues to evolve as a vital branch of statistics. Our recently introduced method of vertex discriminant analysis (VDA) is ideally suited to handle multiple categories…

Applications · Statistics 2011-01-06 Tong Tong Wu, Kenneth Lange

DELTA: Variational Disentangled Learning for Privacy-Preserving Data Reprogramming

In real-world applications, domain data often contains identifiable or sensitive attributes, is subject to strict regulations (e.g., HIPAA, GDPR), and requires explicit data feature engineering for interpretability and transparency.…

Machine Learning · Computer Science 2025-09-03 Arun Vignesh Malarkkan, Haoyue Bai, Anjali Kaushik, Yanjie Fu