Related papers: Data Appraisal Without Data Sharing

Incentivizing Inclusive Contributions in Model Sharing Markets

While data plays a crucial role in training contemporary AI models, it is acknowledged that valuable public data will be exhausted in a few years, directing the world's attention towards the massive decentralized private data. However, the…

Artificial Intelligence · Computer Science 2025-05-06 Enpei Zhang, Jingyi Chai, Rui Ye, Yanfeng Wang, Siheng Chen

Revisiting Data Attribution for Influence Functions

The goal of data attribution is to trace the model's predictions through the learning algorithm and back to its training data, thereby identifying the most influential training samples and understanding how the model's behavior leads to…

Machine Learning · Computer Science 2025-08-12 Hongbo Zhu, Angelo Cangelosi

Achieving Fairness at No Utility Cost via Data Reweighing with Influence

With the fast development of algorithmic governance, fairness has become a compulsory property for machine learning models to suppress unintentional discrimination. In this paper, we focus on the pre-processing aspect for achieving…

Machine Learning · Computer Science 2022-06-20 Peizhao Li, Hongfu Liu

Scalable Data Attribution via Forward-Only Test-Time Inference

Data attribution seeks to trace model behavior back to the training examples that shaped it, enabling debugging, auditing, and data valuation at scale. Classical influence-function methods offer a principled foundation but remain…

Machine Learning · Computer Science 2025-11-26 Sibo Ma, Julian Nyarko

Training Data Attribution via Approximate Unrolled Differentiation

Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be…

Machine Learning · Computer Science 2024-05-22 Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse

Z0-Inf: Zeroth Order Approximation for Data Influence

A critical aspect of analyzing and improving modern machine learning systems lies in understanding how individual training examples influence a model's predictive behavior. Estimating this influence enables critical applications, including…

Machine Learning · Computer Science 2025-10-15 Narine Kokhlikyan, Kamalika Chaudhuri, Saeed Mahloujifar

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current…

Machine Learning · Computer Science 2024-06-21 Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

Fairness Without Harm: An Influence-Guided Active Sampling Approach

The pursuit of fairness in machine learning (ML), ensuring that the models do not exhibit biases toward protected demographic groups, typically results in a compromise scenario. This compromise can be explained by a Pareto frontier where…

Machine Learning · Computer Science 2024-11-11 Jinlong Pang, Jialu Wang, Zhaowei Zhu, Yuanshun Yao, Chen Qian, Yang Liu

Dataset Knowledge Transfer for Class-Incremental Learning without Memory

Incremental learning enables artificial agents to learn from sequential data. While important progress was made by exploiting deep neural networks, incremental learning remains very challenging. This is particularly the case when no memory…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Habib Slim, Eden Belouadah, Adrian Popescu, Darian Onchis

MYCROFT: Towards Effective and Efficient External Data Augmentation

Machine learning (ML) models often require large amounts of data to perform well. When the available data is limited, model trainers may need to acquire more data from external sources. Often, useful data is held by private entities who are…

Machine Learning · Computer Science 2024-10-14 Zain Sarwar, Van Tran, Arjun Nitin Bhagoji, Nick Feamster, Ben Y. Zhao, Supriyo Chakraborty

Model-specific Data Subsampling with Influence Functions

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the…

Machine Learning · Computer Science 2020-10-21 Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Incentivizing Time-Aware Fairness in Data Sharing

In collaborative data sharing and machine learning, multiple parties aggregate their data resources to train a machine learning model with better model performance. However, as the parties incur data collection costs, they are only willing…

Machine Learning · Computer Science 2025-10-23 Jiangwei Chen, Kieu Thao Nguyen Pham, Rachael Hwee Ling Sim, Arun Verma, Zhaoxuan Wu, Chuan-Sheng Foo, Bryan Kian Hsiang Low

Private Data Valuation and Fair Payment in Data Marketplaces

Data valuation is an essential task in a data marketplace. It aims at fairly compensating data owners for their contribution. There is increasing recognition in the machine learning community that the Shapley value -- a foundational…

Cryptography and Security · Computer Science 2023-02-20 Zhihua Tian, Jian Liu, Jingyu Li, Xinle Cao, Ruoxi Jia, Jun Kong, Mengdi Liu, Kui Ren

Fast-DataShapley: Neural Modeling for Training Data Valuation

The value and copyright of training data are crucial in the artificial intelligence industry. Service platforms should protect data providers' legitimate rights and fairly reward them for their contributions. Shapley value, a potent tool…

Machine Learning · Computer Science 2025-11-21 Haifeng Sun, Yu Xiong, Runze Wu, Xinyu Cai, Changjie Fan, Lan Zhang, Xiang-Yang Li

Understanding Data Influence with Differential Approximation

Data plays a pivotal role in the groundbreaking advancements in artificial intelligence. The quantitative analysis of data significantly contributes to model training, enhancing both the efficiency and quality of data utilization. However,…

Machine Learning · Computer Science 2025-08-21 Haoru Tan, Sitong Wu, Xiuzhe Wu, Wang Wang, Bo Zhao, Zeke Xie, Gui-Song Xia, Xiaojuan Qi

LIA: Privacy-Preserving Data Quality Evaluation in Federated Learning Using a Lazy Influence Approximation

In Federated Learning, it is crucial to handle low-quality, corrupted, or malicious data. However, traditional data valuation methods are not suitable due to privacy concerns. To address this, we propose a simple yet effective approach that…

Cryptography and Security · Computer Science 2024-11-27 Ljubomir Rokvic, Panayiotis Danassis, Sai Praneeth Karimireddy, Boi Faltings

How to Learn from Others: Transfer Machine Learning with Additive Regression Models to Improve Sales Forecasting

In a variety of business situations, the introduction or improvement of machine learning approaches is impaired as these cannot draw on existing analytical models. However, in many cases similar problems may have already been solved…

Machine Learning · Computer Science 2020-05-22 Robin Hirt, Niklas Kühl, Yusuf Peker, Gerhard Satzger

Training Fair Models in Federated Learning without Data Privacy Infringement

Training fair machine learning models becomes more and more important. As many powerful models are trained by collaboration among multiple parties, each holding some sensitive data, it is natural to explore the feasibility of training fair…

Machine Learning · Computer Science 2024-11-05 Xin Che, Jingdi Hu, Zirui Zhou, Yong Zhang, Lingyang Chu

PrivaDE: Privacy-preserving Data Evaluation for Blockchain-based Data Marketplaces

Evaluating the usefulness of data before purchase is essential when obtaining data for high-quality machine learning models, yet both model builders and data providers are often unwilling to reveal their proprietary assets. We present…

Cryptography and Security · Computer Science 2026-04-21 Wan Ki Wong, Sahel Torkamani, Michele Ciampi, Rik Sarkar

Data Fine-tuning

In real-world applications, commercial off-the-shelf systems are utilized for performing automated facial analysis including face recognition, emotion recognition, and attribute prediction. However, a majority of these commercial systems…

Computer Vision and Pattern Recognition · Computer Science 2018-12-11 Saheb Chhabra, Puspita Majumdar, Mayank Vatsa, Richa Singh