Related papers: Data Distribution Valuation

Data Valuation by Fusing Global and Local Statistical Information

Data valuation has garnered increasing attention in recent years, given the critical role of high-quality data in various applications. Among diverse data valuation approaches, Shapley value-based methods are predominant due to their strong…

Machine Learning · Computer Science 2025-11-27 Xiaoling Zhou, Ou Wu, Michael K. Ng, Hao Jiang

What is the Value of Data? On Mathematical Methods for Data Quality Estimation

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a…

Machine Learning · Computer Science 2020-05-13 Netanel Raviv, Siddharth Jain, Jehoshua Bruck

A Survey of Data Pricing for Data Marketplaces

A data marketplace is an online venue that brings data owners, data brokers, and data consumers together and facilitates commoditisation of data amongst them. Data pricing, as a key function of a data marketplace, demands quantifying the…

Computer Science and Game Theory · Computer Science 2023-03-10 Mengxiao Zhang, Fernando Beltran, Jiamou Liu

Variance reduced Shapley value estimation for trustworthy data valuation

Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley and approximate it by means of…

Machine Learning · Statistics 2023-05-23 Mengmeng Wu, Ruoxi Jia, Changle Lin, Wei Huang, Xiangyu Chang

Do Data Valuations Make Good Data Prices?

As large language models increasingly rely on external data sources, compensating data contributors has become a central concern. But how should these payments be devised? We revisit data valuations from a market-design…

Computer Science and Game Theory · Computer Science 2025-09-29 Dongyang Fan, Tyler J. Rotello, Sai Praneeth Karimireddy

Data Markets to support AI for All: Pricing, Valuation and Governance

We discuss a data market technique based on intrinsic (relevance and uniqueness) as well as extrinsic value (influenced by supply and demand) of data. For intrinsic value, we explain how to perform valuation of data in absolute terms (i.e…

Computers and Society · Computer Science 2019-05-17 Ramesh Raskar, Praneeth Vepakomma, Tristan Swedish, Aalekh Sharan

Reframing Data Value for Large Language Models Through the Lens of Plausibility

Data valuation seeks to answer the important question, "How much is this data worth?" Existing data valuation methods have largely focused on discriminative models, primarily examining data value through the lens of its utility in training.…

Machine Learning · Computer Science 2024-10-17 Mohamad Rida Rammal, Ruida Zhou, Suhas Diggavi

Is Data Valuation Learnable and Interpretable?

Measuring the value of individual samples is critical for many data-driven tasks, e.g., the training of a deep learning model. Recent literature witnesses the substantial efforts in developing data valuation methods. The primary data…

Machine Learning · Computer Science 2024-06-06 Ou Wu, Weiyao Zhu, Mengyang Li

Data Valuation with Gradient Similarity

High-quality data is crucial for accurate machine learning and actionable analytics; however, mislabeled or noisy data is a common problem in many domains. Distinguishing low- from high-quality data can be challenging, often requiring…

Machine Learning · Computer Science 2024-05-15 Nathaniel J. Evans, Gordon B. Mills, Guanming Wu, Xubo Song, Shannon McWeeney

Private Data Valuation and Fair Payment in Data Marketplaces

Data valuation is an essential task in a data marketplace. It aims at fairly compensating data owners for their contribution. There is increasing recognition in the machine learning community that the Shapley value -- a foundational…

Cryptography and Security · Computer Science 2023-02-20 Zhihua Tian, Jian Liu, Jingyu Li, Xinle Cao, Ruoxi Jia, Jun Kong, Mengdi Liu, Kui Ren

Computing the Value of Data: Towards Applied Data Minimalism

We present an approach to compute the monetary value of individual data points, in context of an automated decision system. The proposed method enables us to explore and implement a paradigm of data minimalism for large-scale machine…

Machine Learning · Computer Science 2019-07-30 Michaela Regneri, Julia S. Georgi, Jurij Kost, Niklas Pietsch, Sabine Stamm

Computing a Data Dividend

Quality data is a fundamental contributor to success in statistics and machine learning. If a statistical assessment or machine learning leads to decisions that create value, data contributors may want a share of that value. This paper…

Computer Science and Game Theory · Computer Science 2019-06-28 Eric Bax

A case for data valuation transparency via DValCards

Following the rise in popularity of data-centric machine learning (ML), various data valuation methods have been proposed to quantify the contribution of each datapoint to desired ML model performance metrics (e.g., accuracy). Beyond the…

Machine Learning · Computer Science 2025-07-31 Keziah Naggita, Julienne LaChance

A Survey on Data Pricing: from Economics to Data Science

Data are invaluable. How can we assess the value of data objectively, systematically and quantitatively? Pricing data, or information goods in general, has been studied and practiced in dispersed areas and principles, such as economics,…

Theoretical Economics · Economics 2021-01-01 Jian Pei

Data Distribution Valuation Using Generalized Bayesian Inference

We investigate the data distribution valuation problem, which aims to quantify the values of data distributions from their samples. This is a recently proposed problem that is related to but different from classical data valuation and can…

Machine Learning · Computer Science 2026-04-08 Cuong N. Nguyen, Cuong V. Nguyen

Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison

We study two-sample variable selection: identifying variables that discriminate between the distributions of two sets of data vectors. Such variables help scientists understand the mechanisms behind dataset discrepancies. Although…

Machine Learning · Statistics 2025-11-06 Kensuke Mitsuzawa, Motonobu Kanagawa, Stefano Bortoli, Margherita Grossi, Paolo Papotti

Classification of datasets with imputed missing values: does imputation quality matter?

Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods,…

Machine Learning · Computer Science 2023-12-20 Tolou Shadbahr, Michael Roberts, Jan Stanczuk, Julian Gilbey, Philip Teare, Sören Dittmer, Matthew Thorpe, Ramon Vinas Torne, Evis Sala, Pietro Lio, Mishal Patel, AIX-COVNET Collaboration, James H. F. Rudd, Tuomas Mirtti, Antti Rannikko, John A. D. Aston, Jing Tang, Carola-Bibiane Schönlieb

What is my data worth? From data properties to data value

Data today fuels both the economy and advances in machine learning and AI. All aspects of decision making, at the personal and enterprise level and in governments are increasingly data-driven. In this context, however, there are still some…

Computers and Society · Computer Science 2018-11-13 Kalapriya Kannan, Rema Ananthanarayanan, Sameep Mehta

Data Validation

Data validation is the activity where one decides whether or not a particular data set is fit for a given purpose. Formalizing the requirements that drive this decision process allows for unambiguous communication of the requirements,…

Databases · Computer Science 2020-12-23 Mark P. J. van der Loo, Edwin de Jonge

Recognizing Variables from their Data via Deep Embeddings of Distributions

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…

Machine Learning · Computer Science 2019-09-12 Jonas Mueller, Alex Smola