Related papers: Data Distribution Valuation

200 papers

Data valuation has garnered increasing attention in recent years, given the critical role of high-quality data in various applications. Among diverse data valuation approaches, Shapley value-based methods are predominant due to their strong…

Machine Learning · Computer Science 2025-11-27 Xiaoling Zhou, Ou Wu, Michael K. Ng, Hao Jiang

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a…

Machine Learning · Computer Science 2020-05-13 Netanel Raviv, Siddharth Jain, Jehoshua Bruck

A data marketplace is an online venue that brings data owners, data brokers, and data consumers together and facilitates commoditisation of data amongst them. Data pricing, as a key function of a data marketplace, demands quantifying the…

Computer Science and Game Theory · Computer Science 2023-03-10 Mengxiao Zhang, Fernando Beltran, Jiamou Liu

Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley and approximate it by means of…

Machine Learning · Statistics 2023-05-23 Mengmeng Wu, Ruoxi Jia, Changle Lin, Wei Huang, Xiangyu Chang

As large language models increasingly rely on external data sources, compensating data contributors has become a central concern. But how should these payments be devised? We revisit data valuations from a $\textit{market-design…

Computer Science and Game Theory · Computer Science 2025-09-29 Dongyang Fan, Tyler J. Rotello, Sai Praneeth Karimireddy

We discuss a data market technique based on intrinsic (relevance and uniqueness) as well as extrinsic value (influenced by supply and demand) of data. For intrinsic value, we explain how to perform valuation of data in absolute terms (i.e…

Computers and Society · Computer Science 2019-05-17 Ramesh Raskar, Praneeth Vepakomma, Tristan Swedish, Aalekh Sharan

Data valuation seeks to answer the important question, "How much is this data worth?" Existing data valuation methods have largely focused on discriminative models, primarily examining data value through the lens of its utility in training.…

Machine Learning · Computer Science 2024-10-17 Mohamad Rida Rammal, Ruida Zhou, Suhas Diggavi

Measuring the value of individual samples is critical for many data-driven tasks, e.g., the training of a deep learning model. Recent literature witnesses the substantial efforts in developing data valuation methods. The primary data…

Machine Learning · Computer Science 2024-06-06 Ou Wu, Weiyao Zhu, Mengyang Li

High-quality data is crucial for accurate machine learning and actionable analytics; however, mislabeled or noisy data is a common problem in many domains. Distinguishing low- from high-quality data can be challenging, often requiring…

Machine Learning · Computer Science 2024-05-15 Nathaniel J. Evans, Gordon B. Mills, Guanming Wu, Xubo Song, Shannon McWeeney

Data valuation is an essential task in a data marketplace. It aims at fairly compensating data owners for their contribution. There is increasing recognition in the machine learning community that the Shapley value -- a foundational…

Cryptography and Security · Computer Science 2023-02-20 Zhihua Tian, Jian Liu, Jingyu Li, Xinle Cao, Ruoxi Jia, Jun Kong, Mengdi Liu, Kui Ren

We present an approach to compute the monetary value of individual data points, in the context of an automated decision system. The proposed method enables us to explore and implement a paradigm of data minimalism for large-scale machine…

Machine Learning · Computer Science 2019-07-30 Michaela Regneri, Julia S. Georgi, Jurij Kost, Niklas Pietsch, Sabine Stamm

Quality data is a fundamental contributor to success in statistics and machine learning. If a statistical assessment or machine learning leads to decisions that create value, data contributors may want a share of that value. This paper…

Computer Science and Game Theory · Computer Science 2019-06-28 Eric Bax

Following the rise in popularity of data-centric machine learning (ML), various data valuation methods have been proposed to quantify the contribution of each datapoint to desired ML model performance metrics (e.g., accuracy). Beyond the…

Machine Learning · Computer Science 2025-07-31 Keziah Naggita, Julienne LaChance

Data are invaluable. How can we assess the value of data objectively, systematically and quantitatively? Pricing data, or information goods in general, has been studied and practiced in dispersed areas and principles, such as economics,…

Theoretical Economics · Economics 2021-01-01 Jian Pei

We investigate the data distribution valuation problem, which aims to quantify the values of data distributions from their samples. This is a recently proposed problem that is related to but different from classical data valuation and can…

Machine Learning · Computer Science 2026-04-08 Cuong N. Nguyen, Cuong V. Nguyen

We study two-sample variable selection: identifying variables that discriminate between the distributions of two sets of data vectors. Such variables help scientists understand the mechanisms behind dataset discrepancies. Although…

Machine Learning · Statistics 2025-11-06 Kensuke Mitsuzawa, Motonobu Kanagawa, Stefano Bortoli, Margherita Grossi, Paolo Papotti

Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods,…

Data today fuels both the economy and advances in machine learning and AI. All aspects of decision making, at the personal and enterprise level and in governments are increasingly data-driven. In this context, however, there are still some…

Computers and Society · Computer Science 2018-11-13 Kalapriya Kannan, Rema Ananthanarayanan, Sameep Mehta

Data validation is the activity where one decides whether or not a particular data set is fit for a given purpose. Formalizing the requirements that drive this decision process allows for unambiguous communication of the requirements,…

Databases · Computer Science 2020-12-23 Mark P. J. van der Loo, Edwin de Jonge

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…

Machine Learning · Computer Science 2019-09-12 Jonas Mueller, Alex Smola