Related papers: Data Budgeting for Machine Learning

200 papers

As the number of applications that use machine learning algorithms increases, the need for labeled data useful for training such algorithms intensifies. Getting labels typically involves employing humans to do the annotation, which directly…

Machine Learning · Computer Science · 2013-07-16 · Alexandros Ntoulas, Omar Alonso, Vasilis Kandylas

Despite data's central role in AI production, it remains the least understood input. As AI labs exhaust public data and turn to proprietary sources, with deals reaching hundreds of millions of dollars, research across computer science,…

Computers and Society · Computer Science · 2026-04-28 · Hamidah Oderinwale, Anna Kazlauskas

Despite intensive efforts devoted to tool learning, the problem of budget-constrained tool learning, which focuses on resolving user queries within a specific budget constraint, has been widely overlooked. This paper proposes a novel method…

Artificial Intelligence · Computer Science · 2024-06-12 · Yuanhang Zheng, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

In most practical settings and theoretical analyses, one assumes that a model can be trained until convergence. However, the growing complexity of machine learning datasets and models may violate such assumptions. Indeed, current approaches…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-01 · Mengtian Li, Ersin Yumer, Deva Ramanan

Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance? This question is of critical importance in applications such as autonomous driving or medical…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-14 · Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law

Data selection can reduce the amount of training data needed to finetune LLMs; however, the efficacy of data selection scales directly with its compute. Motivated by the practical challenge of compute-constrained finetuning, we consider the…

Machine Learning · Computer Science · 2025-04-09 · Junjie Oscar Yin, Alexander M. Rush

We design mechanisms for online procurement of data held by strategic agents for machine learning tasks. The challenge is to use past data to actively price future data and give learning guarantees even when an agent's cost for revealing…

Computer Science and Game Theory · Computer Science · 2015-06-09 · Jacob Abernethy, Yiling Chen, Chien-Ju Ho, Bo Waggoner

High-quality machine learning models are dependent on access to high-quality training data. When the data are not already available, it is tedious and costly to obtain them. Data markets help with identifying valuable training data: model…

Machine Learning · Computer Science · 2023-06-06 · Boxin Zhao, Boxiang Lyu, Raul Castro Fernandez, Mladen Kolar

Modern artificial intelligence (AI) applications require large quantities of training and test data. This need creates critical challenges not only concerning the availability of such data, but also regarding its quality. For example,…

The increasing reliance on human preference feedback to judge AI-generated pseudo labels has created a pressing need for principled, budget-conscious data acquisition strategies. We address the crucial question of how to optimally allocate…

Machine Learning · Statistics · 2026-02-13 · Zihan Dong, Xiaotian Hou, Ruijia Wu, Linjun Zhang

Participatory Budgeting (PB) offers a democratic process for communities to allocate public funds across various projects through voting. In practice, PB organizers face challenges in selecting aggregation rules either because they are not…

Machine Learning · Computer Science · 2024-12-04 · Roy Fairstein, Dan Vilenchik, Kobi Gal

The determination of sample size in qualitative research has traditionally relied on the subjective and often ambiguous principle of data saturation, which can lead to inconsistencies and threaten methodological rigor. This study introduces…

Machine Learning · Computer Science · 2025-12-10 · Hasan Tutar, Caner Erden, Ümit Şentürk

The quality of underlying training data is crucial for building performant machine learning models with wider generalizability. However, current machine learning (ML) tools lack streamlined processes for improving the data quality. So,…

Machine Learning · Computer Science · 2021-12-16 · Atindriyo Sanyal, Vikram Chatterji, Nidhi Vyas, Ben Epstein, Nikita Demir, Anthony Corletti

Digital data collected over the decades, and data currently being produced using information technology, is mostly unlabeled, that is, data without descriptions. Unlabeled data is relatively easy to acquire but expensive to label…

Machine Learning · Computer Science · 2022-08-02 · Kinyua Gikunda

Applications of machine learning in the non-profit and public sectors often feature an iterative workflow of data acquisition, prediction, and optimization of interventions. There are four major pain points that a machine learning pipeline…

Machine Learning · Computer Science · 2022-01-19 · Zheyuan Ryan Shi, Zhiwei Steven Wu, Rayid Ghani, Fei Fang

Conventional machine learning applications in the mobile/IoT setting transmit data to a cloud-server for predictions. Due to cost considerations (power, latency, monetary), it is desirable to minimise device-to-server transmissions. The…

Machine Learning · Computer Science · 2020-04-15 · Aditya Gangrade, Durmus Alp Emre Acar, Venkatesh Saligrama

Machine learning is now used in many applications thanks to its ability to predict, generate, or discover patterns from large quantities of data. However, the process of collecting and transforming data for practical use is intricate. Even…

Data corruption, including missing and noisy data, poses significant challenges in real-world machine learning. This study investigates the effects of data corruption on model performance and explores strategies to mitigate these effects…

Machine Learning · Computer Science · 2025-05-22 · Qi Liu, Wanjing Ma

The cost of labeling data often limits the performance of machine learning systems. In multi-task learning, related tasks provide information to each other and improve overall performance, but the label cost can vary among tasks. How should…

Machine Learning · Computer Science · 2023-08-25 · Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science · 2024-01-24 · Logan Engstrom, Axel Feldmann, Aleksander Madry