Related papers: Optimizing Data Collection for Machine Learning

Data Collection and Labeling Techniques for Machine Learning

Data collection and labeling are critical bottlenecks in the deployment of machine learning applications. With the increasing complexity and diversity of applications, the need for efficient and scalable data collection and labeling…

Databases · Computer Science 2024-07-19 Qianyu Huang, Tongfang Zhao

A Data Management Approach for Dataset Selection Using Human Computation

As the number of applications that use machine learning algorithms increases, the need for labeled data useful for training such algorithms intensifies. Getting labels typically involves employing humans to do the annotation, which directly…

Machine Learning · Computer Science 2013-07-16 Alexandros Ntoulas, Omar Alonso, Vasilis Kandylas

Efficient Self-Supervised Data Collection for Offline Robot Learning

A practical approach to robot reinforcement learning is to first collect a large batch of real or simulated robot interaction data, using some data collection policy, and then learn from this data to perform various tasks, using offline…

Robotics · Computer Science 2021-06-02 Shadi Endrawis, Gal Leibovich, Guy Jacob, Gal Novik, Aviv Tamar

A Survey on Data Collection for Machine Learning: a Big Data -- AI Integration Perspective

Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more…

Machine Learning · Computer Science 2019-08-13 Yuji Roh, Geon Heo, Steven Euijong Whang

How Much More Data Do I Need? Estimating Requirements for Downstream Tasks

Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance? This question is of critical importance in applications such as autonomous driving or medical…

Computer Vision and Pattern Recognition · Computer Science 2022-07-14 Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law

Efficient learning of large sets of locally optimal classification rules

Conventional rule learning algorithms aim at finding a set of simple rules, where each rule covers as many examples as possible. In this paper, we argue that the rules found in this way may not be the optimal explanations for each of the…

Machine Learning · Computer Science 2023-01-27 Van Quoc Phuong Huynh, Johannes Fürnkranz, Florian Beck

Cost optimization of data flows based on task re-ordering

Analyzing big data in a highly dynamic environment becomes more and more critical because of the increasing need for end-to-end processing of this data. Modern data flows are quite complex and there are no efficient, cost-based,…

Databases · Computer Science 2015-07-31 Georgia Kougka, Anastasios Gounaris

Deep Learning with a Rethinking Structure for Multi-label Classification

Multi-label classification (MLC) is an important class of machine learning problems that come with a wide spectrum of applications, each demanding a possibly different evaluation criterion. When solving the MLC problems, we generally expect…

Machine Learning · Computer Science 2019-10-08 Yao-Yuan Yang, Yi-An Lin, Hong-Min Chu, Hsuan-Tien Lin

Active clustering for labeling training data

Gathering training data is a key step of any supervised learning task, and it is both critical and expensive. Critical, because the quantity and quality of the training data have a high impact on the performance of the learned function.…

Data Structures and Algorithms · Computer Science 2021-10-28 Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein

A Framework of Sparse Online Learning and Its Applications

The amount of data in our society has been exploding in the era of big data today. In this paper, we address several open challenges of big data stream classification, including high volume, high velocity, high dimensionality, high…

Machine Learning · Computer Science 2015-07-28 Dayong Wang, Pengcheng Wu, Peilin Zhao, Steven C. H. Hoi

Delegating Data Collection in Decentralized Machine Learning

Motivated by the emergence of decentralized machine learning (ML) ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that deal with…

Machine Learning · Computer Science 2024-11-21 Nivasini Ananthakrishnan, Stephen Bates, Michael I. Jordan, Nika Haghtalab

DCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size

Large-scale supervised classification algorithms, especially those based on deep convolutional neural networks (DCNNs), require vast amounts of training data to achieve state-of-the-art performance. Decreasing this data requirement would…

Computer Vision and Pattern Recognition · Computer Science 2016-06-15 Maya Kabkab, Azadeh Alavi, Rama Chellappa

Training Over-parameterized Models with Non-decomposable Objectives

Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for…

Machine Learning · Computer Science 2021-07-13 Harikrishna Narasimhan, Aditya Krishna Menon

Safe Data Collection for Offline and Online Policy Learning

Motivated by practical needs of experimentation and policy learning in online platforms, we study the problem of safe data collection. Specifically, our goal is to develop a logging policy that efficiently explores different actions to…

Machine Learning · Computer Science 2022-08-08 Ruihao Zhu, Branislav Kveton

Deep Super Learner: A Deep Ensemble for Classification Problems

Deep learning has become very popular for tasks such as predictive modeling and pattern recognition in handling big data. Deep learning is a powerful machine learning method that extracts lower level features and feeds them forward for the…

Machine Learning · Computer Science 2018-03-07 Steven Young, Tamer Abdou, Ayse Bener

Data Optimization in Deep Learning: A Survey

Large-scale, high-quality data are considered an essential factor for the successful application of many deep learning techniques. Meanwhile, numerous real-world deep learning tasks still have to contend with the lack of sufficient amounts…

Machine Learning · Computer Science 2023-10-26 Ou Wu, Rujing Yao

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as…

Machine Learning · Statistics 2026-02-23 Michael O. Harding, Vikas Singh, Kirthevasan Kandasamy

Large-Scale Deep Learning Optimizations: A Comprehensive Survey

Deep learning has achieved promising results on a wide spectrum of AI applications. Larger datasets and models consistently yield better performance. However, we generally spend longer training time on more computation and communication.…

Machine Learning · Computer Science 2021-11-03 Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You

Lero: A Learning-to-Rank Query Optimizer

A recent line of work applies machine learning techniques to assist or rebuild cost-based query optimizers in DBMS. While exhibiting superiority in some benchmarks, their deficiencies, e.g., unstable performance, high training cost, and slow…

Databases · Computer Science 2023-02-21 Rong Zhu, Wei Chen, Bolin Ding, Xingguang Chen, Andreas Pfadler, Ziniu Wu, Jingren Zhou

Active Data Acquisition in Autonomous Driving Simulation

Autonomous driving algorithms rely heavily on learning-based models, which require large datasets for training. However, there is often a large amount of redundant information in these datasets, while collecting and processing these…

Machine Learning · Computer Science 2023-06-27 Jianyu Lai, Zexuan Jia, Boao Li