Related papers: Target Variable Engineering

The When and How of Target Variable Transformations

The machine learning pipeline typically involves the iterative process of (1) collecting the data, (2) preparing the data, (3) learning a model, and (4) evaluating a model. Practitioners recognize the importance of the data preparation…

Machine Learning · Computer Science · 2025-04-30 · Loren Nuyts, Jesse Davis

AutoML: Exploration v.s. Exploitation

Building a machine learning (ML) pipeline in an automated way is a crucial and complex task as it is constrained with the available time budget and resources. This encouraged the research community to introduce several solutions to utilize…

Machine Learning · Computer Science · 2020-01-01 · Hassan Eldeeb, Abdelrhman Eldallal

Multi-Target Regression via Random Linear Target Combinations

Multi-target regression is concerned with the simultaneous prediction of multiple continuous target variables based on the same set of input variables. It arises in several interesting industrial and environmental application domains, such…

Machine Learning · Computer Science · 2015-05-05 · Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Aikaterini Vrekou, Ioannis Vlahavas

Personalizing Performance Regression Models to Black-Box Optimization Problems

Accurately predicting the performance of different optimization algorithms for previously unseen problem instances is crucial for high-performing algorithm selection and configuration techniques. In the context of numerical optimization,…

Neural and Evolutionary Computing · Computer Science · 2021-04-23 · Tome Eftimov, Anja Jankovic, Gorjan Popovski, Carola Doerr, Peter Korošec

Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features

Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect in data analysis. A common problem are high cardinality features, i.e. unordered categorical…

Machine Learning · Statistics · 2022-03-07 · Florian Pargent, Florian Pfisterer, Janek Thomas, Bernd Bischl

RankML: a Meta Learning-Based Approach for Pre-Ranking Machine Learning Pipelines

The explosion of digital data has created multiple opportunities for organizations and individuals to leverage machine learning (ML) to transform the way they operate. However, the shortage of experts in the field of machine learning --…

Machine Learning · Computer Science · 2019-11-21 · Doron Laadan, Roman Vainshtein, Yarden Curiel, Gilad Katz, Lior Rokach

Evaluating software defect prediction performance: an updated benchmarking study

Accurately predicting faulty software units helps practitioners target faulty units and prioritize their efforts to maintain software quality. Prior studies use machine-learning models to detect faulty software code. We revisit past studies…

Software Engineering · Computer Science · 2019-01-08 · Libo Li, Stefan Lessmann, Bart Baesens

Word vectors and Language Models (LMs) pretrained on a large amount of unlabelled data can dramatically improve various Natural Language Processing (NLP) tasks. However, the measure and impact of similarity between pretraining data and…

Computation and Language · Computer Science · 2019-05-20 · Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cecile Paris

Probabilities-Informed Machine Learning

Machine learning (ML) has emerged as a powerful tool for tackling complex regression and classification tasks, yet its success often hinges on the quality of training data. This study introduces an ML paradigm inspired by domain knowledge…

Machine Learning · Computer Science · 2025-01-10 · Mohsen Rashki

How Much Data Analytics is Enough? The ROI of Machine Learning Classification and its Application to Requirements Dependency Classification

Machine Learning (ML) can substantially improve the efficiency and effectiveness of organizations and is widely used for different purposes within Software Engineering. However, the selection and implementation of ML techniques rely almost…

Software Engineering · Computer Science · 2021-09-30 · Gouri Deshpande, Guenther Ruhe, Chad Saunders

Investigating the Impact of Data Selection Strategies on Language Model Performance

Data selection is critical for enhancing the performance of language models, particularly when aligning training datasets with a desired target distribution. This study explores the effects of different data selection methods and feature…

Computation and Language · Computer Science · 2025-01-08 · Jiayao Gu, Liting Chen, Yihong Li

Accounting for Variance in Machine Learning Benchmarks

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter…

Machine Learning · Computer Science · 2021-03-05 · Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

MLPerf Training Benchmark

Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges…

Machine Learning · Computer Science · 2020-03-03 · Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debojyoti Dutta, Udit Gupta, Kim Hazelwood, Andrew Hock, Xinyuan Huang, Atsushi Ike, Bill Jia, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Guokai Ma, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St. John, Tsuguchika Tabaru, Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, Cliff Young, Matei Zaharia

Language Models Improve When Pretraining Data Matches Target Tasks

Every data selection method inherently has a target. In practice, these targets often emerge implicitly through benchmark-driven iteration: researchers develop selection strategies, train models, measure benchmark performance, then refine…

Computation and Language · Computer Science · 2025-07-17 · David Mizrahi, Anders Boesen Lindbo Larsen, Jesse Allardice, Suzie Petryk, Yuri Gorokhov, Jeffrey Li, Alex Fang, Josh Gardner, Tom Gunter, Afshin Dehghan

A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data

Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of…

Machine Learning · Computer Science · 2024-12-19 · Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

The Effects of Hyperparameters on SGD Training of Neural Networks

The performance of neural network classifiers is determined by a number of hyperparameters, including learning rate, batch size, and depth. A number of attempts have been made to explore these parameters in the literature, and at times, to…

Neural and Evolutionary Computing · Computer Science · 2015-08-13 · Thomas M. Breuel

Competitive Machine Learning: Best Theoretical Prediction vs Optimization

Machine learning is often used in competitive scenarios: Participants learn and fit static models, and those models compete in a shared platform. The common assumption is that in order to win a competition one has to have the best…

Machine Learning · Computer Science · 2018-03-14 · Amin Khajehnejad, Shima Hajimirza

Exploring Opportunistic Meta-knowledge to Reduce Search Spaces for Automated Machine Learning

Machine learning (ML) pipeline composition and optimisation have been studied to seek multi-stage ML models, i.e. preprocessor-inclusive, that are both valid and well-performing. These processes typically require the design and traversal of…

Machine Learning · Computer Science · 2021-05-04 · Tien-Dung Nguyen, David Jacob Kedziora, Katarzyna Musial, Bogdan Gabrys

Preprocessor Selection for Machine Learning Pipelines

Much of the work in metalearning has focused on classifier selection, combined more recently with hyperparameter optimization, with little concern for data preprocessing. Yet, it is generally well accepted that machine learning applications…

Machine Learning · Computer Science · 2018-10-24 · Brandon Schoenfeld, Christophe Giraud-Carrier, Mason Poggemann, Jarom Christensen, Kevin Seppi

Leveraging Uncertainty Estimates To Improve Classifier Performance

Binary classification involves predicting the label of an instance based on whether the model score for the positive class exceeds a threshold chosen based on the application requirements (e.g., maximizing recall for a precision bound).…

Machine Learning · Computer Science · 2023-11-21 · Gundeep Arora, Srujana Merugu, Anoop Saladi, Rajeev Rastogi