Related papers: Quantifying Inherent Randomness in Machine Learnin…

200 papers

Statistical measures for group fairness in machine learning reflect the gap in performance of algorithms across different groups. These measures, however, exhibit a high variance between different training instances, which makes them…

Machine Learning · Computer Science 2023-07-11 Prakhar Ganesh, Hongyan Chang, Martin Strobel, Reza Shokri

Software quality assurance activities become increasingly difficult as software systems become more and more complex and continuously grow in size. Moreover, testing becomes even more expensive when dealing with large-scale systems. Thus,…

Software Engineering · Computer Science 2023-10-27 Xhulja Shahini, Domenic Bubel, Andreas Metzger

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter…

Hyper-parameters (HPs) are an important part of machine learning (ML) model development and can greatly influence performance. This paper studies their behavior for three algorithms: Extreme Gradient Boosting (XGB), Random Forest (RF), and…

Machine Learning · Computer Science 2022-11-17 Anwesha Bhattacharyya, Joel Vaughan, Vijayan N. Nair

This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques - including several less common transformations - across 14 different Machine Learning algorithms and…

Random forest (RF) stands out as a highly favored machine learning approach for classification problems. The effectiveness of RF hinges on two key factors: the accuracy of individual trees and the diversity among them. In this study, we…

Machine Learning · Computer Science 2024-10-28 Ye-eun Kim, Seoung Yun Kim, Hyunjoong Kim

A common assumption in machine learning is that samples are independently and identically distributed (i.i.d). However, the contributions of different samples are not identical in training. Some samples are difficult to learn and some…

Machine Learning · Computer Science 2021-11-23 Ou Wu, Weiyao Zhu, Yingjun Deng, Haixiang Zhang, Qinghu Hou

Large language models (LLMs) exhibit cognitive biases -- systematic tendencies of irrational decision-making, similar to those seen in humans. Prior work has found that these biases vary across models and can be amplified by instruction…

Computation and Language · Computer Science 2025-07-15 Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

Most machine learning methods assume that the input data distribution is the same in the training and testing phases. However, in practice, this stationarity is usually not met and the distribution of inputs differs, leading to unexpected…

Machine Learning · Computer Science 2023-04-19 Firas Bayram, Bestoun S. Ahmed

Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current…

Machine Learning · Computer Science 2023-10-10 Michael Hagmann, Philipp Meier, Stefan Riezler

Machine learning (ML) has been widely used in the literature to automate software engineering tasks. However, ML outcomes may be sensitive to randomization in data sampling mechanisms and learning procedures. To understand whether and how…

Software Engineering · Computer Science 2020-12-16 Cynthia C. S. Liem, Annibale Panichella

This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of…

Machine Learning · Statistics 2022-05-06 Alice J. Liu, Arpita Mukherjee, Linwei Hu, Jie Chen, Vijayan N. Nair

Data analysis and machine learning have become an integrative part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and…

Machine Learning · Statistics 2015-06-04 Gilles Louppe

Machine learning (ML) algorithms become increasingly important in the analysis of astronomical data. However, since most ML algorithms are not designed to take data uncertainties into account, ML based studies are mostly restricted to data…

Instrumentation and Methods for Astrophysics · Physics 2018-12-26 Itamar Reis, Dalya Baron, Sahar Shahaf

Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and…

Machine Learning · Computer Science 2023-11-10 Thomas M. Sutter, Alain Ryser, Joram Liebeskind, Julia E. Vogt

Even though a randomly performed train/test split of the dataset is common practice, it may not always be the best approach for estimating generalization performance in some scenarios. The fact is that the usual machine learning…

Machine Learning · Computer Science 2022-09-09 Carlos Catania, Jorge Guerra, Juan Manuel Romero, Gabriel Caffaratti, Martin Marchetta

We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying…

Machine Learning · Computer Science 2018-03-14 Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, Maarten de Rijke

Nondeterminism in neural network optimization produces uncertainty in performance, making small improvements difficult to discern from run-to-run variability. While uncertainty can be reduced by training multiple model copies, doing so is…

Machine Learning · Computer Science 2021-07-13 Cecilia Summers, Michael J. Dinneen

One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures that utilize enormous numbers of parameters, often in the millions and sometimes even in the billions.…

Machine Learning · Statistics 2021-11-15 Ben Adlam, Jake Levinson, Jeffrey Pennington

Refactoring is the process of changing the internal structure of software to improve its quality without modifying its external behavior. Empirical studies have repeatedly shown that refactoring has a positive impact on the…

Software Engineering · Computer Science 2020-09-14 Maurício Aniche, Erick Maziero, Rafael Durelli, Vinicius Durelli