Related papers: Debugging Machine Learning Pipelines

BugDoc: Algorithms to Debug Computational Processes

Data analysis for scientific experiments and enterprises, large-scale simulations, and machine learning tasks all entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities…

Databases · Computer Science 2020-04-15 Raoni Lourenço, Juliana Freire, Dennis Shasha

On Leakage in Machine Learning Pipelines

Machine learning (ML) provides powerful tools for predictive modeling. ML's popularity stems from the promise of sample-level prediction with applications across a variety of fields from physics and marketing to healthcare. However, if not…

Machine Learning · Computer Science 2025-08-11 Leonard Sasse, Eliana Nicolaisen-Sobesky, Juergen Dukart, Simon B. Eickhoff, Michael Götz, Sami Hamdan, Vera Komeyer, Abhijit Kulkarni, Juha Lahnakoski, Bradley C. Love, Federico Raimondo, Kaustubh R. Patil

Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles

Machine learning (ML) is an increasingly important scientific tool supporting decision making and knowledge generation in numerous fields. With this, it also becomes more and more important that the results of ML experiments are…

Machine Learning · Computer Science 2020-06-23 Sheeba Samuel, Frank Löffler, Birgitta König-Ries

A survey on bias in machine learning research

Current research on bias in machine learning often focuses on fairness, while overlooking the roots or causes of bias. However, bias was originally defined as a "systematic error," often caused by humans at different stages of the research…

Machine Learning · Computer Science 2023-08-23 Agnieszka Mikołajczyk-Bareła, Michał Grochowski

On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems

We study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components. Understanding and fixing errors that arise in such integrative systems is difficult as failures can occur at multiple…

Machine Learning · Computer Science 2016-11-28 Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann

Automated Evolutionary Approach for the Design of Composite Machine Learning Pipelines

The effectiveness of the machine learning methods for real-world tasks depends on the proper structure of the modeling pipeline. The proposed approach is aimed to automate the design of composite machine learning pipelines, which is…

Machine Learning · Computer Science 2021-09-09 Nikolay O. Nikitin, Pavel Vychuzhanin, Mikhail Sarafanov, Iana S. Polonskaia, Ilia Revin, Irina V. Barabanova, Gleb Maximov, Anna V. Kalyuzhnaya, Alexander Boukhanovsky

Debugging Machine Learning Tasks

Unlike traditional programs (such as operating systems or word processors) which have large amounts of code, machine learning tasks use programs with relatively small amounts of code (written in machine learning libraries), but voluminous…

Machine Learning · Computer Science 2016-03-24 Aleksandar Chakarov, Aditya Nori, Sriram Rajamani, Shayak Sen, Deepak Vijaykeerthy

Building a Reproducible Machine Learning Pipeline

Reproducibility of modeling is a problem that exists for any machine learning practitioner, whether in industry or academia. The consequences of an irreproducible model can include significant financial costs, lost time, and even loss of…

Machine Learning · Computer Science 2018-10-11 Peter Sugimura, Florian Hartl

Sources of Irreproducibility in Machine Learning: A Review

Background: Many published machine learning studies are irreproducible. Issues with methodology and not properly accounting for variation introduced by the algorithm themselves or their implementations are attributed as the main…

Machine Learning · Computer Science 2023-04-17 Odd Erik Gundersen, Kevin Coakley, Christine Kirkpatrick, Yolanda Gil

Automatically Debugging AutoML Pipelines using Maro: ML Automated Remediation Oracle (Extended Version)

Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters…

Software Engineering · Computer Science 2023-10-03 Julian Dolby, Jason Tsay, Martin Hirzel

Continual learning on deployment pipelines for Machine Learning Systems

Following the development of digitization, a growing number of large Original Equipment Manufacturers (OEMs) are adapting computer vision or natural language processing in a wide range of applications such as anomaly detection and quality…

Machine Learning · Computer Science 2022-12-07 Qiang Li, Chongyu Zhang

Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities

Machine learning (ML) is now commonplace, powering data-driven applications in various organizations. Unlike the traditional perception of ML in research, ML production pipelines are complex, with many interlocking analytical components…

Databases · Computer Science 2021-03-31 Doris Xin, Hui Miao, Aditya Parameswaran, Neoklis Polyzotis

Instrumentation and Analysis of Native ML Pipelines via Logical Query Plans

Machine Learning (ML) is increasingly used to automate impactful decisions, which leads to concerns regarding their correctness, reliability, and fairness. We envision highly-automated software platforms to assist data scientists with…

Databases · Computer Science 2024-09-04 Stefan Grafberger

Making Logic Learnable With Neural Networks

While neural networks are good at learning unspecified functions from training samples, they cannot be directly implemented in hardware and are often not interpretable or formally verifiable. On the other hand, logic circuits are…

Machine Learning · Computer Science 2020-06-09 Tobias Brudermueller, Dennis L. Shung, Adrian J. Stanley, Johannes Stegmaier, Smita Krishnaswamy

Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines

Programming Language Processing (PLP) using machine learning has made vast improvements in the past few years. Increasingly more people are interested in exploring this promising field. However, it is challenging for new researchers and…

Machine Learning · Computer Science 2023-06-19 Patrick Flynn, Tristan Vanderbruggen, Chunhua Liao, Pei-Hung Lin, Murali Emani, Xipeng Shen

Problem Learning: Towards the Free Will of Machines

A machine intelligence pipeline usually consists of six components: problem, representation, model, loss, optimizer and metric. Researchers have worked hard trying to automate many components of the pipeline. However, one key component of…

Artificial Intelligence · Computer Science 2021-09-02 Yongfeng Zhang

Forensicability of Deep Neural Network Inference Pipelines

We propose methods to infer properties of the execution environment of machine learning pipelines by tracing characteristic numerical deviations in observable outputs. Results from a series of proof-of-concept experiments obtained on local…

Machine Learning · Computer Science 2021-02-19 Alexander Schlögl, Tobias Kupek, Rainer Böhme

DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering

Automatic machine learning (AutoML) is an area of research aimed at automating machine learning (ML) activities that currently require human experts. One of the most challenging tasks in this field is the automatic generation of end-to-end…

Machine Learning · Computer Science 2019-11-04 Yuval Heffetz, Roman Vainstein, Gilad Katz, Lior Rokach

Scaling Systematic Literature Reviews with Machine Learning Pipelines

Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very…

Computation and Language · Computer Science 2020-10-12 Seraphina Goldfarb-Tarrant, Alexander Robertson, Jasmina Lazic, Theodora Tsouloufi, Louise Donnison, Karen Smyth

Preprocessor Selection for Machine Learning Pipelines

Much of the work in metalearning has focused on classifier selection, combined more recently with hyperparameter optimization, with little concern for data preprocessing. Yet, it is generally well accepted that machine learning applications…

Machine Learning · Computer Science 2018-10-24 Brandon Schoenfeld, Christophe Giraud-Carrier, Mason Poggemann, Jarom Christensen, Kevin Seppi