Related papers: BugDoc: Algorithms to Debug Computational Processe…

Debugging Machine Learning Pipelines

Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce…

Machine Learning · Computer Science 2020-02-13 Raoni Lourenço, Juliana Freire, Dennis Shasha

On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems

We study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components. Understanding and fixing errors that arise in such integrative systems is difficult as failures can occur at multiple…

Machine Learning · Computer Science 2016-11-28 Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann

Automatically Debugging AutoML Pipelines using Maro: ML Automated Remediation Oracle (Extended Version)

Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters…

Software Engineering · Computer Science 2023-10-03 Julian Dolby, Jason Tsay, Martin Hirzel

Pipeline Provenance for Analysis, Evaluation, Trust or Reproducibility

Data volumes and rates of research infrastructures will continue to increase in the upcoming years and impact how we interact with their final data products. Little of the processed data can be directly investigated and most of it will be…

Instrumentation and Methods for Astrophysics · Physics 2024-04-23 Michael A. C. Johnson, Hans-Rainer Klöckner, Albina Muzafarova, Kristen Lackeos, David J. Champion, Marta Dembska, Sirko Schindler, Marcus Paradies

Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and…

Methodology · Statistics 2018-09-28 Soudabeh Barghi, Lalet Scaria, Ali Salari, Tristan Glatard

Optimizing Latency and Reliability of Pipeline Workflow Applications

Mapping applications onto heterogeneous platforms is a difficult challenge, even for simple application patterns such as pipeline graphs. The problem is even more complex when processors are subject to failure during the execution of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2008-03-26 Anne Benoit, Veronika Rehn-Sonigo, Yves Robert

File-based localization of numerical perturbations in data analysis pipelines

Data analysis pipelines are known to be impacted by computational conditions, presumably due to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise…

Quantitative Methods · Quantitative Biology 2020-09-30 Ali Salari, Gregory Kiar, Lindsay Lewis, Alan C. Evans, Tristan Glatard

Using Abduction in Markov Logic Networks for Root Cause Analysis

IT infrastructure is a crucial part in most of today's business operations. High availability and reliability, and short response times to outages are essential. Thus a high amount of tool support and automation in risk management is…

Artificial Intelligence · Computer Science 2015-11-19 Joerg Schoenfisch, Janno von Stulpnagel, Jens Ortmann, Christian Meilicke, Heiner Stuckenschmidt

Automatic Failure Explanation in CPS Models

Debugging Cyber-Physical System (CPS) models can be extremely complex. Indeed, only the detection of a failure is insufficient to know how to correct a faulty model. Faults can propagate in time and in space producing observable…

Software Engineering · Computer Science 2020-10-14 Ezio Bartocci, Niveditha Manjunath, Leonardo Mariani, Cristinel Mateis, Dejan Ničković

Oops!... I did it again. Conclusion (In-)Stability in Quantitative Empirical Software Engineering: A Large-Scale Analysis

Context: Mining software repositories is a popular means to gain insights into a software project's evolution, monitor project health, support decisions and derive best practices. Tools supporting the mining process are commonly applied by…

Software Engineering · Computer Science 2025-11-13 Nicole Hoess, Carlos Paradis, Rick Kazman, Wolfgang Mauerer

Optimum Depth of the Bounded Pipeline

The paper is devoted to studying the performance of a computational pipeline, the number of simultaneously executing stages of which at each time is bounded from above by a fixed number. A look at the restriction as a structural hazard…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-31 Ahmet A. Husainov

Progressive Data Science: Potential and Challenges

Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining, highly depend on iterative trial-and-error processes that could be sped up…

Human-Computer Interaction · Computer Science 2019-09-13 Cagatay Turkay, Nicola Pezzotti, Carsten Binnig, Hendrik Strobelt, Barbara Hammer, Daniel A. Keim, Jean-Daniel Fekete, Themis Palpanas, Yunhai Wang, Florin Rusu

Scaling Systematic Literature Reviews with Machine Learning Pipelines

Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very…

Computation and Language · Computer Science 2020-10-12 Seraphina Goldfarb-Tarrant, Alexander Robertson, Jasmina Lazic, Theodora Tsouloufi, Louise Donnison, Karen Smyth

Quantifying contribution and propagation of error from computational steps, algorithms and hyperparameter choices in image classification pipelines

Data science relies on pipelines that are organized in the form of interdependent computational steps. Each step consists of various candidate algorithms that may be used for performing a particular function. Each algorithm consists of…

Computer Vision and Pattern Recognition · Computer Science 2019-03-04 Aritra Chowdhury, Malik Magdon-Ismail, Bulent Yener

A Pipeline for Business Intelligence and Data-Driven Root Cause Analysis on Categorical Data

Business intelligence (BI) is any knowledge derived from existing data that may be strategically applied within a business. Data mining is a technique or method for extracting BI from data using statistical data modeling. Finding…

Artificial Intelligence · Computer Science 2022-11-15 Shubham Thakar, Dhananjay Kalbande

Pipelined information flow in molecular mechanical circuits leads to increased error and irreversibility

Pipelining is a design technique for logical circuits that allows for higher throughput than circuits in which multiple computations are fed through the system one after the other. It allows for much faster computation than architectures in…

Computational Physics · Physics 2024-10-28 Ian Seet, Thomas E. Ouldridge, Jonathan P. K. Doye

A Causal Research Pipeline and Tutorial for Psychologists and Social Scientists

Causality is a fundamental part of the scientific endeavour to understand the world. Unfortunately, causality is still taboo in much of psychology and social science. Motivated by a growing number of recommendations for the importance of…

Methodology · Statistics 2022-06-27 Matthew J. Vowels

Causal-Consistent Reversible Debugging: Improving CauDEr

Causal-consistent reversible debugging allows one to explore concurrent computations back and forth in order to locate the source of an error. In this setting, backward steps can be chosen freely as long as they are "causal consistent",…

Programming Languages · Computer Science 2024-06-11 Juan José González-Abril, Germán Vidal

Automated Evolutionary Approach for the Design of Composite Machine Learning Pipelines

The effectiveness of the machine learning methods for real-world tasks depends on the proper structure of the modeling pipeline. The proposed approach is aimed to automate the design of composite machine learning pipelines, which is…

Machine Learning · Computer Science 2021-09-09 Nikolay O. Nikitin, Pavel Vychuzhanin, Mikhail Sarafanov, Iana S. Polonskaia, Ilia Revin, Irina V. Barabanova, Gleb Maximov, Anna V. Kalyuzhnaya, Alexander Boukhanovsky

A Synthesis of Logical and Probabilistic Reasoning for Program Understanding and Debugging

We describe the integration of logical and uncertain reasoning methods to identify the likely source and location of software problems. To date, software engineers have had few tools for identifying the sources of error in complex software…

Artificial Intelligence · Computer Science 2013-03-08 Lisa J. Burnell, Eric J. Horvitz