Related papers: Efficiently Reproducing Distributed Workflows in N…

FlowBook: Enforcing Reproducibility in Computational Notebooks

Computational notebooks are notoriously prone to reproducibility failures. By permitting out-of-order cell execution, notebooks accumulate hidden state and implicit dependencies that cause interactive executions to silently diverge from…

Programming Languages · Computer Science · 2026-05-05 · Stephen N. Freund, Emery D. Berger, Cormac Flanagan, Eunice Jun

Ten Simple Rules for Reproducible Research in Jupyter Notebooks

Reproducibility of computational studies is a hallmark of scientific methodology. It enables researchers to build with confidence on the methods and findings of others, reuse and extend computational pipelines, and thereby drive scientific…

Other Computer Science · Computer Science · 2018-10-19 · Adam Rule, Amanda Birmingham, Cristal Zuniga, Ilkay Altintas, Shih-Cheng Huang, Rob Knight, Niema Moshiri, Mai H. Nguyen, Sara Brin Rosenthal, Fernando Pérez, Peter W. Rose

Computational reproducibility of Jupyter notebooks from biomedical publications

Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows. The reproducibility of…

Digital Libraries · Computer Science · 2023-08-16 · Sheeba Samuel, Daniel Mietchen

Fine-Grained Lineage for Safer Notebook Interactions

Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into…

Software Engineering · Computer Science · 2021-06-22 · Stephen Macke, Hongpu Gong, Doris Jung-Lin Lee, Andrew Head, Doris Xin, Aditya Parameswaran

Containing the Reproducibility Gap: Automated Repository-Level Containerization for Scholarly Jupyter Notebooks

Computational reproducibility is fundamental to trustworthy science, yet remains difficult to achieve in practice across various research workflows, including Jupyter notebooks published alongside scholarly articles. Environment drift,…

Software Engineering · Computer Science · 2026-04-02 · Sheeba Samuel, Daniel Mietchen, Hemanta Lo, Martin Gaedke

ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells

Jupyter notebooks represent a unique format for programming - a combination of code and Markdown with rich formatting, separated into individual cells. We propose to perceive a Jupyter Notebook cell as a simplified and raw version of a…

Software Engineering · Computer Science · 2022-01-03 · Sergey Titov, Yaroslav Golubev, Timofey Bryksin

Large-scale Evaluation of Notebook Checkpointing with AI Agents

Saving, or checkpointing, intermediate results during interactive data exploration can potentially boost user productivity. However, existing studies on this topic are limited, as they primarily rely on small-scale experiments with human…

Human-Computer Interaction · Computer Science · 2025-04-03 · Hanxi Fang, Supawit Chockchowwat, Hari Sundaram, Yongjoo Park

Nf-PEAK: Process-Based Energy Attribution for Nextflow Workflows on Kubernetes Clusters

Scientific workflows are pipelines of interdependent tasks. They are increasingly executed on shared Kubernetes clusters via workflow engines such as Nextflow. Their energy consumption matters for both cost and sustainability. It is…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-22 · Philipp Thamm, Somayeh Mohammadi, Kathleen West, Knut Reinert, Lauritz Thamsen, Ulf Leser

WorkflowHub: a registry for computational workflows

The rising popularity of computational workflows is driven by the need for repetitive and scalable data processing, sharing of processing know-how, and transparent methods. As both combined records of analysis and descriptions of processing…

Digital Libraries · Computer Science · 2025-05-23 · Ove Johan Ragnar Gustafsson, Sean R. Wilkinson, Finn Bacall, Luca Pireddu, Stian Soiland-Reyes, Simone Leo, Stuart Owen, Nick Juty, José M. Fernández, Björn Grüning, Tom Brown, Hervé Ménager, Salvador Capella-Gutierrez, Frederik Coppens, Carole Goble

Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference

Using multiple nodes and parallel computing algorithms has become a principal tool to improve training and execution times of deep neural networks as well as effective collective intelligence in sensor networks. In this paper, we consider…

Machine Learning · Computer Science · 2020-08-20 · Afshin Abdi, Saeed Rashidi, Faramarz Fekri, Tushar Krishna

Supporting Workflow Reproducibility by Linking Bioinformatics Tools across Papers and Executable Code

Motivation: The rapid growth of biological data has intensified the need for transparent, reproducible, and well-documented computational workflows. The ability to clearly connect the steps of a workflow in the code with their description…

Computation and Language · Computer Science · 2026-03-10 · Clémence Sebe, Olivier Ferret, Aurélie Névéol, Mahdi Esmailoghli, Ulf Leser, Sarah Cohen-Boulakia

An Efficient Fault Tolerant Workflow Scheduling Approach using Replication Heuristics and Checkpointing in the Cloud

Scientific workflows have been predominantly used for complex and large scale data analysis and scientific computation/automation and the need for robust workflow scheduling techniques has grown considerably. But, most of the existing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-11-04 · S. Jaya Nirmala, Amrith Rajagopal Setlur, Har Simrat Singh, Sudhanshu Khoriya

Computational reproducibility refers to obtaining consistent results when rerunning an experiment. Jupyter Notebook, a web-based computational notebook application, facilitates running, publishing, and sharing computational experiments…

Software Engineering · Computer Science · 2025-09-30 · A S M Shahadat Hossain, Colin Brown, David Koop, Tanu Malik

Near-Optimal Distributed Band-Joins through Recursive Partitioning

We consider running-time optimization for band-joins in a distributed system, e.g., the cloud. To balance load across worker machines, input has to be partitioned, which causes duplication. We explore how to resolve this tension between…

Databases · Computer Science · 2020-04-15 · Rundong Li, Wolfgang Gatterbauer, Mirek Riedewald

Jup2Kub: algorithms and a system to translate a Jupyter Notebook pipeline to a fault tolerant distributed Kubernetes deployment

Scientific workflows facilitate computational, data manipulation, and sometimes visualization steps for scientific data analysis. They are vital for reproducing and validating experiments, usually involving computational steps in scientific…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-11-22 · Jinli Duan, Shasha Dennis

A Framework to capture and reproduce the Absolute State of Jupyter Notebooks

Jupyter Notebooks are an enormously popular tool for creating and narrating computational research projects. They also have enormous potential for creating reproducible scientific research artifacts. Capturing the complete state of a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-10-18 · Dimuthu Wannipurage, Suresh Marru, Marlon Pierce

Reproducible Workflow on a Public Cloud for Computational Fluid Dynamics

In a new effort to make our research transparent and reproducible by others, we developed a workflow to run and share computational studies on the public cloud Microsoft Azure. It uses Docker containers to create an image of the application…

Computational Engineering, Finance, and Science · Computer Science · 2020-07-24 · Olivier Mesnard, Lorena A. Barba

Bind: a Partitioned Global Workflow Parallel Programming Model

High Performance Computing is notorious for its long and expensive software development cycle. To address this challenge, we present Bind: a "partitioned global workflow" parallel programming model for C++ applications that enables quick…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-06-16 · Alex Kosenkov, Matthias Troyer

Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection

As Deep Neural Networks (DNNs) have become an increasingly ubiquitous workload, the range of libraries and tooling available to aid in their development and deployment has grown significantly. Scalable, production quality tools are freely…

Machine Learning · Computer Science · 2022-06-22 · Perry Gibson, José Cano

iReplayer: In-situ and Identical Record-and-Replay for Multithreaded Applications

Reproducing executions of multithreaded programs is very challenging due to many intrinsic and external non-deterministic factors. Existing RnR systems achieve significant progress in terms of performance overhead, but none targets the…

Operating Systems · Computer Science · 2018-04-05 · Hongyu Liu, Sam Silvestro, Wei Wang, Chen Tian, Tongping Liu