Related papers: Guidelines for data analysis scripts

One DSL to Rule Them All: IDE-Assisted Code Generation for Agile Data Analysis

Data analysis is at the core of scientific studies, a prominent task that researchers and practitioners typically undertake by programming their own set of automated scripts. While there is no shortage of tools and languages available for…

Software Engineering · Computer Science 2019-04-23 Artur Andrzejak, Oliver Wenz, Diego Costa

Seven simple steps for log analysis in AI systems

AI systems produce large volumes of logs as they interact with tools and users. Analysing these logs can help understand model capabilities, propensities, and behaviours, or assess whether an evaluation worked as intended. Researchers have…

Artificial Intelligence · Computer Science 2026-04-23 Magda Dubois, Ekin Zorer, Maia Hamin, Joe Skinner, Alexandra Souly, Jerome Wynne, Harry Coppock, Lucas Sato, Sayash Kapoor, Sunishchal Dev, Keno Juchems, Kimberly Mai, Timo Flesch, Lennart Luettgau, Charles Teague, Eric Patey, JJ Allaire, Lorenzo Pacchiardi, Jose Hernandez-Orallo, Cozmin Ududec

A handy systematic method for data hazards detection in an instruction set of a pipelined microprocessor

It is intended in this document to introduce a handy systematic method for enumerating all possible data dependency cases that could occur between any two instructions that might happen to be processed at the same time at different stages…

Hardware Architecture · Computer Science 2012-03-06 Ahmed M. Mahran

Why we should respect analysis results as data

The development and approval of new treatments generates large volumes of results, such as summaries of efficacy and safety. However, it is commonly overlooked that analyzing clinical study data also produces data in the form of results.…

Computers and Society · Computer Science 2022-04-22 Joana M Barros, Lukas A Widmer, Mark Baillie, Simon Wandel

Towards "all-inclusive" Data Preparation to ensure Data Quality

Data preparation, especially data cleaning, is very important to ensure data quality and to improve the output of automated decision systems. Since there is no single tool that covers all steps required, a combination of tools -- namely a…

Databases · Computer Science 2023-08-29 Valerie Restat

Awareness of Secure Coding Guidelines in the Industry -- A first data analysis

Software needs to be secure, in particular, when deployed to critical infrastructures. Secure coding guidelines capture practices in industrial software engineering to ensure the security of code. This study aims to assess the level of…

Software Engineering · Computer Science 2021-01-07 Tiago Espinha Gasiba, Ulrike Lechner, Maria Pinto-Albuquerque, Daniel Mendez Fernandez

Using script generators for pipeline prototyping

Fully automated astronomical data calibration and imaging pipelines are difficult to develop without a good prototyping method which permits to bridge the time between observatory commissioning and the moment when the special features and…

Instrumentation and Methods for Astrophysics · Physics 2021-12-21 Dirk Petry

Applying Bayesian Analysis Guidelines to Empirical Software Engineering Data: The Case of Programming Languages and Code Quality

Statistical analysis is the tool of choice to turn data into information, and then information into empirical knowledge. To be valid, the process that goes from data to knowledge should be supported by detailed, rigorous guidelines, which…

Software Engineering · Computer Science 2024-10-03 Carlo A. Furia, Richard Torkar, Robert Feldt

Guideline2Graph: Profile-Aware Multimodal Parsing for Executable Clinical Decision Graphs

Clinical practice guidelines are long, multimodal documents whose branching recommendations are difficult to convert into executable clinical decision support (CDS), and one-shot parsing often breaks cross-page continuity. Recent LLM/VLM…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Onur Selim Kilic, Yeti Z. Gurbuz, Cem O. Yaldiz, Afra Nawar, Etrit Haxholli, Ogul Can, Eli Waxman

Coding Guidelines for Prolog

Coding standards and good practices are fundamental to a disciplined approach to software projects, whatever programming languages they employ. Prolog programming can benefit from such an approach, perhaps more than programming in other…

Programming Languages · Computer Science 2011-05-18 Michael A. Covington, Roberto Bagnara, Richard A. O'Keefe, Jan Wielemaker, Simon Price

The Art and Practice of Data Science Pipelines: A Comprehensive Study of Data Science Pipelines In Theory, In-The-Small, and In-The-Large

Increasingly larger number of software systems today are including data science components for descriptive, predictive, and prescriptive analytics. The collection of data science stages from acquisition, to cleaning/curation, to modeling,…

Software Engineering · Computer Science 2022-02-15 Sumon Biswas, Mohammad Wardat, Hridesh Rajan

Scaling Systematic Literature Reviews with Machine Learning Pipelines

Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very…

Computation and Language · Computer Science 2020-10-12 Seraphina Goldfarb-Tarrant, Alexander Robertson, Jasmina Lazic, Theodora Tsouloufi, Louise Donnison, Karen Smyth

A Primer on the Data Cleaning Pipeline

The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this…

Databases · Computer Science 2023-07-26 Rebecca C. Steorts

Preprocessing Methods and Pipelines of Data Mining: An Overview

Data mining is about obtaining new knowledge from existing datasets. However, the data in the existing datasets can be scattered, noisy, and even incomplete. Although lots of effort is spent on developing or fine-tuning data mining models…

Machine Learning · Computer Science 2019-06-21 Canchen Li

Advances in Process Optimization: A Comprehensive Survey of Process Mining, Predictive Process Monitoring, and Process-Aware Recommender Systems

Process analytics approaches allow organizations to support the practice of Business Process Management and continuous improvement by leveraging all process-related data to extract knowledge, improve process performance and support…

Other Computer Science · Computer Science 2025-02-25 Asjad Khan, Aditya Ghose, Hoa Dam, Arsal Syed

Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and…

Methodology · Statistics 2018-09-28 Soudabeh Barghi, Lalet Scaria, Ali Salari, Tristan Glatard

Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review

Clinical coding is crucial for healthcare billing and data analysis. Manual clinical coding is labour-intensive and error-prone, which has motivated research towards full automation of the process. However, our analysis, based on US English…

Computation and Language · Computer Science 2025-06-19 Yidong Gan, Maciej Rybinski, Ben Hachey, Jonathan K. Kummerfeld

Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents

A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts. In a production pipeline, requirements often change, with classes being added and…

Artificial Intelligence · Computer Science 2022-01-19 Ramon Pires, Fábio C. de Souza, Guilherme Rosa, Roberto A. Lotufo, Rodrigo Nogueira

A sensitivity analysis to quantify the impact of neuroimaging preprocessing strategies on subsequent statistical analyses

Even though novel imaging techniques have been successful in studying brain structure and function, the measured biological signals are often contaminated by multiple sources of noise, arising due to e.g. head movements of the individual…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Brice Ozenne, Martin Norgaard, Cyril Pernet, Melanie Ganz

BugDoc: Algorithms to Debug Computational Processes

Data analysis for scientific experiments and enterprises, large-scale simulations, and machine learning tasks all entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities…

Databases · Computer Science 2020-04-15 Raoni Lourenço, Juliana Freire, Dennis Shasha