Related papers: Metadata practices for simulation workflows

Towards Exascale Scientific Metadata Management

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination…

Databases · Computer Science 2015-03-31 Spyros Blanas, Surendra Byna

EngMeta -- Metadata for Computational Engineering

Computational engineering generates knowledge through the analysis and interpretation of research data, which is produced by computer simulation. Supercomputers produce huge amounts of research data. To address a research question, a lot of…

Information Retrieval · Computer Science 2020-07-15 Björn Schembera, Dorothea Iglezakis

Building Containerized Environments for Reproducibility and Traceability of Scientific Workflows

Scientists rely on simulations to study natural phenomena. Trusting the simulation results is vital to develop sciences in any field. One approach to build trust is to ensure the reproducibility and traceability of the simulations through…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-21 Paula Olaya, Jay Lofstead, Michela Taufer

Automatic Metadata Capture and Processing for High-Performance Workflows

Modern workflows run on increasingly heterogeneous computing architectures and with this heterogeneity comes additional complexity. We aim to apply the FAIR principles for research reproducibility by developing software to collect metadata…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-19 Polina Shpilker, Line Pouchard

A Virtual Laboratory for Managing Computational Experiments

Computational experiments have become essential for scientific discovery, allowing researchers to test hypotheses, analyze complex datasets, and validate findings. However, as computational experiments grow in scale and complexity, ensuring…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-03 Eleni Adamidi, Panayiotis Deligiannis, Nikos Foutris, Thanasis Vergoulis

Bridging the Gap Between Methodological Research and Statistical Practice: Toward "Translational Simulation Research"

Simulations are valuable tools for empirically evaluating the properties of statistical methods and are primarily employed in methodological research to draw general conclusions about methods. In addition, they can often be useful to…

Other Statistics · Statistics 2025-10-08 Anne-Laure Boulesteix, Patrick Callahan, Luzia Hanssum, Vincent Gaertner, Eva Hoster

Evaluation of tools for describing, reproducing and reusing scientific workflows

In the field of computational science and engineering, workflows often entail the application of various software, for instance, for simulation or pre- and postprocessing. Typically, these components have to be combined in arbitrarily…

Software Engineering · Computer Science 2022-11-15 Philipp Diercks, Dennis Gläser, Ontje Lünsdorf, Michael Selzer, Bernd Flemisch, Jörg F. Unger

Improving Radiography Machine Learning Workflows via Metadata Management for Training Data Selection

Most machine learning models require many iterations of hyper-parameter tuning, feature engineering, and debugging to produce effective results. As machine learning models become more complicated, this pipeline becomes more difficult to…

Machine Learning · Computer Science 2024-08-26 Mirabel Reid, Christine Sweeney, Oleg Korobkin

An Alternative Approach to Data Acquisition Using Keyboard Emulation Technique

A number of data acquisition systems depend on human interface to access computer for measuring, processing and analyzing data and to prepare it for presentation and storage. Data acquisition software is installed on the computer and all…

Human-Computer Interaction · Computer Science 2010-11-01 Shahrukh Khalid, Adnan Ali Khan

SimFS: A Simulation Data Virtualizing File System Interface

Nowadays simulations can produce petabytes of data to be stored in parallel filesystems or large-scale databases. This data is accessed over the course of decades often by thousands of analysts and scientists. However, storing these volumes…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-11 Salvatore Di Girolamo, Pirmin Schmid, Thomas Schulthess, Torsten Hoefler

On the need for synthetic data and robust data simulators in the 2020s

As observational datasets become larger and more complex, so too are the questions being asked of these data. Data simulations, i.e., synthetic data with properties (pixelization, noise, PSF, artifacts, etc.) akin to real data, are…

Instrumentation and Methods for Astrophysics · Physics 2019-10-25 Molly S. Peeples, Bjorn Emonts, Mark Kyprianou, Matthew T. Penny, Gregory F. Snyder, Christopher C. Stark, Michael Troxel, Neil T. Zimmerman, John ZuHone

Learning To Simulate

Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the…

Machine Learning · Computer Science 2019-05-15 Nataniel Ruiz, Samuel Schulter, Manmohan Chandraker

Synthetic Data in Healthcare

Synthetic data are becoming a critical tool for building artificially intelligent systems. Simulators provide a way of generating data systematically and at scale. These data can then be used either exclusively, or in conjunction with real…

Artificial Intelligence · Computer Science 2023-04-07 Daniel McDuff, Theodore Curran, Achuta Kadambi

Managing large-scale scientific hypotheses as uncertain and probabilistic data with support for predictive analytics

The sheer scale of high-resolution raw data generated by simulation has motivated non-conventional approaches for data exploration referred to as "immersive" and "in situ" query processing of the raw simulation data. Another step towards…

Databases · Computer Science 2015-08-25 Bernardo Gonçalves, Fabio Porto

Introduction to Multi-Agent Simulation

When designing systems that are complex, dynamic and stochastic in nature, simulation is generally recognised as one of the best design support technologies, and a valuable aid in the strategic and tactical decision making process. A…

Neural and Evolutionary Computing · Computer Science 2013-05-30 Peer-Olaf Siebers, Uwe Aickelin

How are Software Repositories Mined? A Systematic Literature Review of Workflows, Methodologies, Reproducibility, and Tools

With the advent of open source software, a veritable treasure trove of previously proprietary software development data was made available. This opened the field of empirical software engineering research to anyone in academia. Data that is…

Software Engineering · Computer Science 2022-04-19 Adam Tutko, Austin Z. Henley, Audris Mockus

Best Practices and Lessons Learned on Synthetic Data

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution…

Computation and Language · Computer Science 2024-08-13 Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

Ensuring Adherence to Standards in Experiment-Related Metadata Entered Via Spreadsheets

Scientists increasingly recognize the importance of providing rich, standards-adherent metadata to describe their experimental results. Despite the availability of sophisticated tools to assist in the process of data annotation,…

Digital Libraries · Computer Science 2025-11-25 Martin J. O'Connor, Josef Hardi, Marcos Martínez-Romero, Sowmya Somasundaram, Brendan Honick, Stephen A. Fisher, Ajay Pillai, Mark A. Musen

Meta-analysis parameters computation: a Python approach to facilitate the crossing of experimental conditions

Meta-analysis is a data aggregation method that establishes an overall and objective level of evidence based on the results of several studies. It is necessary to maintain a high level of homogeneity in the aggregation of data collected…

Methodology · Statistics 2020-07-16 Flavien Quijoux, Charles Truong, Aliénor Vienne-Jumeau, Laurent Oudre, François BERTIN-HUGAULT, Philippe ZAWIEJA, Marie LEFEVRE, Pierre-Paul VIDAL, Damien RICARD

Developing a Robust Migration Workflow for Preserving and Curating Hand-held Media

Many memory institutions hold large collections of hand-held media, which can comprise hundreds of terabytes of data spread over many thousands of data-carriers. Many of these carriers are at risk of significant physical degradation over…

Digital Libraries · Computer Science 2013-09-20 Angela Dappert, Andrew N. Jackson, Akiko Kimura