Related papers: FlexDM: Enabling robust and reliable parallel data…

MLDev: Data Science Experiment Automation and Reproducibility Software

In this paper we explore the challenges of automating experiments in data science. We propose an extensible experiment model as a foundation for integration of different open source tools for running research experiments. We implement our…

Machine Learning · Computer Science 2022-09-21 Anton Khritankov, Nikita Pershin, Nikita Ukhov, Artem Ukhov

A Virtual Laboratory for Managing Computational Experiments

Computational experiments have become essential for scientific discovery, allowing researchers to test hypotheses, analyze complex datasets, and validate findings. However, as computational experiments grow in scale and complexity, ensuring…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-03 Eleni Adamidi, Panayiotis Deligiannis, Nikos Foutris, Thanasis Vergoulis

FlexModel: A Framework for Interpretability of Distributed Large Language Models

With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization…

Machine Learning · Computer Science 2023-12-07 Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson

SCHeMa: Scheduling Scientific Containers on a Cluster of Heterogeneous Machines

In the era of data-driven science, conducting computational experiments that involve analysing large datasets using heterogeneous computational clusters is part of the everyday routine for many scientists. Moreover, to ensure the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-23 Thanasis Vergoulis, Konstantinos Zagganas, Loukas Kavouras, Martin Reczko, Stelios Sartzetakis, Theodore Dalamagas

A Metadata-Based Ecosystem to Improve the FAIRness of Research Software

The reuse of research software is central to research efficiency and academic exchange. The application of software enables researchers with varied backgrounds to reproduce, validate, and expand upon study findings. Furthermore, the…

Software Engineering · Computer Science 2023-06-21 Patrick Kuckertz, Jan Göpfert, Oliver Karras, David Neuroth, Julian Schönau, Rodrigo Pueblas, Stephan Ferenz, Felix Engel, Noah Pflugradt, Jann M. Weinand, Astrid Nieße, Sören Auer, Detlef Stolten

Ensemble Toolkit: Scalable and Flexible Execution of Ensembles of Tasks

There are many science applications that require scalable task-level parallelism and support for flexible execution and coupling of ensembles of simulations. Most high-performance system software and middleware, however, are designed to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-29 Vivekanandan Balasubramanian, Antons Treikalis, Ole Weidner, Shantenu Jha

FlexEmu: Towards Flexible MCU Peripheral Emulation (Extended Version)

Microcontroller units (MCUs) are widely used in embedded devices due to their low power consumption and cost-effectiveness. MCU firmware controls these devices and is vital to the security of embedded systems. However, performing dynamic…

Cryptography and Security · Computer Science 2025-09-10 Chongqing Lei, Zhen Ling, Xiangyu Xu, Shaofeng Li, Guangchi Liu, Kai Dong, Junzhou Luo

FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics

The identification and property prediction of chemical molecules is of central importance in the advancement of drug discovery and material science, where the tandem mass spectrometry technology gives valuable fragmentation cues in the form…

Artificial Intelligence · Computer Science 2026-04-14 Yunhua Zhong, Yixuan Tang, Yifan Li, Jie Yang, Pan Liu, Jun Xia

Flexible metadata harvesting for ecology using large language models

Large, open datasets can accelerate ecological research, particularly by enabling researchers to develop new insights by reusing datasets from multiple sources. However, to find the most suitable datasets to combine and integrate,…

Digital Libraries · Computer Science 2025-10-07 Zehao Lu, Thijs L van der Plas, Parinaz Rashidi, W Daniel Kissling, Ioannis N Athanasiadis

kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos

Empirical Dynamic Modeling (EDM) is a state-of-the-art non-linear time-series analysis framework. Despite its wide applicability, EDM was not scalable to large datasets due to its expensive computational cost. To overcome this obstacle,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-27 Keichi Takahashi, Wassapon Watanakeesuntorn, Kohei Ichikawa, Joseph Park, Ryousei Takano, Jason Haga, George Sugihara, Gerald M. Pao

WaDec: Decompiling WebAssembly Using Large Language Model

WebAssembly (abbreviated Wasm) has emerged as a cornerstone of web development, offering a compact binary format that allows high-performance applications to run at near-native speeds in web browsers. Despite its advantages, Wasm's binary…

Software Engineering · Computer Science 2024-09-12 Xinyu She, Yanjie Zhao, Haoyu Wang

FlexLMM: a Nextflow linear mixed model framework for GWAS

Summary: Linear mixed models are a commonly used statistical approach in genome-wide association studies when population structure is present. However, naive permutations to empirically estimate the null distribution of a statistic of…

Genomics · Quantitative Biology 2024-10-03 Saul Pierotti, Tomas Fitzgerald, Ewan Birney

Luxical: High-Speed Lexical-Dense Text Embeddings

Frontier language model quality increasingly hinges on our ability to organize web-scale text corpora for training. Today's dominant tools trade off speed and flexibility: lexical classifiers (e.g., FastText) are fast but limited to…

Computation and Language · Computer Science 2025-12-12 DatologyAI: Luke Merrick, Alex Fang, Aldo Carranza, Alvin Deng, Amro Abbas, Brett Larsen, Cody Blakeney, Darren Teh, David Schwab, Fan Pan, Haakon Mongstad, Haoli Yin, Jack Urbanek, Jason Lee, Jason Telanoff, Josh Wills, Kaleigh Mentzer, Paul Burstein, Parth Doshi, Paul Burnstein, Pratyush Maini, Ricardo Monti, Rishabh Adiga, Scott Loftin, Siddharth Joshi, Spandan Das, Tony Jiang, Vineeth Dorna, Zhengping Wang, Bogdan Gaza, Ari Morcos, Matthew Leavitt

FlexStep: Enabling Flexible Error Detection in Multi/Many-core Real-time Systems

Reliability and real-time responsiveness in safety-critical systems have traditionally been achieved using error detection mechanisms, such as LockStep, which require pre-configured checker cores, strict synchronisation between main and…

Hardware Architecture · Computer Science 2025-03-19 Tinglue Wang, Yiming Li, Wei Tang, Jiapeng Guan, Zhenghui Guo, Renshuang Jiang, Ran Wei, Jing Li, Zhe Jiang

Oops!... I did it again. Conclusion (In-)Stability in Quantitative Empirical Software Engineering: A Large-Scale Analysis

Context: Mining software repositories is a popular means to gain insights into a software project's evolution, monitor project health, support decisions and derive best practices. Tools supporting the mining process are commonly applied by…

Software Engineering · Computer Science 2025-11-13 Nicole Hoess, Carlos Paradis, Rick Kazman, Wolfgang Mauerer

A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble Sampling

The rapid evolution of molecular dynamics (MD) methods, including machine-learned dynamics, has outpaced the development of standardized tools for method validation. Objective comparison between simulation approaches is often hindered by…

Machine Learning · Computer Science 2025-10-21 Alexander Aghili, Andy Bruce, Daniel Sabo, Sanya Murdeshwar, Kevin Bachelor, Ionut Mistreanu, Ashwin Lokapally, Razvan Marinescu

Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments

Running complex sets of machine learning experiments is challenging and time-consuming due to the lack of a unified framework. This leaves researchers forced to spend time implementing necessary features such as parallelization, caching,…

Machine Learning · Computer Science 2023-11-22 Zac Pullar-Strecker, Xinglong Chang, Liam Brydon, Ioannis Ziogas, Katharina Dost, Jörg Wicker

Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms

Many different machine learning algorithms exist; taking into account each algorithm's hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning…

Machine Learning · Computer Science 2013-03-08 Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown

Using Apriori with WEKA for Frequent Pattern Mining

Knowledge exploration from the large set of data, generated as a result of the various data processing activities due to data mining only. Frequent Pattern Mining is a very important undertaking in data mining. Apriori approach applied to…

Databases · Computer Science 2014-07-01 Paresh Tanna, Yogesh Ghodasara

Any-Order Flexible Length Masked Diffusion

Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. MDMs generate sequences in an any-order, parallel fashion, enabling fast inference and strong performance on…

Machine Learning · Computer Science 2025-09-09 Jaeyeon Kim, Lee Cheuk-Kit, Carles Domingo-Enrich, Yilun Du, Sham Kakade, Timothy Ngotiaoco, Sitan Chen, Michael Albergo