Related papers: First Study on Data Readiness Level

Level of Scientific Readiness with Ternary Data Types

In addition to the technology readiness level (TRL), the scientific readiness level (SRL) has been introduced as a more authentic and adequate tool for determining the status quo of scientific and scientific-technical projects of fundamental…

Digital Libraries · Computer Science 2024-10-15 Eldar Knar

SMART: a Technology Readiness Methodology in the Frame of the NIS Directive

An ever shorter technology lifecycle engendered the need for assessing new technologies w.r.t. their market readiness. Knowing the Technology readiness level (TRL) of a given target technology proved to be useful to mitigate risks such as…

Computers and Society · Computer Science 2022-01-04 Archana Kumari, Stefan Schiffner, Sandra Schmitz

Data Readiness Levels

Application of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing…

Databases · Computer Science 2017-05-08 Neil D. Lawrence

Data-Driven Relevance Judgments for Ranking Evaluation

Ranking evaluation metrics are a fundamental element of design and improvement efforts in information retrieval. We observe that most popular metrics disregard information portrayed in the scores used to derive rankings, when available.…

Information Retrieval · Computer Science 2016-12-20 Nuno Moniz, Luís Torgo, João Vinagre

Measuring the Reliability of Reinforcement Learning Algorithms

Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production…

Machine Learning · Statistics 2020-02-14 Stephanie C. Y. Chan, Samuel Fishman, John Canny, Anoop Korattikara, Sergio Guadarrama

Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data

Current trends in pre-training Large Language Models (LLMs) primarily focus on the scaling of model and dataset size. While the quality of pre-training data is considered an important factor for training powerful LLMs, it remains a nebulous…

Computation and Language · Computer Science 2025-07-04 Brando Miranda, Alycia Lee, Sudharsan Sundar, Allison Casasola, Rylan Schaeffer, Elyas Obbad, Sanmi Koyejo

Reliability Quantification of Deep Reinforcement Learning-based Control

Reliability quantification of deep reinforcement learning (DRL)-based control is a significant challenge for the practical application of artificial intelligence (AI) in safety-critical systems. This study proposes a method for quantifying…

Systems and Control · Electrical Eng. & Systems 2024-07-22 Hitoshi Yoshioka, Hirotada Hashimoto

REDELEX: A Framework for Relational Deep Learning Exploration

Relational databases (RDBs) are widely regarded as the gold standard for storing structured information. Consequently, predictive tasks leveraging this data format hold significant application promise. Recently, Relational Deep Learning…

Machine Learning · Computer Science 2025-12-15 Jakub Peleška, Gustav Šír

Unfolding Data Quality Dimensions in Practice: A Survey

Data quality describes the degree to which data meet specific requirements and are fit for use by humans and/or downstream tasks (e.g., artificial intelligence). Data quality can be assessed across multiple high-level concepts called…

Databases · Computer Science 2025-07-24 Vasileios Papastergios, Lisa Ehrlinger, Anastasios Gounaris

Technology Readiness Levels for AI & ML

The development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. The lack of diligence can lead to technical debt, scope creep and misaligned…

Software Engineering · Computer Science 2020-12-17 Alexander Lavin, Gregory Renard

Studying Retrievability of Publications and Datasets in an Integrated Retrieval System

In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The measure of retrievability was originally developed to quantify the influence that a retrieval system has on the access to…

Information Retrieval · Computer Science 2022-07-22 Dwaipayan Roy, Zeljko Carevic, Philipp Mayr

Technology Readiness Levels for Machine Learning Systems

The development and deployment of machine learning (ML) systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. The lack of diligence can lead to technical debt, scope creep and misaligned…

Machine Learning · Computer Science 2023-01-11 Alexander Lavin, Ciarán M. Gilligan-Lee, Alessya Visnjic, Siddha Ganju, Dava Newman, Atılım Güneş Baydin, Sujoy Ganguly, Danny Lange, Amit Sharma, Stephan Zheng, Eric P. Xing, Adam Gibson, James Parr, Chris Mattmann, Yarin Gal

Causal Representation Learning on High-Dimensional Data: Benchmarks, Reproducibility, and Evaluation Metrics

Causal representation learning (CRL) models aim to transform high-dimensional data into a latent space, enabling interventions to generate counterfactual samples or modify existing data based on the causal relationships among latent…

Machine Learning · Computer Science 2026-03-19 Alireza Sadeghi, Wael AbdAlmageed

Towards Modeling Data Quality and Machine Learning Model Performance

Understanding the effect of uncertainty and noise in data on machine learning models (MLM) is crucial in developing trust and measuring performance. In this paper, a new model is proposed to quantify uncertainties and noise in data on MLMs.…

Machine Learning · Computer Science 2024-12-10 Usman Anjum, Chris Trentman, Elrod Caden, Justin Zhan

The bliss of dimensionality: how an unsupervised criterion identifies optimal low-resolution representations of high-dimensional datasets

Selecting the optimal resolution for discretizing high-dimensional data is a central problem in physics and data analysis, particularly in unsupervised settings where the underlying distribution is unknown. The Relevance-Resolution…

Statistical Mechanics · Physics 2026-03-06 Margherita Mele, Daniel Campos Moreno, Raffaello Potestio

Energy-Efficient and High-Performance Data Transfers with DRL Agents

The rapid growth of data across fields of science and industry has increased the need to improve the performance of end-to-end data transfers while using the resources more efficiently. In this paper, we present a dynamic, multiparameter…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-27 Hasibul Jamil, Jacob Goldverg, Elvis Rodrigues, MD S Q Zulkar Nine, Tevfik Kosar

Literature Review Of Attribute Level And Structure Level Data Linkage Techniques

Data Linkage is an important step that can provide valuable insights for evidence-based decision making, especially for crucial events. Performing sensible queries across heterogeneous databases containing millions of records is a complex…

Databases · Computer Science 2015-10-09 Mohammed Gollapalli

Class Density and Dataset Quality in High-Dimensional, Unstructured Data

We provide a definition for class density that can be used to measure the aggregate similarity of the samples within each of the classes in a high-dimensional, unstructured dataset. We then put forth several candidate methods for…

Machine Learning · Computer Science 2022-02-09 Adam Byerly, Tatiana Kalganova

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a…

Machine Learning · Computer Science 2017-11-28 Jacob Steinhardt, Moses Charikar, Gregory Valiant

Reliable Measures of Spread in High Dimensional Latent Spaces

Understanding geometric properties of natural language processing models' latent spaces allows the manipulation of these properties for improved performance on downstream tasks. One such property is the amount of data spread in a model's…

Machine Learning · Computer Science 2023-08-02 Anna C. Marbut, Katy McKinney-Bock, Travis J. Wheeler