Related papers: Validity Constraints for Data Analysis Workflows
In data-intensive real-time applications, such as smart transportation and manufacturing, ensuring data freshness is essential, as using obsolete data can lead to negative outcomes. Validity intervals serve as the standard means to specify…
The Collaborative Analysis Versioning Environment System (CAVES) project concentrates on the interactions between users performing data and/or computing intensive analyses on large data sets, as encountered in many contemporary scientific…
The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that…
The proliferation of SQL for data processing has often occurred without the rigor of traditional software development, leading to siloed efforts, logic replication, and increased risk. This ad-hoc approach hampers data governance and makes…
Visualization, from simple line plots to complex high-dimensional visual analysis systems, has established itself throughout numerous domains to explore, analyze, and evaluate data. Applying such visualizations in the context of simulation…
Causality has been recently introduced in databases, to model, characterize, and possibly compute causes for query answers. Connections between QA-causality and consistency-based diagnosis and database repairs (w.r.t. integrity constraint…
Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic…
We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflows assume finite datasets and require substantial…
Modern CI/CD pipelines integrating agent-generated code exhibit a structural failure in responsibility attribution. Decisions are executed through formally correct approval processes, yet no entity possesses both the authority to approve…
This report presents a taxonomy of vulnerabilities created as a part of an effort to develop a framework for deriving verification and validation strategies to assess software security. This taxonomy is grounded in a theoretical model of…
Implementing correct distributed systems is an error-prone task. Runtime Verification (RV) offers a lightweight formal method to improve reliability by monitoring system executions against correctness properties. However, applying RV in…
Approximate computing (AC) is an emerging paradigm for energy-efficient computation. The basic idea of AC is to sacrifice high precision for low energy by allowing for hardware which only carries out "approximately correct" calculations.…
The development of complex software requires tools promoting fail-fast approaches, so that bugs and unexpected behavior can be quickly identified and fixed. Tools for data validation may save the day of computer programmers. In fact,…
Recent research increasingly brings to question the appropriateness of using predictive tools in complex, real-world tasks. While a growing body of work has explored ways to improve value alignment in these tools, comparatively less work…
Programming with logic for sophisticated applications must deal with recursion and negation, which together have created significant challenges in logic, leading to many different, conflicting semantics of rules. This paper describes a…
Assessing and improving the quality of data are fundamental challenges for data-intensive systems that have given rise to applications targeting transformation and cleaning of data. However, while schema design, data cleaning, and data…
A workflow specification defines a set of steps and the order in which those steps must be executed. Security requirements and business rules may impose constraints on which users are permitted to perform those steps. A workflow…
Artificial Intelligence Virtual Cells (AIVCs) aim to learn executable, decision-relevant models of cell state from multimodal, multiscale measurements. Recent studies have introduced single-cell and spatial foundation models, improved…
Ensuring data correctness over partitioned distributed database systems is a classical problem. Classical solutions proposed to solve this problem are mainly adopting locking or blocking techniques. These techniques are not suitable for…
Big Data can mean different things to different people. The scale and challenges of Big Data are often described using three attributes, namely Volume, Velocity and Variety (3Vs), which only reflect some of the aspects of data. In this…