Related papers: Optimizing Provenance Computations
A well-established technique for capturing database provenance as annotations on data is to instrument queries to propagate such annotations. However, even sophisticated query optimizers often fail to produce efficient execution plans for…
We study in this paper provenance information for queries with aggregation. Provenance information was studied in the context of various query languages that do not allow for aggregation, and recent work has suggested capturing provenance…
Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Recent work has proposed to leverage ideas from data provenance…
Data provenance has numerous applications in the context of data preparation pipelines. It can be used for debugging faulty pipelines, interpreting results, verifying fairness, and identifying data quality issues, which may affect the…
Provenance encodes information that connects datasets, their generation workflows, and associated metadata (e.g., who executed a query and when). As such, it is instrumental for a wide range of critical governance applications (e.g.,…
Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can help…
Profile-Guided Optimization (PGO) is an excellent means to improve the performance of a compiled program. Indeed, the execution path data it provides helps the compiler to generate better code and better cacheline packing. At the time of…
Provenance is an increasing concern due to the ongoing revolution in sharing and processing scientific data on the Web and in other computer systems. It is proposed that many computer systems will need to become provenance-aware in order to…
Demand is growing for more accountability regarding the technological systems that increasingly occupy our world. However, the complexity of many of these systems (often systems-of-systems) poses accountability challenges. A key reason…
Organizations of all kinds, whether public or private, profit-driven or non-profit, and across various industries and sectors, rely on dashboards for effective data visualization. However, the reliability and efficacy of these dashboards…
Effective provenance tracking enhances reproducibility, governance, and data quality in array workflows. However, significant challenges arise in capturing this provenance, including: (1) rapidly evolving APIs, (2) diverse operation types,…
Provenance in scientific workflows is essential for understanding and reproducing processes, while in business processes it can ensure compliance and correctness and facilitate process mining. However, the provenance of process…
Provenance plays a crucial role in scientific workflow execution, for instance by providing data for failure analysis, real-time monitoring, or statistics on resource utilization for right-sizing allocations. The workflows themselves,…
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing…
Provenance is derivative journal information about the origin and activities of system data and processes. For a highly dynamic system like the cloud, provenance can be accurately detected and securely used in cloud digital forensic…
Data provenance collects comprehensive information about the events and operations in a computer system at both application and system levels. It provides a detailed and accurate history of transactions that help delineate the data flow…
In today's data-driven ecosystems, ensuring data integrity, traceability and accountability is important. Provenance polynomials constitute a powerful formalism for tracing the origin and the derivations made to produce database query…
As the demand for large scale AI models continues to grow, the optimization of their training to balance computational efficiency, execution time, accuracy and energy consumption represents a critical multidimensional challenge. Achieving…
The algebraic approach for provenance tracking, originating in the semiring model of Green et al., has proven useful as an abstract way of handling metadata. Commutative semirings were shown to be the "correct" algebraic structure for Union…
Users typically interact with a database by asking queries and examining the results. We refer to the user examining the query results and asking follow-up questions as query result exploration. Our work builds on two decades of provenance…