Related papers: Toward Temporal Attribution Analytics in Dataflows
In temporal interaction networks, vertices correspond to entities, which exchange data quantities (e.g., money, bytes, messages) over time. Tracking the origin of data that have reached a given vertex at any time can help data analysts to…
Provenance refers to the documentation of an object's lifecycle. This documentation (often represented as a graph) should include all the information necessary to reproduce a certain piece of data or the process that led to it. In a dynamic…
In the world of science new technology have opened up the possibility to rely on advanced computational methods and models to conduct and produce scientific research. An important aspect of scientific and business workflows is provenance -…
Demand is growing for more accountability regarding the technological systems that increasingly occupy our world. However, the complexity of many of these systems - often systems-of-systems - poses accountability challenges. A key reason…
Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic…
Data provenance is a valuable tool for detecting and preventing cyber attack, providing insight into the nature of suspicious events. For example, an administrator can use provenance to identify the perpetrator of a data leak, track an…
Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between…
In this paper, we investigate how we can leverage Spark platform for efficiently processing provenance queries on large volumes of workflow provenance data. We focus on processing provenance queries at attribute-value level which is the…
Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging.…
The complexity of exploratory data analysis poses significant challenges for collaboration and effective communication of analytic workflows. Automated methods can alleviate these challenges by summarizing workflows into more interpretable…
For data-centric systems, provenance tracking is particularly important when the system is open and decentralised, such as the Web of Linked Data. In this paper, a concise but expressive calculus which models data updates is presented. The…
To benefit from the abundance of data and the insights it brings data processing pipelines are being used in many areas of research and development in both industry and academia. One approach to automating data processing pipelines is the…
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of…
Analytic provenance can be visually encoded to help users track their ongoing analysis trajectories, recall past interactions, and inform new analytic directions. Despite its significance, provenance is often hardwired into analytics…
The transformations, analyses and interpretations of data in scientific workflows are vital for the repeatability and reliability of scientific workflows. This provenance of scientific workflows has been effectively carried out in Grid…
One of the foundations of science is that researchers must publish the methodology used to achieve their results so that others can attempt to reproduce them. This has the added benefit of allowing methods to be adopted and adapted for…
We establish a translation between a formalism for dynamic programming over hypergraphs and the computation of semiring-based provenance for Datalog programs. The benefit of this translation is a new method for computing provenance for a…
Data provenance consists in bookkeeping meta information during query evaluation, in order to enrich query results with their trust level, likelihood, evaluation cost, and more. The framework of semiring provenance abstracts from the…
Provenance in scientific workflows is essential for understand- ing and reproducing processes, while in business processes, it can ensure compliance and correctness and facilitates process mining. However, the provenance of process…
Successful data-driven science requires complex data engineering pipelines to clean, transform, and alter data in preparation for machine learning, and robust results can only be achieved when each step in the pipeline can be justified, and…