Related papers: Data Validation for Big Live Data
This paper presents incremental verification-validation, a novel approach for checking rich data structure invariants expressed as separation logic assertions. Incremental verification-validation combines static verification of separation…
One of the purposes of Big Data systems is to support analysis of data gathered from heterogeneous data sources. Since data warehouses have been used for several decades to achieve the same goal, they could be leveraged also to provide…
Data Warehouse provides storage for huge amounts of historical data from heterogeneous operational sources in the form of multidimensional views, thus supplying sensitive and useful information which help decision-makers to improve the…
The data warehouse (DW) technology was developed to integrate heterogeneous information sources for analysis purposes. Information sources are more and more autonomous and they often change their content due to perpetual transactions (data…
In Big data era, information integration often requires abundant data extracted from massive data sources. Due to a large number of data sources, data source selection plays a crucial role in information integration, since it is costly and…
Data comes in many forms. From a shallow perspective, they can be viewed as being either in structured (e.g., as a relation, as key-value pairs) or unstructured (e.g., text, image) formats. So far, machines have been fairly good at…
The data warehousing is becoming increasingly important in terms of strategic decision making through their capacity to integrate heterogeneous data from multiple information sources in a common storage space, for querying and analysis. So…
Checking data quality against domain knowledge is a common activity that pervades statistical analysis from raw data to output. The R package 'validate' facilitates this task by capturing and applying expert knowledge in the form of…
A data warehouse is a large data repository for the purpose of analysis and decision making in organizations. To improve the query performance and to get fast access to the data, data is stored as materialized views (MV) in the data…
Data reuse is fundamental for reducing the data integration effort required to build data supporting new applications, especially in data scarcity contexts. However, data reuse requires to deal with data heterogeneity, which is always…
In tracing the (robotically automated) logistics of large quantities of goods, inexpensive passive RFID tags are preferred for cost reasons. Accordingly, security between such tags and readers have primarily been studied among many issues…
In order to fully unlock the transformative power of distributed ledgers and blockchains, it is crucial to develop innovative consensus algorithms that can overcome the obstacles of security, scalability, and interoperability, which…
Along with the miniaturization of various types of sensors, a mass of intelligent terminals are gaining stronger sensing capability, which raises a deeper perception and better prospect of Internet of Things (IoT). With big sensing data,…
A major challenge in nuclear fusion research is the coherent combination of data from heterogeneous diagnostics and modelling codes for machine control and safety as well as physics studies. Measured data from different diagnostics often…
Stochastic models are widely used to verify whether systems satisfy their reliability, performance and other nonfunctional requirements. However, the validity of the verification depends on how accurately the parameters of these models can…
In data-intensive real-time applications, such as smart transportation and manufacturing, ensuring data freshness is essential, as using obsolete data can lead to negative outcomes. Validity intervals serve as the standard means to specify…
Data spaces represent an emerging paradigm that facilitates secure and trusted data exchange through foundational elements of data interoperability, sovereignty, and trust. Within a data space, data items, potentially owned by different…
Verifying identity documents from a large Central Identity Database (CIDB) is always challenging and it get more challenging when we need to verify a large number of documents at the same time. Usually most of the time we setup a gateway…
Web sites routinely incorporate JavaScript programs from several sources into a single page. These sources must be protected from one another, which requires robust sandboxing. The many entry-points of sandboxes and the subtleties of…
In the distributed and dynamic framework of the Web, data quality is a big challenge. The Linked Open Data (LOD) provides an enormous amount of data, the quality of which is difficult to control. Quality is intrinsically a matter of usage,…