Databases - Scifaro

Data Warehouse and Decision Support on Integrated Crop Big Data

In recent years, precision agriculture has become very popular. The introduction of modern information and communication technologies for collecting and processing agricultural data is revolutionising agricultural practices. This has started…

Databases · Computer Science 2021-04-13 V. M. Ngo, N. A. Le-Khac, M. T. Kechadi

Enhancing Virtual Ontology Based Access over Tabular Data with Morph-CSV

Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets, either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the…

Databases · Computer Science 2021-04-13 David Chaves-Fraga, Edna Ruckhaus, Freddy Priyatna, Maria-Esther Vidal, Oscar Corcho

ProMIPS: Efficient High-Dimensional c-Approximate Maximum Inner Product Search with a Lightweight Index

Owing to its wide applications in recommendation systems, multi-class label prediction, and deep learning, the Maximum Inner Product (MIP) search problem has received extensive attention in recent years. Faced with large-scale datasets…

Databases · Computer Science 2021-04-12 Yang Song, Yu Gu, Rui Zhang, Ge Yu
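For readers new to the problem: given a query vector q and a set of data vectors, MIP search returns the vector with the largest inner product with q, and a c-approximate answer (0 < c < 1) is any vector whose inner product is at least c times that maximum, under the usual assumption that the maximum is positive. The sketch below is only the exact linear-scan baseline that index-based methods such as ProMIPS aim to avoid on large datasets; it does not reproduce ProMIPS, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def exact_mips(query: Sequence[float], data: List[Sequence[float]]) -> Tuple[int, float]:
    """Linear scan (hypothetical baseline): return (index, score) of the
    data vector with the maximum inner product with the query."""
    best_idx, best_score = -1, -math.inf
    for i, x in enumerate(data):
        score = inner_product(query, x)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score

def is_c_approximate(candidate_score: float, exact_score: float, c: float) -> bool:
    """c-approximate guarantee: candidate >= c * exact, assuming 0 < c < 1 and exact > 0."""
    return candidate_score >= c * exact_score
```

With data `[[1.0, 0.0], [2.0, 2.0], [3.0, 4.0]]` and query `[1.0, 1.0]`, `exact_mips` returns index 2 with score 7.0, even though `[1.0, 0.0]` is the closest vector to the query in Euclidean distance; this gap between inner product and distance is one reason MIP search is not simply a nearest-neighbour problem.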

Prism: Private Verifiable Set Computation over Multi-Owner Outsourced Databases

This paper proposes Prism, a secret sharing based approach to compute private set operations (i.e., intersection and union), as well as aggregates over outsourced databases belonging to multiple owners. Prism enables data owners to pre-load…

Databases · Computer Science 2021-04-09 Yin Li, Dhrubajyoti Ghosh, Peeyush Gupta, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma
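To illustrate the general idea of computing set operations on secret-shared data, here is a minimal sketch of additive secret sharing over a fixed, public item domain. Each owner encodes its set as 0/1 membership bits, splits every bit into random shares for several servers, and each server only adds up the shares it receives; recombining the per-item totals gives how many owners hold each item, from which intersection (count equals number of owners) and union (count above zero) follow. The modulus, the three-server setup, and all function names are assumptions for illustration. This is not Prism's protocol, and unlike a real privacy-preserving design it reveals the per-item counts to whoever reconstructs them.

```python
import secrets
from typing import List, Set, Tuple

PRIME = (1 << 61) - 1  # public field modulus (assumed); must exceed the number of owners

def share(value: int, n_servers: int) -> List[int]:
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: List[int]) -> int:
    """Recombine additive shares."""
    return sum(shares) % PRIME

def set_ops(owner_sets: List[Set[str]], domain: List[str],
            n_servers: int = 3) -> Tuple[Set[str], Set[str]]:
    """Intersection and union of the owners' sets, computed on secret-shared membership bits."""
    # sums[s][j]: server s's running total of the shares it received for domain item j
    sums = [[0] * len(domain) for _ in range(n_servers)]
    for owner_set in owner_sets:
        for j, item in enumerate(domain):
            bit = 1 if item in owner_set else 0
            for s, sh in enumerate(share(bit, n_servers)):
                sums[s][j] = (sums[s][j] + sh) % PRIME  # each server adds locally
    # Reconstructed value for item j = number of owners holding item j
    counts = [reconstruct([sums[s][j] for s in range(n_servers)]) for j in range(len(domain))]
    intersection = {item for item, c in zip(domain, counts) if c == len(owner_sets)}
    union = {item for item, c in zip(domain, counts) if c > 0}
    return intersection, union
```

For owners holding `{"a", "b", "c"}`, `{"b", "c", "d"}` and `{"b", "c"}` over the domain `a` to `e`, `set_ops` returns `{"b", "c"}` as the intersection and `{"a", "b", "c", "d"}` as the union. Any single server sees only uniformly random-looking values.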

Accurate and Efficient Suffix Tree Based Privacy-Preserving String Matching

The task of calculating similarities between strings held by different organizations without revealing these strings is an increasingly important problem in areas such as health informatics, national censuses, genomics, and fraud detection.…

Databases · Computer Science 2021-04-08 Sirintra Vaiwsri, Thilina Ranbaduge, Peter Christen, Kee Siong Ng

A Unified System for Data Analytics and In Situ Query Processing

In today's world, data is being generated at such a high rate that it has become essential to analyze it and obtain results from it quickly. Most relational databases primarily support SQL querying, with limited support for…

Databases · Computer Science 2021-04-08 Alex Watson, Suvam Kumar Das, Suprio Ray

Content-defined Merkle Trees for Efficient Container Delivery

Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate on…

Databases · Computer Science 2021-04-07 Yuta Nakamura, Raza Ahmad, Tanu Malik
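The title combines two standard ideas: content-defined chunking, where chunk boundaries are chosen by a rolling hash over the bytes themselves so that a local edit tends to disturb only nearby boundaries, and a Merkle tree, whose root hash summarises all chunks so that two parties can locate differing chunks by comparing hashes. The sketch below shows a generic version of both. The polynomial rolling hash, window size, boundary mask, minimum chunk size, and function names are illustrative assumptions, not the paper's parameters or construction.

```python
import hashlib
from typing import List

BASE = 257            # polynomial rolling-hash base (assumed)
MOD = (1 << 61) - 1   # modulus for the rolling hash (assumed)

def chunk(data: bytes, window: int = 16, mask: int = 0x3F, min_size: int = 16) -> List[bytes]:
    """Content-defined chunking: cut after byte i when the hash of the last
    `window` bytes has all `mask` bits zero (a smaller mask gives more cuts)."""
    chunks: List[bytes] = []
    start, h = 0, 0
    top = pow(BASE, window - 1, MOD)  # weight of the oldest byte in the window
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD  # slide: drop the outgoing byte
        h = (h * BASE + b) % MOD                    # slide: add the incoming byte
        if i + 1 - start >= min_size and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks

def merkle_root(leaves: List[bytes]) -> bytes:
    """Binary Merkle tree over SHA-256 leaf hashes; an odd node is promoted unchanged."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])
        level = nxt
    return level[0]
```

Because boundaries depend only on the bytes inside the window, an edit in the middle of a large blob typically leaves most chunks, and their leaf hashes, unchanged; a client comparing hashes would then need to fetch only the chunks that differ.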

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date, there is no rigorous study on how exactly cleaning…

Databases · Computer Science 2021-04-07 Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang

EKO: Adaptive Sampling of Compressed Video Data

Researchers have presented systems for efficiently analysing video data at scale using sampling algorithms. While these systems effectively leverage the temporal redundancy present in videos, they suffer from three limitations. First, they…

Databases · Computer Science 2021-04-06 Jaeho Bang, Pramod Chunduri, Joy Arulraj

Public Transport Planning: When Transit Network Connectivity Meets Commuting Demand

In this paper, we make a first attempt to incorporate both commuting demand and transit network connectivity in bus route planning (CT-Bus), and formulate it as a constrained optimization problem: planning a new bus route with k edges over…

Databases · Computer Science 2021-04-06 Sheng Wang, Yuan Sun, Christopher Musco, Zhifeng Bao

Multi-Dimensional Event Data in Graph Databases

Process event data is usually stored either in a sequential process event log or in a relational database. While the sequential, single-dimensional nature of event logs aids querying for (sub)sequences of events based on temporal relations…

Databases · Computer Science 2021-04-06 Stefan Esser, Dirk Fahland

An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation…

Databases · Computer Science 2021-04-05 Umair Qudus, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Young-koo Lee

Symmetric Continuous Subgraph Matching with Bidirectional Dynamic Programming

In many real datasets such as social media streams and cyber data sources, graphs change over time through a graph update stream of edge insertions and deletions. Detecting critical patterns in such dynamic graphs plays an important role in…

Databases · Computer Science 2021-04-05 Seunghwan Min, Sung Gwan Park, Kunsoo Park, Dora Giammarresi, Giuseppe F. Italiano, Wook-Shin Han

Efficiently Answering Durability Prediction Queries

We consider a class of queries called durability prediction queries that arise commonly in predictive analytics, where we use a given predictive model to answer questions about possible futures to inform our decisions. Examples of…

Databases · Computer Science 2021-04-02 Junyang Gao, Yifan Xu, Pankaj K. Agarwal, Jun Yang

Properties of Inconsistency Measures for Databases

How should we quantify the inconsistency of a database that violates integrity constraints? Proper measures are important for various tasks, such as progress indication and action prioritization in cleaning systems, and reliability…

Databases · Computer Science 2021-04-02 Ester Livshits, Rina Kochirgan, Segev Tsur, Ihab F. Ilyas, Benny Kimelfeld, Sudeepa Roy
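As a concrete illustration of what "quantifying inconsistency" can mean, the sketch below computes three simple measures from the inconsistency-measurement literature for a single functional dependency (FD) X → Y: the drastic measure (1 if any violation exists, else 0), the number of minimal inconsistent subsets (for one FD these are exactly the violating tuple pairs), and the number of problematic tuples (tuples involved in some violation). The relation, the FD, and the function names are hypothetical; the paper's own measures and the properties it studies are not reproduced here.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

def fd_violations(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> List[Tuple[int, int]]:
    """Index pairs of rows that agree on `lhs` but disagree on `rhs` (violate lhs -> rhs)."""
    out = []
    for i, j in combinations(range(len(rows)), 2):
        same_lhs = all(rows[i][a] == rows[j][a] for a in lhs)
        same_rhs = all(rows[i][a] == rows[j][a] for a in rhs)
        if same_lhs and not same_rhs:
            out.append((i, j))
    return out

def drastic(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> int:
    """1 if the relation violates the FD at all, otherwise 0."""
    return 1 if fd_violations(rows, lhs, rhs) else 0

def minimal_inconsistent_subsets(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> int:
    """For a single FD, the minimal inconsistent subsets are the violating pairs."""
    return len(fd_violations(rows, lhs, rhs))

def problematic_tuples(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> int:
    """Number of distinct rows that take part in at least one violation."""
    return len({t for pair in fd_violations(rows, lhs, rhs) for t in pair})
```

On a toy relation with rows `(10001, NYC)`, `(10001, Boston)`, `(10001, NYC)`, `(02108, Boston)` and the FD zip → city, the violating pairs are rows (0, 1) and (1, 2), so the three measures give 1, 2 and 3. The measures rank the same database differently, which is why comparing their properties matters for uses such as progress indication in cleaning.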

Efficient Exploration of Interesting Aggregates in RDF Graphs

As large Open Data datasets are increasingly shared as RDF graphs today, there is a growing demand to help users discover the most interesting facets of a graph, which are often hard to grasp without automatic tools. We consider the problem of…

Databases · Computer Science 2021-04-01 Yanlei Diao, Paweł Guzewicz, Ioana Manolescu, Mirjana Mazuran

TUSQ: Targeted High-Utility Sequence Querying

Significant efforts have been expended in the research and development of a database management system (DBMS) that has a wide range of applications for managing an enormous collection of multisource, heterogeneous, complex, or growing data.…

Databases · Computer Science 2021-04-01 Chunkai Zhang, Zilin Du, Quanjian Dai, Wensheng Gan, Jian Weng, Philip S. Yu

VSS: A Storage System for Video Analytics [Technical Report]

We present a new video storage system (VSS) designed to decouple high-level video operations from the low-level details required to store and efficiently retrieve video data. VSS is designed to be the storage subsystem of a video data…

Databases · Computer Science 2021-04-01 Brandon Haynes, Maureen Daum, Dong He, Amrita Mazumdar, Magdalena Balazinska, Alvin Cheung, Luis Ceze

Parallel Index-Based Structural Graph Clustering and Its Approximation

SCAN (Structural Clustering Algorithm for Networks) is a well-studied, widely used graph clustering algorithm. For large graphs, however, sequential SCAN variants are prohibitively slow, and parallel SCAN variants do not effectively share…

Databases · Computer Science 2021-04-01 Tom Tseng, Laxman Dhulipala, Julian Shun
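For context on what SCAN computes, here is a minimal sequential sketch of its core definitions, using the convention of the original formulation where neighbourhoods are closed (they include the vertex itself). The structural similarity of adjacent vertices u and v is |N[u] ∩ N[v]| / sqrt(|N[u]| · |N[v]|); a vertex is a core if at least μ vertices in N[u] have similarity at least ε; clusters grow from cores through ε-similar neighbours. Some SCAN variants use slightly different conventions, and this sketch shows neither the paper's index nor its parallel or approximate algorithms; function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]

def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Undirected adjacency sets from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g

def closed_nbr(g: Graph, u: int) -> Set[int]:
    return g[u] | {u}

def similarity(g: Graph, u: int, v: int) -> float:
    """Structural similarity over closed neighbourhoods."""
    nu, nv = closed_nbr(g, u), closed_nbr(g, v)
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def cores(g: Graph, eps: float, mu: int) -> Set[int]:
    """Vertices with at least mu eps-similar vertices in their closed neighbourhood."""
    return {u for u in g
            if sum(1 for v in closed_nbr(g, u) if similarity(g, u, v) >= eps) >= mu}

def scan_clusters(g: Graph, eps: float, mu: int) -> List[Set[int]]:
    """Grow clusters from cores; vertices left unclustered are hubs or outliers."""
    core_set = cores(g, eps, mu)
    label: Dict[int, int] = {}
    clusters: List[Set[int]] = []
    for c in sorted(core_set):
        if c in label:
            continue
        cid, members, stack = len(clusters), set(), [c]
        label[c] = cid
        while stack:
            u = stack.pop()
            members.add(u)
            if u not in core_set:
                continue  # border vertices join a cluster but do not expand it
            for v in closed_nbr(g, u):
                if v not in label and similarity(g, u, v) >= eps:
                    label[v] = cid
                    stack.append(v)
        clusters.append(members)
    return clusters
```

On two 4-cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by the edge (3, 4), the bridge has similarity 2/5 = 0.4, so with ε = 0.7 and μ = 3 the two cliques come out as separate clusters; raising ε to 0.95 leaves vertices 3 and 4 out of both clusters. Every similarity evaluation intersects two neighbourhoods, which is why sequential SCAN becomes slow on large graphs.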

Discovering High Utility-Occupancy Patterns from Uncertain Data

It is widely known that there is a lot of useful information hidden in big data, leading to a new saying that "data is money." Thus, it is prevalent for individuals to mine crucial information for utilization in many real-world…

Databases · Computer Science 2021-04-01 Chien-Ming Chen, Lili Chen, Wensheng Gan, Lina Qiu, Weiping Ding