Related papers: Sketch-Oriented Databases
Data constraints are fundamental for practical data modelling, and a verifiable conformance of a data instance to a safety-critical constraint (satisfaction relation) is a corner-stone of safety assurance. Diagrammatic constraints are…
The immense amount of daily generated and communicated data presents unique challenges in their processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data.…
Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential…
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We leverage randomized sketches for reducing the problem dimensions as well as…
Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. This relationship is typically represented as sequences, where each data item is annotated with multi-dimensional…
Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but…
Existing data visualization formalisms are restricted to single-table inputs, which makes existing visualization grammars like Vega-lite or ggplot2 tedious to use, have overly complex APIs, and unsound when visualization multi-table data.…
Graph simulation (using graph schemata or data guides) has been successfully proposed as a technique for adding structure to semistructured data. Design patterns for description (such as meta-classes and homomorphisms between schema…
Given a database, computing the fraction of rows that contain a query itemset or determining whether this fraction is above some threshold are fundamental operations in data mining. A uniform sample of rows is a good sketch of the database…
In this work, we investigate the problem of sketch-based object localization on natural images, where given a crude hand-drawn sketch of an object, the goal is to localize all the instances of the same object on the target image. This…
This paper presents a formalism for defining properties of paths in graph databases, which can be used to restrict the number of solutions to navigational queries. In particular, our formalism allows us to define quantitative properties…
This document reports on the use of an algebraic, visual, formal approach to the specification of patterns for the formalization of the GoF design patterns. The approach is based on graphs, morphisms and operations from category theory and…
Querying graph databases has recently received much attention. We propose a new approach to this problem, which balances competing goals of expressive power, language clarity and computational complexity. A distinctive feature of our…
The integration of knowledge extracted from different models described by domain experts or from models generated by machine learning algorithms is strongly conditioned by the lack of an appropriated framework to specify and integrate…
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer are proposed to reduce the quadratic…
We propose a novel approach to program synthesis, focusing on synthesizing database queries. At a high level, our proposed algorithm takes as input a sketch with soft constraints encoding user intent, and then iteratively interacts with the…
Coded computing is a distributed paradigm that uses coding theory to introduce \textit{redundancy} and overcome bottlenecks in large-scale systems. In the same vein, randomized numerical linear algebra employs probabilistic methods to…
In this paper, we define a category DB, called the category of simplicial databases, whose objects are databases and whose morphisms are data-preserving maps. Along the way we give a precise formulation of the category of relational…
Sketches have shown high accuracy in multi-way join cardinality estimation, a critical problem in cost-based query optimization. Accurately estimating the cardinality of a join operation -- analogous to its computational cost -- allows the…
Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a…