Related papers: Faster Relational Algorithms Using Geometric Data …
We present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on…
Conventional machine learning algorithms cannot be applied until a data matrix is available to process. When the data matrix needs to be obtained from a relational database via a feature extraction query, the computation cost can be…
Relational databases, organized into tables connected by primary-foreign key relationships, are a common format for organizing data. Making predictions on relational data often involves transforming them into a flat tabular format through…
Clustering plays a crucial role in computer science, facilitating data analysis and problem-solving across numerous fields. By partitioning large datasets into meaningful groups, clustering reveals hidden structures and relationships within…
Property graphs often contain tree-shaped substructures, yet they are not captured by existing proposals for graph schemas; likewise, query languages and query engines offer little-to-no native support for managing them systematically. As a…
Geometric relational embeddings map relational data as geometric objects that combine vector information suitable for machine learning and structured/relational information for structured/relational reasoning, typically in low dimensions.…
Joins in native graph database management systems (GDBMSs) are predefined to the system as edges, which are indexed in adjacency list indices and serve as pointers. This contrasts with and can be more performant than value-based joins in…
Relational representation learning transforms relational data into continuous and low-dimensional vector representations. However, vector-based representations fall short in capturing crucial properties of relational data that are complex…
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates. Traditional approaches, rooted in physicochemical modeling and domain expertise, are often resource-intensive.…
Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now count with several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice.…
Ordinal Embedding places n objects into R^d based on comparisons such as "a is closer to b than c." Current optimization-based approaches suffer from scalability problems and an abundance of low quality local optima. We instead consider a…
DBSCAN is a well-known density-based clustering algorithm to discover arbitrary shape clusters. While conceptually simple in serial, the algorithm is challenging to efficiently parallelize on manycore GPU architectures. Common pitfalls,…
We develop data structures for dynamic closest pair problems with arbitrary distance functions, that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean…
This paper gives a k-means approximation algorithm that is efficient in the relational algorithms model. This is an algorithm that operates directly on a relational database without performing a join to convert it to a matrix whose rows…
Spatial approximations have been traditionally used in spatial databases to accelerate the processing of complex geometric operations. However, approximations are typically only used in a first filtering step to determine a set of candidate…
Many tasks use data housed in relational databases to train boosted regression tree models. In this paper, we give a relational adaptation of the greedy algorithm for training boosted regression trees. For the subproblem of calculating the…
As the structural databases continue to expand, efficient methods are required to search similar structures of the query structure from the database. There are many previous works about comparing protein 3D structures and scanning the…
A method to search for local structural similarities in proteins at atomic resolution is presented. It is demonstrated that a huge amount of structural data can be handled within a reasonable CPU time by using a conventional relational…
Nearest neighbor searching of large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. A flavor of approximation is, therefore, necessary to practically solve the problem of nearest neighbor…
The problem of {\em efficiently} finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in literature. However, a closely related problem of efficiently…