Related papers: Cheetah: Accelerating Database Queries with Switch…
Sorting is a fundamental and well studied problem that has been studied extensively. Sorting plays an important role in the area of databases, as many queries can be served much faster if the relations are first sorted. One of the most…
Modern cloud-based data analytics systems must efficiently process petabytes of data residing on cloud storage. A key optimization technique in state-of-the-art systems like Snowflake is partition pruning - skipping chunks of data that do…
Network pruning techniques, including weight pruning and filter pruning, reveal that most state-of-the-art neural networks can be accelerated without a significant performance drop. This work focuses on filter pruning which enables…
Fast and scalable metadata management across multiple metadata servers is crucial for distributed file systems to handle numerous files and directories. Client-side caching of frequently accessed metadata can mitigate server loads, but…
The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful,…
The advent of switches with programmable dataplanes has enabled the rapid development of new network functionality, as well as providing a platform for acceleration of a broad range of application-level functionality. However, existing…
Machine learning has emerged as a powerful solution to the modern challenges in accelerator physics. However, the limited availability of beam time, the computational cost of simulations, and the high-dimensionality of optimisation problems…
Efficient data selection is essential for improving the training efficiency of deep neural networks and reducing the associated annotation costs. However, traditional methods tend to be computationally expensive, limiting their scalability…
Efficient data selection is crucial for enhancing the training efficiency of deep neural networks and minimizing annotation requirements. Traditional methods often face high computational costs, limiting their scalability and practical use.…
We develop the data structure PReaCH (for Pruned Reachability Contraction Hierarchies) which supports reachability queries in a directed graph, i.e., it supports queries that ask whether two nodes in the graph are connected by a directed…
Priority queues are fundamental abstract data structures, often used to manage limited resources in parallel programming. Several proposed parallel priority queue implementations are based on skiplists, harnessing the potential for…
Hardware acceleration of database query processing can be done with the help of FPGAs. In particular, they are partially reconfigurable during runtime, which allows for the runtime adaption of the hardware to a variety of queries.…
Stream processing is usually done either on a tuple-by-tuple basis or in micro-batches. There are many applications where tuples over a predefined duration/window must be processed within certain deadlines. Processing such queries using…
Long sequences of text are challenging in the context of transformers, due to quadratic memory increase in the self-attention mechanism. As this issue directly affects the translation from natural language to SQL queries (as techniques…
Trees can accelerate queries that search or aggregate values over large collections. They achieve this by storing metadata that enables quick pruning (or inclusion) of subtrees when predicates on that metadata can prove that none (or all)…
The ever-growing collections of data series create a pressing need for efficient similarity search, which serves as the backbone for various analytics pipelines. Recent studies have shown that tree-based series indexes excel in many…
Balancing performance and interpretability in multivariate time series classification is a significant challenge due to data complexity and high dimensionality. This paper introduces PHeatPruner, a method integrating persistent homology and…
Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch. The emergence of rich model repositories, such as TensorFlow Hub, enables practitioners and researchers to unleash the potential of…
Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by…
Modern data processing applications execute increasingly sophisticated analysis that requires operations beyond traditional relational algebra. As a result, operators in query plans grow in diversity and complexity. Designing query…