Related papers: Qd-tree: Learning Data Layouts for Big Data Analyt…

In-memory Multidimensional Indexing Using the skd-tree

In this paper, we revisit the problem of indexing multi-dimensional data in memory for the efficient support of multi-dimensional range queries and nearest neighbor queries. This is a classic problem in main-memory databases, where there is…

Databases · Computer Science 2026-05-06 Achilleas Michalopoulos, Dimitrios Tsitsigkos, Nikos Mamoulis

Q-Learning-Based Time-Critical Data Aggregation Scheduling in IoT

Time-critical data aggregation in Internet of Things (IoT) networks demands efficient, collision-free scheduling to minimize latency for applications like smart cities and industrial automation. Traditional heuristic methods, with two-phase…

Networking and Internet Architecture · Computer Science 2025-11-25 Van-Vi Vo, Tien-Dung Nguyen, Duc-Tai Le, Hyunseung Choo

FITing-Tree: A Data-aware Index Structure

Index structures are one of the most important tools that DBAs leverage to improve the performance of analytics and transactional workloads. However, building several indexes over large datasets can often become prohibitive and consume…

Databases · Computer Science 2020-03-26 Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska

Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features

Decision trees are one of the most useful and popular methods in the machine learning toolbox. In this paper, we consider the problem of learning optimal decision trees, a combinatorial optimization problem that is challenging to solve at…

Machine Learning · Computer Science 2022-07-01 Rahul Mazumder, Xiang Meng, Haoyue Wang

An Efficient Method of Partitioning High Volumes of Multidimensional Data for Parallel Clustering Algorithms

An optimal data partitioning in parallel & distributed implementation of clustering algorithms is a necessary computation as it ensures independent task completion, fair distribution, less number of affected points and better & faster…

Artificial Intelligence · Computer Science 2016-09-21 Saraswati Mishra, Avnish Chandra Suman

QUEST: An Efficient Query Evaluation Scheme Towards Scan-Intensive Cross-Model Analysis

Modern data-driven applications require that databases support fast cross-model analytical queries. Achieving fast analytical queries in a database system is challenging since they are usually scan-intensive (i.e., they need to intensively…

Databases · Computer Science 2023-09-22 Jianfeng Huang, Dongjing Miao, Xin Liu

Efficient Tree Layout in a Multilevel Memory Hierarchy

We consider the problem of laying out a tree with fixed parent/child structure in hierarchical memory. The goal is to minimize the expected number of block transfers performed during a search along a root-to-leaf path, subject to a given…

Data Structures and Algorithms · Computer Science 2007-05-23 Stephen Alstrup, Michael A. Bender, Erik D. Demaine, Martin Farach-Colton, Theis Rauhe, Mikkel Thorup

Block-distributed Gradient Boosted Trees

The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets…

Machine Learning · Computer Science 2019-05-30 Theodore Vasiloudis, Hyunsu Cho, Henrik Boström

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of…

Artificial Intelligence · Computer Science 2009-09-25 A. Moore, M. S. Lee

Quality Diversity for Robot Learning: Limitations and Future Directions

Quality Diversity (QD) has shown great success in discovering high-performing, diverse policies for robot skill learning. While current benchmarks have led to the development of powerful QD methods, we argue that new paradigms must be…

Robotics · Computer Science 2024-07-26 Sumeet Batra, Bryon Tjanaka, Stefanos Nikolaidis, Gaurav Sukhatme

Quantum Distributed Deep Learning Architectures: Models, Discussions, and Applications

Although deep learning (DL) has already become a state-of-the-art technology for various data processing tasks, data security and computational overload problems often arise due to their high data and computational power dependency. To…

Quantum Physics · Physics 2022-04-08 Yunseok Kwak, Won Joon Yun, Jae Pyoung Kim, Hyunhee Cho, Minseok Choi, Soyi Jung, Joongheon Kim

An Optimized Data Structure for High Throughput 3D Proteomics Data: mzRTree

As an emerging field, MS-based proteomics still requires software tools for efficiently storing and accessing experimental data. In this work, we focus on the management of LC-MS data, which are typically made available in standard…

Computational Engineering, Finance, and Science · Computer Science 2010-04-27 Sara Nasso, Francesco Silvestri, Francesco Tisiot, Barbara Di Camillo, Andrea Pietracaprina, Gianna Maria Toffolo

A Query-Driven Approach to Space-Efficient Range Searching

We initiate a study of a query-driven approach to designing partition trees for range-searching problems. Our model assumes that a data structure is to be built for an unknown query distribution that we can access through a sampling oracle,…

Data Structures and Algorithms · Computer Science 2025-02-20 Dimitris Fotakis, Andreas Kalavas, Ioannis Psarros

Optimizing Data Lakes' Queries

Cloud data lakes provide a modern solution for managing large volumes of data. The fundamental principle behind these systems is the separation of compute and storage layers. In this architecture, inexpensive cloud storage is utilized for…

Databases · Computer Science 2025-10-20 Gregory Weintraub

Faster Relational Algorithms Using Geometric Data Structures

Optimization tasks over relational data, such as clustering, often suffer from the prohibitive cost of join operations, which are necessary to access the full dataset. While geometric data structures like BBD trees yield fast approximation…

Databases · Computer Science 2026-03-13 Aryan Esmailpour, Stavros Sintos

F-tree: an algorithm for clustering transactional data using frequency tree

Clustering is an important data mining technique that groups similar data records; recently, categorical transaction clustering has received more attention. In this research, we study the problem of categorical data clustering for…

Databases · Computer Science 2017-05-03 Mahmoud Mahdi, Samir Abdelrahman, Reem Bahgat, Ismail Ismail

SQUID: Faster Analytics via Sampled Quantile Estimation

Streaming algorithms are fundamental in the analysis of large and online datasets. A key component of many such analytic tasks is $q$-MAX, which finds the largest $q$ values in a number stream. Modern approaches attain a constant runtime by…

Data Structures and Algorithms · Computer Science 2024-07-11 Ran Ben-Basat, Gil Einziger, Wenchen Han, Bilal Tayh

QDR-Tree: An Efficient Index Scheme for Complex Spatial Keyword Query

With the popularity of mobile devices and the development of geo-positioning technology, location-based services (LBS) attract much attention and top-k spatial keyword queries become increasingly complex. It is common to see that clients…

Data Structures and Algorithms · Computer Science 2022-07-26 Xinshi Zang, Peiwen Hao, Xiaofeng Gao, Bin Yao, Guihai Chen

D-SPACE4Cloud: A Design Tool for Big Data Applications

The last years have seen a steep rise in data generation worldwide, with the development and widespread adoption of several software projects targeting the Big Data paradigm. Many companies currently engage in Big Data analytics as part of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-05-25 Michele Ciavotta, Eugenio Gianniti, Danilo Ardagna

Learning Multi-dimensional Indexes

Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or…

Databases · Computer Science 2020-06-25 Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska