Related papers: Query Complexity Based Optimal Processing of Raw D…

Workload-Aware Incremental Reclustering in Cloud Data Warehouses

Modern cloud data warehouses store data in micro-partitions and rely on metadata (e.g., zonemaps) for efficient data pruning during query processing. Maintaining data clustering in a large-scale table is crucial for effective data pruning.…

Databases · Computer Science 2026-03-18 Yipeng Liu , Renfei Zhou , Jiaqi Yan , Huanchen Zhang

Resource Utilization Monitoring for Raw Data Query Processing

Scientific experiments, simulations, and modern applications generate large amounts of data. Data is stored in raw format to avoid the high loading time of traditional database management systems. Researchers have proposed many techniques…

Databases · Computer Science 2022-12-22 Mayank Patel , Minal Bhise

Workload-Driven Vertical Partitioning for Effective Query Processing over Raw Data

Traditional databases are not equipped with the adequate functionality to handle the volume and variety of "Big Data". Strict schema definition and data loading are prerequisites even for the most primitive query session. Raw data…

Databases · Computer Science 2015-05-12 Weijie Zhao , Yu Cheng , Florin Rusu

A Hybrid Heuristic Framework for Resource-Efficient Querying of Scientific Experiments Data

Scientific experiments and modern applications are generating large amounts of data every day. Most organizations utilize in-house servers or cloud resources to manage application data and workload. The traditional database management…

Databases · Computer Science 2025-06-17 Mayank Patel , Minal Bhise

Pruning in Snowflake: Working Smarter, Not Harder

Modern cloud-based data analytics systems must efficiently process petabytes of data residing on cloud storage. A key optimization technique in state-of-the-art systems like Snowflake is partition pruning - skipping chunks of data that do…

Databases · Computer Science 2025-06-23 Andreas Zimmerer , Damien Dam , Jan Kossmann , Juliane Waack , Ismail Oukid , Andreas Kipf

WawPart: Workload-Aware Partitioning of Knowledge Graphs

Large-scale datasets in the form of knowledge graphs are often used in numerous domains today. A knowledge graph's size often exceeds the capacity of a single computer system, especially if the graph must be stored in main memory. To…

Databases · Computer Science 2022-03-29 Amitabh Priyadarshi , Krzysztof J. Kochut

Approximate Partition Selection for Big-Data Workloads using Summary Statistics

Many big-data clusters store data in large partitions that support access at a coarse, partition-level granularity. As a result, approximate query processing via row-level sampling is inefficient, often requiring reads of many partitions.…

Databases · Computer Science 2020-08-25 Kexin Rong , Yao Lu , Peter Bailis , Srikanth Kandula , Philip Levis

Explaining with Greater Support: Weighted Column Sampling Optimization for q-Consistent Summary-Explanations

Machine learning systems have been extensively used as auxiliary tools in domains that require critical decision-making, such as healthcare and criminal justice. The explainability of decisions is crucial for users to develop trust in these…

Artificial Intelligence · Computer Science 2023-02-10 Chen Peng , Zhengqi Dai , Guangping Xia , Yajie Niu , Yihui Lei

Cache-based Multi-query Optimization for Data-intensive Scalable Computing Frameworks

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in…

Databases · Computer Science 2018-05-23 Pietro Michiardi , Damiano Carra , Sara Migliorini

Dataset Quantization with Active Learning based Adaptive Sampling

Deep learning has made remarkable progress recently, largely due to the availability of large, well-labeled datasets. However, the training on such datasets elevates costs and computational demands. To address this, various techniques like…

Computer Vision and Pattern Recognition · Computer Science 2024-07-11 Zhenghao Zhao , Yuzhang Shang , Junyi Wu , Yan Yan

Facilitating SQL Query Composition and Analysis

Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users. We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to…

Databases · Computer Science 2020-02-24 Zainab Zolaktaf , Mostafa Milani , Rachel Pottinger

Partition Tree Weighting

This paper introduces the Partition Tree Weighting technique, an efficient meta-algorithm for piecewise stationary sources. The technique works by performing Bayesian model averaging over a large class of possible partitions of the data…

Information Theory · Computer Science 2012-11-22 Joel Veness , Martha White , Michael Bowling , András György

Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings

Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost…

Databases · Computer Science 2020-03-02 Tarique Siddiqui , Alekh Jindal , Shi Qiao , Hiren Patel , Wangchao Le

Answering Complex Logical Queries on Knowledge Graphs via Query Computation Tree Optimization

Answering complex logical queries on incomplete knowledge graphs is a challenging task, and has been widely studied. Embedding-based methods require training on complex queries, and cannot generalize well to out-of-distribution query…

Machine Learning · Computer Science 2023-06-08 Yushi Bai , Xin Lv , Juanzi Li , Lei Hou

EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems

The emerging citation-based QA systems are gaining more attention especially in generative AI search applications. The importance of extracted knowledge provided to these systems is vital from both accuracy (completeness of information) and…

Computation and Language · Computer Science 2024-06-18 Mohammad Dehghan , Mohammad Ali Alomrani , Sunyam Bagga , David Alfonso-Hermelo , Khalil Bibi , Abbas Ghaddar , Yingxue Zhang , Xiaoguang Li , Jianye Hao , Qun Liu , Jimmy Lin , Boxing Chen , Prasanna Parthasarathi , Mahdi Biparva , Mehdi Rezagholizadeh

Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases

Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by…

Information Retrieval · Computer Science 2022-04-05 Philipp Christmann , Rishiraj Saha Roy , Gerhard Weikum

Rethinking Large-scale Dataset Compression: Shifting Focus From Labels to Images

Dataset distillation and dataset pruning are two prominent techniques for compressing datasets to improve computational and storage efficiency. Despite their overlapping objectives, these approaches are rarely compared directly. Even within…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Lingao Xiao , Songhua Liu , Yang He , Xinchao Wang

Load Balanced Semantic Aware Distributed RDF Graph

Modern-day semantic applications store data as Resource Description Framework (RDF) data. Due to the proliferation of RDF data, the efficient management of huge RDF data has become essential. A number of approaches pertaining to both…

Databases · Computer Science 2021-07-23 Ami Pandat , Nidhi Gupta , Minal Bhise

Quantization without Tears

Deep neural networks, while achieving remarkable success across diverse tasks, demand significant resources, including computation, GPU memory, bandwidth, storage, and energy. Network quantization, as a standard compression and acceleration…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Minghao Fu , Hao Yu , Jie Shao , Junjie Zhou , Ke Zhu , Jianxin Wu

Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection

Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long…

Machine Learning · Computer Science 2024-08-21 Xijie Huang , Zechun Liu , Shih-Yang Liu , Kwang-Ting Cheng