Related papers: Corra: Correlation-Aware Column Compression

The growing adoption of data lakes for managing relational data necessitates efficient, open storage formats that provide high scan performance and competitive compression ratios. While existing formats achieve fast scans through…

Databases · Computer Science 2024-10-25 Mihail Stoian, Alexander van Renen, Jan Kobiolka, Ping-Lin Kuo, Josif Grabocka, Andreas Kipf

In-memory columnar databases have become mainstream over the last decade and have vastly improved the fast processing of large volumes of data through multi-core parallelism and in-memory compression thereby eliminating the usual…

Databases · Computer Science 2016-09-27 Jayanth Jayanth

Cache-aided coded multicast leverages side information at wireless edge caches to efficiently serve multiple groupcast demands via common multicast transmissions, leading to load reductions that are proportional to the aggregate cache size.…

Information Theory · Computer Science 2016-11-17 Parisa Hassanzadeh, Antonia Tulino, Jaime Llorca, Elza Erkip

Data warehouses organize data in a columnar format to enable faster scans and better compression. Modern systems offer a variety of column encodings that can reduce storage footprint and improve query performance. Selecting a good encoding…

Databases · Computer Science 2021-05-20 Lujing Cen, Andreas Kipf, Ryan Marcus, Tim Kraska

Data compression is widely used in contemporary column-oriented DBMSes to lower space usage and to speed up query processing. Pioneering systems have introduced compression to tackle the disk bandwidth bottleneck by trading CPU processing…

Databases · Computer Science 2021-05-20 Alexander Slesarev, Evgeniy Klyuchikov, Kirill Smirnov, George Chernishev

Modern data analytics applications prefer to use column-storage formats due to their improved storage efficiency through encoding and compression. Parquet is the most popular file format for column data storage that provides several of…

Databases · Computer Science 2022-12-14 Majid Saeedan, Ahmed Eldawy

Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Despite a comprehensive study on dictionary-based encodings to approach Shannon's entropy, few prior works…

Databases · Computer Science 2023-11-27 Yihao Liu, Xinyu Zeng, Huanchen Zhang

Dataset Condensation (DC) aims to obtain a condensed dataset that allows models trained on the condensed dataset to achieve performance comparable to those trained on the full dataset. Recent DC approaches increasingly focus on encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Bowen Yuan, Yuxia Fu, Zijian Wang, Yadan Luo, Zi Huang

Cache-aided coded multicast leverages side information at wireless edge caches to efficiently serve multiple unicast demands via common multicast transmissions, leading to load reductions that are proportional to the aggregate cache size.…

Information Theory · Computer Science 2018-06-14 Parisa Hassanzadeh, Antonia M. Tulino, Jaime Llorca, Elza Erkip

Coded caching and delivery is studied taking into account the correlations among the contents in the library. Correlations are modeled as common parts shared by multiple contents; that is, each file in the database is composed of a group of…

Information Theory · Computer Science 2017-11-13 Qianqian Yang, Deniz Gündüz

Motivated by applications of distributed storage systems to cloud-based key-value stores, the multi-version coding problem has been recently formulated to efficiently store frequently updated data in asynchronous decentralized storage…

Information Theory · Computer Science 2019-03-12 Ramy E. Ali, Viveck Cadambe

Distributed high dimensional mean estimation is a common aggregation routine used often in distributed optimization methods. Most of these applications call for a communication-constrained setting where vectors, whose mean is to be…

Machine Learning · Statistics 2026-01-28 Harsh Vardhan, Arya Mazumdar

Motivated by applications of distributed storage systems to key-value stores, the multi-version coding problem was formulated to efficiently store frequently updated data in asynchronous decentralized storage systems. Inspired by…

Information Theory · Computer Science 2019-03-14 Ramy E. Ali, Viveck R. Cadambe

Compressed Sparse Column (CSC) and Coordinate (COO) are popular compression formats for sparse matrices. However, both CSC and COO are general purpose and cannot take advantage of any of the properties of the data other than sparsity, such…

Data Structures and Algorithms · Computer Science 2025-07-01 Skyler Ruiter, Seth Wolfgang, Marc Tunnell, Timothy Triche, Erin Carrier, Zachary DeBruine

With endless amounts of data and very limited bandwidth, fast data compression is one solution for the growing data-sharing problem. Compression helps lower transfer times and save memory, but if the compression takes too long, this no…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-21 David Noel, Elizabeth Graham, Liyuan Liu

This article proposes a novel iterative algorithm based on Low Density Parity Check (LDPC) codes for compression of correlated sources at rates approaching the Slepian-Wolf bound. The setup considered in the article looks at the problem of…

Information Theory · Computer Science 2007-10-31 F. Daneshgaran, Massimiliano Laddomada, M. Mondin

Layered decoding is well appreciated in Low-Density Parity-Check (LDPC) decoder implementation since it can achieve effectively high decoding throughput with low computation complexity. This work, for the first time, addresses low…

Information Theory · Computer Science 2012-04-13 Zhiqiang Cui, Zhongfeng Wang, Xinmiao Zhang

Cropping high-resolution document images into multiple sub-images is the most widely used approach for current Multimodal Large Language Models (MLLMs) to do document understanding. Most current document understanding methods preserve…

Computer Vision and Pattern Recognition · Computer Science 2024-07-22 Renshan Zhang, Yibo Lyu, Rui Shao, Gongwei Chen, Weili Guan, Liqiang Nie

Traditional video compression technologies have been developed over decades in pursuit of higher coding efficiency. Efficient temporal information representation plays a key role in video coding. Thus, in this paper, we propose to exploit…

Image and Video Processing · Electrical Eng. & Systems 2019-12-16 Haojie Liu, Han shen, Lichao Huang, Ming Lu, Tong Chen, Zhan Ma

In order to accommodate the ever-growing data from various, possibly independent, sources and the dynamic nature of data usage rates in practical applications, modern cloud data storage systems are required to be scalable, flexible, and…

Information Theory · Computer Science 2020-09-22 Siyi Yang, Ahmed Hareedy, Robert Calderbank, Lara Dolecek