Related papers: Lightweight Correlation-Aware Table Compression

200 papers

Column encoding schemes have witnessed a spark of interest with the rise of open storage formats (like Parquet) in data lakes in modern cloud deployments. This is not surprising -- as data volume increases, it becomes more and more…

Databases · Computer Science 2024-06-18 Hanwen Liu, Mihail Stoian, Alexander van Renen, Andreas Kipf

Cache-aided coded multicast leverages side information at wireless edge caches to efficiently serve multiple groupcast demands via common multicast transmissions, leading to load reductions that are proportional to the aggregate cache size.…

Information Theory · Computer Science 2016-11-17 Parisa Hassanzadeh, Antonia Tulino, Jaime Llorca, Elza Erkip

Cropping high-resolution document images into multiple sub-images is the most widely used approach for current Multimodal Large Language Models (MLLMs) to do document understanding. Most current document understanding methods preserve…

Computer Vision and Pattern Recognition · Computer Science 2024-07-22 Renshan Zhang, Yibo Lyu, Rui Shao, Gongwei Chen, Weili Guan, Liqiang Nie

This paper addresses the problem of correlation estimation in sets of compressed images. We consider a framework where images are represented under the form of linear measurements due to low complexity sensing or security requirements. We…

Computer Vision and Pattern Recognition · Computer Science 2011-12-20 Vijayaraghavan Thirumalai, Pascal Frossard

The communication bottleneck in federated learning (FL) has spurred extensive research into techniques to reduce the volume of data exchanged between client devices and the central parameter server. In this paper, we systematically classify…

Information Theory · Computer Science 2026-04-17 Adrian Edin, Michel Kieffer, Mikael Johansson, Zheng Chen

Modern data analytics applications prefer to use column-storage formats due to their improved storage efficiency through encoding and compression. Parquet is the most popular file format for column data storage that provides several of…

Databases · Computer Science 2022-12-14 Majid Saeedan, Ahmed Eldawy

We present a filter correlation based model compression approach for deep convolutional neural networks. Our approach iteratively identifies pairs of filters with the largest pairwise correlations and drops one of the filters from each such…

Computer Vision and Pattern Recognition · Computer Science 2020-01-17 Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri

The construction of highly coherent x-ray sources has enabled new research opportunities across the scientific landscape. The maximum raw data rate per beamline now exceeds 40 GB/s, posing unprecedented challenges for the online processing…

Compressing the KV cache is a required step to deploy large language models on edge devices. Current quantization methods compress storage but fail to reduce bandwidth as attention calculation requires dequantizing keys from INT4/INT8 to…

Machine Learning · Computer Science 2026-01-16 Aryan Karmore

Soft context compression reduces the computational workload of processing long contexts in LLMs by encoding long context into a smaller number of latent tokens. However, existing frameworks apply uniform compression ratios, failing to…

Computation and Language · Computer Science 2026-03-30 Yijiong Yu, Shuai Yuan, Jie Zheng, Huazheng Wang, Ji Pei

Driven by significant improvements in architectural design and training pipelines, computer vision has recently experienced dramatic progress in terms of accuracy on classic benchmarks such as ImageNet. These highly-accurate models are…

Computer Vision and Pattern Recognition · Computer Science 2023-06-01 Denis Kuznedelev, Eldar Kurtic, Elias Frantar, Dan Alistarh

Time series data from a variety of sensors and IoT devices need effective compression to reduce storage and I/O bandwidth requirements. While most time series databases and systems rely on lossless compression, lossy techniques offer even…

Databases · Computer Science 2025-01-27 Carlos Enrique Muñiz-Cuza, Matthias Boehm, Torben Bach Pedersen

The deployment of modern network applications is increasing the network size and traffic volumes at an unprecedented pace. Storing network-related information (e.g., traffic traces) is key to enable efficient network management. However,…

Networking and Internet Architecture · Computer Science 2023-01-24 Paul Almasan, Krzysztof Rusek, Shihan Xiao, Xiang Shi, Xiangle Cheng, Albert Cabellos-Aparicio, Pere Barlet-Ros

Data lakes, increasingly adopted for their ability to store and analyze diverse types of data, commonly use columnar storage formats like Parquet and ORC for handling relational tables. However, these traditional setups fall short when it…

Databases · Computer Science 2024-09-26 Xue Li, Weibin Zeng, Zhibin Wang, Diwen Zhu, Jingbo Xu, Wenyuan Yu, Jingren Zhou

Nowadays, data is represented by vectors. Retrieving those vectors, among millions and billions, that are similar to a given query is a ubiquitous problem, known as similarity search, of relevance for a wide range of applications.…

Machine Learning · Computer Science 2023-07-26 Cecilia Aguerrebere, Ishwar Bhati, Mark Hildebrand, Mariano Tepper, Ted Willke

Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Despite a comprehensive study on dictionary-based encodings to approach Shannon's entropy, few prior works…

Databases · Computer Science 2023-11-27 Yihao Liu, Xinyu Zeng, Huanchen Zhang

A method is presented to automatically generate context models of data by calculating the data's autocorrelation function. The largest values of the autocorrelation function occur at the offsets or lags in the bitstream which tend to be the…

Information Theory · Computer Science 2013-06-11 John Scoville

The proliferation of small files in data lakes poses significant challenges, including degraded query performance, increased storage costs, and scalability bottlenecks in distributed storage systems. Log-structured table formats (LSTs) such…

Increasing amounts of structured data can provide value for research and business if the relevant data can be located. Often the data is in a data lake without a consistent schema, making locating useful data challenging. Table search is a…

Databases · Computer Science 2023-08-29 Michael Glass, Sugato Bagchi, Oktie Hassanzadeh, Gaetano Rossiello, Alfio Gliozzo

Cache-aided coded multicast leverages side information at wireless edge caches to efficiently serve multiple unicast demands via common multicast transmissions, leading to load reductions that are proportional to the aggregate cache size.…

Information Theory · Computer Science 2018-06-14 Parisa Hassanzadeh, Antonia M. Tulino, Jaime Llorca, Elza Erkip