Related papers: QStore: Quantization-Aware Compressed Model Storag…

200 papers

With the prevalence of in-database AI-powered analytics, there is an increasing demand for database systems to efficiently manage the ever-expanding number and size of deep learning models. However, existing database systems typically store…

Databases · Computer Science 2025-09-16 Siqi Xiang, Sheng Wang, Xiaokui Xiao, Cong Yue, Zhanhao Zhao, Beng Chin Ooi

Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection. With an aggregator server coordinating training, aggregating model…

Machine Learning · Computer Science 2025-03-04 Ahmad Faraz Khan, Samuel Fountain, Ahmed M. Abdelmoniem, Ali R. Butt, Ali Anwar

In this paper, we present MorphStore, an open-source in-memory columnar analytical query engine with a novel holistic compression-enabled processing model. Basically, compression using lightweight integer compression algorithms already…

Quantization is a powerful tool to improve large language model (LLM) inference efficiency by utilizing more energy-efficient low-precision datapaths and reducing memory footprint. However, accurately quantizing LLM weights and activations…

Hardware Architecture · Computer Science 2025-04-22 Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, Sophia Shao, Brucek Khailany

We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference. We leverage weight normalization as a means of constraining parameters during…

Machine Learning · Computer Science 2023-02-01 Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig

Deep learning-based face recognition models follow the common trend in deep neural networks by utilizing full-precision floating-point networks with high computational costs. Deploying such networks in use-cases constrained by computational…

Computer Vision and Pattern Recognition · Computer Science 2022-06-22 Fadi Boutros, Naser Damer, Arjan Kuijper

We are witnessing an increasing availability of streaming data that may contain valuable information on the underlying processes. It is thus attractive to be able to deploy machine learning models on edge devices near sensors such that…

Machine Learning · Computer Science 2024-10-22 David Campos, Bin Yang, Tung Kieu, Miao Zhang, Chenjuan Guo, Christian S. Jensen

Large Language Models (LLMs) achieve strong performance across tasks, but face storage and compute challenges on edge devices. We propose EntroLLM, a compression framework combining mixed quantization and entropy coding to reduce storage…

Machine Learning · Computer Science 2026-05-05 Arnab Sanyal, Gourav Datta, Prithwish Mukherjee, Sandeep P. Chinchali, Michael Orshansky

As machine learning inferences increasingly move to edge devices, adapting to diverse computational capabilities, hardware, and memory constraints becomes more critical. Instead of relying on a pre-trained model fixed for all future…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-01 Xiangchen Li, Saeid Ghafouri, Bo Ji, Hans Vandierendonck, Deepu John, Dimitrios S. Nikolopoulos

Quantization-aware training (QAT) is typically performed for a single target numeric format, while practical deployments often need to choose numerical precision at inference time based on hardware support or runtime constraints. We study…

Machine Learning · Computer Science 2026-04-02 Zifei Xu, Sayeh Sharify, Hesham Mostafa

Lossy image compression is essential for efficient transmission and storage. Traditional compression methods mainly rely on discrete cosine transform (DCT) or singular value decomposition (SVD), both of which represent image data in…

Image and Video Processing · Electrical Eng. & Systems 2025-03-28 Pooya Ashtari, Pourya Behmandpoor, Fateme Nateghi Haredasht, Jonathan H. Chen, Panagiotis Patrinos, Sabine Van Huffel

Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks. Fine-tuning these pretrained models on downstream datasets provides further significant performance gains; however,…

Computation and Language · Computer Science 2026-03-19 Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer

With the growth of model sizes and scale of their deployment, their sheer size burdens the infrastructure requiring more network and more storage to accommodate these. While there is a vast literature about reducing model sizes, we…

Catastrophic forgetting poses a fundamental challenge in continual learning, particularly when models are quantized for deployment efficiency. We systematically investigate the interplay between quantization precision (FP16, INT8, INT4) and…

Machine Learning · Computer Science 2025-12-23 Michael S. Zhang, Rishi A. Ruia, Arnav Kewalram, Saathvik Dharmapuram, Utkarsh Sharma, Kevin Zhu

The rise of large language models (LLMs) has significantly advanced various natural language processing (NLP) tasks. However, the resource demands of these models pose substantial challenges. Structured pruning is an effective approach to…

Machine Learning · Computer Science 2024-12-17 Changhai Zhou, Yuhua Zhou, Shijie Han, Qian Qiao, Hongguang Li

Modern applications span multiple clouds to reduce costs, avoid vendor lock-in, and leverage low-availability resources in another cloud. However, standard object stores operate within a single cloud, forcing users to manually manage data…

Modern model hubs, such as Hugging Face, store tens of petabytes of LLMs, with fine-tuned variants vastly outnumbering base models and dominating storage consumption. Existing storage reduction techniques -- such as deduplication and…

Databases · Computer Science 2025-11-11 Zirui Wang, Tingfeng Lan, Zhaoyuan Su, Juncheng Yang, Yue Cheng

Large language models (LLMs) have significantly advanced the natural language processing paradigm but impose substantial demands on memory and computational resources. Quantization is one of the most effective ways to reduce memory…

Machine Learning · Computer Science 2025-04-29 Xilong Xie, Liang Wang, Limin Xiao, Meng Han, Lin Sun, Shuai Zheng, Xiangrong Xu

Large Language Models (LLMs) are often quantized to lower precision to reduce the memory cost and latency in inference. However, quantization often degrades model performance, thus fine-tuning is required for various down-stream tasks.…

Machine Learning · Computer Science 2025-02-19 Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang

Diffusion models have been achieving remarkable performance in face restoration. However, the heavy computations hamper the widespread adoption of these models. In this work, we propose QuantFace, a novel low-bit quantization framework for…

Computer Vision and Pattern Recognition · Computer Science 2025-11-24 Jiatong Li, Libo Zhu, Haotong Qin, Jingkai Wang, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang