Related papers: More Than Bits: Multi-Envelope Double Binary Facto…
Binary quantization approaches, which replace weight matrices with binary matrices and substitute costly multiplications with cheaper additions, offer a computationally efficient approach to address the increasing computational and storage…
Boolean matrix has been used to represent digital information in many fields, including bank transaction, crime records, natural language processing, protein-protein interaction, etc. Boolean matrix factorization (BMF) aims to find an…
Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment. Quantization emerges as one of the most effective…
User and item features of side information are crucial for accurate recommendation. However, the large number of feature dimensions, e.g., usually larger than 10^7, results in expensive storage and computational cost. This prohibits fast…
Factorization Machines (FM), a general predictor that can efficiently model feature interactions in linear time, was primarily proposed for collaborative recommendation and have been broadly used for regression, classification and ranking…
Multiresolution Matrix Factorization (MMF) was recently introduced as an alternative to the dominant low-rank paradigm in order to capture structure in matrices at multiple different scales. Using ideas from multiresolution analysis (MRA),…
The medium-density parity-check (MDPC) code-based Bit Flipping Key Encapsulation (BIKE) mechanism remains a candidate of post-quantum cryptography standardization. The latest version utilizes a new bit-flipping (BF) decoding algorithm,…
The deployment of large language models (LLMs) is frequently hindered by prohibitive memory and computational requirements. While quantization mitigates these bottlenecks, maintaining model fidelity in the sub-1-bit regime remains a…
Due to the speed limitation of the conventional bit-chosen strategy in the existing weighted bit flipping algorithms, a high-speed LDPC decoder cannot be realized. To solve this problem, we propose a fast weighted bit flipping (FWBF)…
Deploying large language models (LLMs) in resource-constrained environments is hindered by heavy computational and memory requirements. We present LBLLM, a lightweight binarization framework that achieves effective W(1+1)A4 quantization…
Multiresolution Matrix Factorization (MMF) is unusual amongst fast matrix factorization algorithms in that it does not make a low rank assumption. This makes MMF especially well suited to modeling certain types of graphs with complex…
Binary matrix factorisation is an essential tool for identifying discrete patterns in binary data. In this paper we consider the rank-k binary matrix factorisation problem (k-BMF) under Boolean arithmetic: we are given an n x m binary…
Quantum low-density parity-check (QLDPC) codes have been proven to achieve higher minimum distances at higher code rates than surface codes. However, this family of codes imposes stringent latency requirements and poor performance under…
With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation…
Large Language Models (LLMs) deliver strong performance but are difficult to deploy under tight memory and compute constraints. Low-bit post-training quantization (PTQ) is a promising direction; however, it typically relies on calibration…
Hybrid beamforming (HBF) design is a crucial stage in millimeter wave (mmWave) multi-user multi-input multi-output (MU-MIMO) systems. However, conventional HBF methods are still with high complexity and strongly rely on the quality of…
Large language models (LLMs) have become pivotal in artificial intelligence, demonstrating strong capabilities in reasoning, understanding, and generating data. However, their deployment on edge devices is hindered by their substantial…
In this paper, we propose a new class of bit flipping algorithms for low-density parity-check (LDPC) codes over the binary symmetric channel (BSC). Compared to the regular (parallel or serial) bit flipping algorithms, the proposed…
Reduced-precision data formats are crucial for cost-effective serving of large language models (LLMs). While numerous reduced-precision formats have been introduced thus far, they often require intrusive modifications to the software…
Mixed-precision quantization is a promising approach for compressing large language models under tight memory budgets. However, existing mixed-precision methods typically suffer from one of two limitations: they either rely on expensive…