Related papers: Memoryless scalar quantization for random frames
As deep neural networks (DNNs) see increased deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and…
Model quantization can reduce the model size and computational latency, it has become an essential technique for the deployment of deep neural networks on resourceconstrained hardware (e.g., mobile phones and embedded devices). The existing…
Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. While having no access to original training data due to security and privacy…
Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on…
Quantitative susceptibility mapping (QSM) utilizes MRI signal phase to infer estimates of local tissue magnetism (magnetic susceptibility), which has been shown useful to provide novel image contrast and as biomarkers of abnormal tissue.…
Quantization is a widely used compression method that effectively reduces redundancies in over-parameterized neural networks. However, existing quantization techniques for deep neural networks often lack a comprehensive error analysis due…
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…
In the noisy intermediate-scale quantum (NISQ) era, quantum error mitigation (QEM) is essential for producing reliable outputs from quantum circuits. We present a statistical signal processing approach to QEM that estimates the most likely…
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…
Frame permutation quantization (FPQ) is a new vector quantization technique using finite frames. In FPQ, a vector is encoded using a permutation source code to quantize its frame expansion. This means that the encoding is a partial ordering…
Model quantization has become essential for efficient large language model deployment, yet existing approaches involve clear trade-offs: methods such as GPTQ and AWQ achieve practical compression but are lossy, while lossless techniques…
Despite significant advancements in human motion generation, current motion representations, typically formulated as discrete frame sequences, still face two critical limitations: (i) they fail to capture motion from a multi-scale…
Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…
Influence of the finite-length registers and quantization effects on the reconstruction of sparse and approximately sparse signals is analyzed in this paper. For the nonquantized measurements, the compressive sensing (CS) framework provides…
In this paper the method of simulated quantiles (MSQ) of Dominicy and Veredas (2013) and Dominick et al. (2013) is extended to a general multivariate framework (MMSQ) and to provide a sparse estimator of the scale matrix (sparse-MMSQ). The…
Quantum noise fundamentally limits the utility of near-term quantum devices, making error mitigation essential for practical quantum computation. While traditional quantum error correction codes require substantial qubit overhead and…
Quantization approximates a deep network model with floating-point numbers by the one with low bit width numbers, in order to accelerate inference and reduce computation. Quantizing a model without access to the original data, zero-shot…
We provide the first analysis of a non-trivial quantization scheme for compressed sensing measurements arising from structured measurements. Specifically, our analysis studies compressed sensing matrices consisting of rows selected at…
Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural network by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive…
Based on the model's resilience to computational noise, model quantization is important for compressing models and improving computing speed. Existing quantization techniques rely heavily on experience and "fine-tuning" skills. In the…