Related papers: Ternary and Binary Quantization for Improved Class…
Quantization is widely applied in machine learning to reduce computational and storage costs for both data and models. Considering that classification tasks are fundamental to the field, it is crucial to investigate how quantization impacts…
Inference time, model size, and accuracy are critical for deploying deep neural network models. Numerous research efforts have been made to compress neural network models with faster inference and higher accuracy. Pruning and quantization…
Random projection is often used to project higher-dimensional vectors onto a lower-dimensional space, while approximately preserving their pairwise distances. It has emerged as a powerful tool in various data processing tasks and has…
Neural network models are resource hungry. It is difficult to deploy such deep networks on devices with limited resources, like smart wearables, cellphones, drones, and autonomous vehicles. Low bit quantization such as binary and ternary…
Inference time, model size, and accuracy are three key factors in deep model compression. Most of the existing work addresses these three key factors separately as it is difficult to optimize them all at the same time. For example, low-bit…
Model quantization enables the deployment of deep neural networks under resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the…
In this paper, we study randomized reduction methods, which reduce high-dimensional features into low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification.…
Most of the existing works use projection functions for ternary quantization in discrete space. Scaling factors and thresholds are used in some cases to improve the model accuracy. However, the gradients used for optimization are inaccurate…
A low precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hard- ware implementation costs during training to achieve significant model compression for…
Traditionally, quantization is designed to minimize the reconstruction error of a data source. When considering downstream classification tasks, other measures of distortion can be of interest; such as the 0-1 classification loss.…
Random projections offer an appealing and flexible approach to a wide range of large-scale statistical problems. They are particularly useful in high-dimensional settings, where we have many covariates recorded for each observation. In…
With the explosive growth of image databases, deep hashing, which learns compact binary descriptors for images, has become critical for fast image retrieval. Many existing deep hashing methods leverage quantization loss, defined as distance…
An effective unsupervised hashing algorithm leads to compact binary codes preserving the neighborhood structure of data as much as possible. One of the most established schemes for unsupervised hashing is to reduce the dimensionality of…
Dimensionality reduction is an essential technique for multi-way large-scale data, i.e., tensor. Tensor ring (TR) decomposition has become popular due to its high representation ability and flexibility. However, the traditional TR…
The rapid development of large pre-trained language models has greatly increased the demand for model compression techniques, among which quantization is a popular solution. In this paper, we propose BinaryBERT, which pushes BERT…
Dimensionality reduction is an effective method for learning high-dimensional data, which can provide better understanding of decision boundaries in human-readable low-dimensional subspace. Linear methods, such as principal component…
Binary concepts are empirically used by humans to generalize efficiently. And they are based on Bernoulli distribution which is the building block of information. These concepts span both low-level and high-level features such as "large vs…
As a typical dimensionality reduction technique, random projection can be simply implemented with linear projection, while maintaining the pairwise distances of high-dimensional data with high probability. Considering this technique is…
This paper, broadly speaking, covers the use of randomness in two main areas: low-rank approximation and kernel methods. Low-rank approximation is very important in numerical linear algebra. Many applications depend on matrix decomposition…
Binary, or one-bit, representations of data arise naturally in many applications, and are appealing in both hardware implementations and algorithm design. In this work, we study the problem of data classification from binary data and…